dellagustin-sap closed 3 years ago
see work happening at https://github.com/SAP/project-portal-for-innersource/pull/18
More motivation to continue work on that PR :)
@JustinGOSSES @zkoppert in the meantime we got around to creating dedicated documentation about the crawling process. I am sure it would greatly benefit from your review, as you have already looked into the crawling topic. See: https://github.com/SAP/project-portal-for-innersource/blob/main/docs/CRAWLING.md
Cheers :)
fixed with #18
Read through it and it seemed very understandable.
I can only think of a few small additions that might be useful. They are more future additions than blockers:
(1) a reference way to combine results from multiple crawlers running against different code platforms
(2) a note on what happens if a code repository exists on more than one of the scanned code platforms
(3) a GitLab to GitHub mapping of the key:value pairs that come out of crawlers
(4) a link to a suitable GitLab crawler once one exists
Most of these points are for a large distributed organization that might have many internal GitHubs and GitLabs.
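For what it's worth, points (1) and (2) could be handled together by a small merge step. A minimal sketch, assuming each crawler produces a list of repository objects in the GitHub-style format and that `full_name` identifies the same repository across platforms. The function name `merge_crawler_results` and the first-one-wins rule are illustrative assumptions, not something the portal provides:

```python
def merge_crawler_results(*result_sets):
    """Combine repository lists from several crawlers into one list.

    Assumption: two entries describe the same repository when their
    `full_name` matches case-insensitively. The first entry seen wins,
    so pass the preferred platform's results first.
    """
    merged = {}
    for repos in result_sets:
        for repo in repos:
            key = repo["full_name"].lower()
            # setdefault keeps the existing entry if the key is already present
            merged.setdefault(key, repo)
    return list(merged.values())
```

Whether a duplicate should be dropped, or shown with links to both platforms, is a policy decision each organization would need to make.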
Thanks for the review @JustinGOSSES.
Can you say more about this point?
> (3) a GitLab to GitHub mapping of the key:value pairs that come out of crawlers.

What would this be used for? Is it related to writing a dedicated GitLab crawler (point 4)?
Yes. Point 3 should really have two parts, I guess: (a) a mapping used to convert the keys from GitLab names to GitHub names and then (b) an actual script to do the conversion.
This would then feed into point 1, a reference way to combine results from multiple crawlers running against different code platform instances.
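A rough sketch of what (a) and (b) could look like. The key names come from the public GitLab projects API and GitHub repositories API; the mapping is deliberately partial, and `GITLAB_TO_GITHUB_KEYS` and `gitlab_to_github` are hypothetical names used only for illustration:

```python
# (a) Partial mapping: GitLab project keys -> GitHub repository keys.
# Keys not listed here (e.g. "name", "description", "forks_count")
# are spelled the same in both APIs and pass through unchanged.
GITLAB_TO_GITHUB_KEYS = {
    "path_with_namespace": "full_name",
    "web_url": "html_url",
    "star_count": "stargazers_count",
}


# (b) Conversion: rename mapped keys, keep everything else as-is.
def gitlab_to_github(project):
    return {
        GITLAB_TO_GITHUB_KEYS.get(key, key): value
        for key, value in project.items()
    }
```

A real crawler would probably also need to recalculate fields that have no direct GitLab equivalent; this sketch only renames keys.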
We query GitHub and Git/Gerrit instances using the same crawler script in our environment. Some fields from Gerrit need to be mapped or recalculated, as some stats and concepts do not exist there, but it's pretty straightforward. I would guess it is similar for GitLab or other stacks. We could add documentation for such mappings to the crawler documentation as we get more implementations for other stacks.
I found this crawler implementation: https://github.com/zkoppert/innersource-crawler It would be nice to mention it, and maybe add a note to the contributing guidelines asking people to send a PR with a link if they implement a crawler.