SAP / project-portal-for-innersource

Lists all InnerSource projects of a company in an interactive and easy-to-use way. Can be used as a template for implementing the "InnerSource portal" pattern from the InnerSource Commons community.
https://sap.github.io/project-portal-for-innersource/
Apache License 2.0
143 stars, 71 forks

Update index.js #20

Closed: JustinGOSSES closed this pull request 3 years ago

JustinGOSSES commented 3 years ago

I used the crawler described in https://github.com/SAP/project-portal-for-innersource/pull/18, and it created a repos.json file with the topics key/value pair inside the _InnerSourceMetadata key. To get the portal to show all tags, I had to change line 316 of index.js from "topics": oRepo.topics, to "topics": oRepo._InnerSourceMetadata.topics,. Once I did that, topics appeared in the portal.
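A minimal sketch of the same fix, assuming each entry in repos.json is a GitHub repository object with crawler-added data under _InnerSourceMetadata. The helper name getRepoTopics and the sample entries are hypothetical and not taken from index.js. Besides the documented location, the helper also falls back to a root-level topics key, which some crawlers may still emit.

```javascript
// Hypothetical helper: read topics from the location the portal documentation
// specifies (_InnerSourceMetadata.topics), falling back to a root-level
// "topics" key that some crawlers may still write.
function getRepoTopics(oRepo) {
  const metadata = oRepo._InnerSourceMetadata || {};
  if (Array.isArray(metadata.topics)) {
    return metadata.topics;
  }
  if (Array.isArray(oRepo.topics)) {
    return oRepo.topics;
  }
  return [];
}

// Illustrative entries as different crawlers might write them to repos.json.
const documented = { name: "example-a", _InnerSourceMetadata: { topics: ["describe", "space"] } };
const rootLevel = { name: "example-b", topics: ["planet"] };
const noTopics = { name: "example-c" };

console.log(getRepoTopics(documented)); // [ 'describe', 'space' ]
console.log(getRepoTopics(rootLevel));  // [ 'planet' ]
console.log(getRepoTopics(noTopics));   // []
```

With such a helper, the mapping line would read "topics": getRepoTopics(oRepo), so the portal keeps working while crawlers converge on the documented location.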

I suspect the crawler has changed in a way that is not yet reflected in the portal demo?

CLAassistant commented 3 years ago

CLA assistant check
All committers have signed the CLA.

Michadelic commented 3 years ago

Hello @JustinGOSSES, and thank you very much for your PR. Indeed, we had added the topics directly at the root of the JSON structure, but the documentation states that they should be added below _InnerSourceMetadata. As the GitHub API does not return the topics by default with the repos API, we have to make a separate call and store the results ourselves (similar to the participation stats). To make it clear that this data is fetched separately, we will add it below our custom key.
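To illustrate the convention described above, here is a sketch of how a crawler could attach separately fetched topics while leaving the plain GitHub response at the root untouched. The function attachTopics, the sample objects, and the participation key name are assumptions for illustration; the actual crawlers may structure this differently.

```javascript
// Hypothetical crawler step: keep the GitHub repos API response unchanged at the
// root and store data fetched by separate API calls under _InnerSourceMetadata.
function attachTopics(repo, topics) {
  return {
    ...repo,
    _InnerSourceMetadata: {
      // Preserve anything already stored there (e.g. participation stats).
      ...(repo._InnerSourceMetadata || {}),
      topics: [...topics],
    },
  };
}

// Illustrative inputs: a trimmed repos API entry (the "participation" key name is
// an assumption) and the result of a separate topics call.
const repoFromApi = {
  name: "example-repo",
  description: "An example repository",
  _InnerSourceMetadata: { participation: [1, 0, 3] },
};
const topicsFromSeparateCall = ["describe", "portal"];

const stored = attachTopics(repoFromApi, topicsFromSeparateCall);
console.log(JSON.stringify(stored, null, 2));
```

The function returns a new object instead of mutating its input, so the raw API response stays available unchanged if the crawler needs it later.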

I will accept this PR and adjust the mock data so that the search works consistently on the additional data provided in _InnerSourceMetadata.

Michadelic commented 3 years ago

PS: The URL I used for testing (searching for a topic that differs from the repo name and description) is https://sap.github.io/project-portal-for-innersource/#!&search=describe

Expected result: Should return Sol/Mars
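A minimal sketch of the kind of matching this test exercises, assuming the search compares a lowercase query against the repo name, description, and _InnerSourceMetadata.topics. The function matchesSearch and the sample repos are hypothetical and not the portal's actual search code.

```javascript
// Hypothetical search predicate: a repo matches if the query appears in its name,
// its description, or one of its topics stored under _InnerSourceMetadata.
function matchesSearch(repo, query) {
  const q = query.trim().toLowerCase();
  if (q === "") return true; // an empty search shows everything
  const topics = (repo._InnerSourceMetadata && repo._InnerSourceMetadata.topics) || [];
  const fields = [repo.name || "", repo.description || "", ...topics];
  return fields.some((field) => field.toLowerCase().includes(q));
}

// Illustrative data: only the first repo carries the topic "describe",
// and neither name nor description contains that word.
const repos = [
  { name: "alpha", description: "Planet data", _InnerSourceMetadata: { topics: ["describe"] } },
  { name: "beta", description: "Weather service", _InnerSourceMetadata: { topics: ["forecast"] } },
];

const hits = repos.filter((repo) => matchesSearch(repo, "describe")).map((repo) => repo.name);
console.log(hits); // [ 'alpha' ]
```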

spier commented 3 years ago

@Michadelic I am adding to this thread, but if you would rather break this out into a new issue, please do.

When implementing the topic search in the portal (in https://github.com/SAP/project-portal-for-innersource/pull/12), I didn't realize that the documentation asks for them to be stored under _InnerSourceMetadata, rather than in the root of the JSON structure. My bad and great catch @JustinGOSSES! 😄

I have also seen that @zkoppert's crawler does indeed put the topics under _InnerSourceMetadata.topics, just as specified in the docs.

How I broke it :)

I discovered that I messed this up because I had put the topics in a different place in the JSON structure in my own implementation of the crawler, which I have since made public (https://github.com/spier/innersource-crawler-ruby).

I did that because I found out that the GitHub API does allow fetching the topics together with the search results. This is available via a preview feature, for which one has to set the custom media type application/vnd.github.mercy-preview+json in the Accept header, as shown here: https://github.com/spier/innersource-crawler-ruby/blob/main/crawler.rb#L31
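A sketch of how a crawler could request the topics together with the search results, assuming Node 18+ for the global fetch. Only the media type comes from this thread; the names buildTopicSearchRequest and searchReposWithTopics and the example query topic:inner-source are hypothetical and not taken from either crawler. The request builder runs offline; the fetch wrapper is defined but not called here.

```javascript
// Build (but do not send) a GitHub repository search request that opts into the
// topics preview via the custom media type in the Accept header.
const GITHUB_API = "https://api.github.com";
const TOPICS_PREVIEW_MEDIA_TYPE = "application/vnd.github.mercy-preview+json";

function buildTopicSearchRequest(query, token) {
  const url = `${GITHUB_API}/search/repositories?q=${encodeURIComponent(query)}`;
  const headers = { Accept: TOPICS_PREVIEW_MEDIA_TYPE };
  if (token) {
    // Optional; authenticated requests get higher rate limits.
    headers.Authorization = `token ${token}`;
  }
  return { url, options: { method: "GET", headers } };
}

// Hypothetical usage (needs network access): each search result item should then
// carry its topics directly, saving one extra API call per repository.
async function searchReposWithTopics(query, token) {
  const { url, options } = buildTopicSearchRequest(query, token);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`GitHub API returned ${response.status}`);
  }
  const body = await response.json();
  return body.items.map((item) => ({ name: item.full_name, topics: item.topics || [] }));
}

const request = buildTopicSearchRequest("topic:inner-source", undefined);
console.log(request.url);
// https://api.github.com/search/repositories?q=topic%3Ainner-source
```

Because the behavior depends on a preview media type, keeping it in one named constant makes it easy to change if GitHub alters the feature.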

Using that preview feature would allow us to reduce the number of API calls that the crawlers have to make. However, it might be subject to change, depending on what GitHub eventually decides to do with this preview feature. @zkoppert do you have any info from GitHub about this by any chance? Always good to have an insider around!

Where to go from here

I am assuming that the data in the _InnerSourceMetadata object should have either of these characteristics:

If we follow those semantics, we could argue for putting the topics into the root of the JSON structure instead (see the potential risks with the GitHub API above).

No matter what we do here, I would recommend that

  1. the crawlers implement what the portal specification says
  2. both crawlers do roughly the same thing :) (a slightly different way of saying the same thing as (1), I guess 🤣)
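One way to keep both crawlers aligned with the specification is a small conformance check over their repos.json output. This is a hedged sketch that assumes the specification places topics under _InnerSourceMetadata.topics as an array; checkTopicsLocation and the sample output are hypothetical.

```javascript
// Hypothetical conformance check: report repos whose topics are not stored where
// the portal documentation expects them (_InnerSourceMetadata.topics).
function checkTopicsLocation(repos) {
  const problems = [];
  for (const repo of repos) {
    const name = repo.full_name || repo.name || "<unnamed>";
    const metadata = repo._InnerSourceMetadata;
    const hasDocumentedTopics = Boolean(metadata) && "topics" in metadata;
    if ("topics" in repo && !hasDocumentedTopics) {
      // Topics exist only at the root, which the portal does not read.
      problems.push(`${name}: topics found at the root instead of _InnerSourceMetadata`);
    } else if (hasDocumentedTopics && !Array.isArray(metadata.topics)) {
      problems.push(`${name}: _InnerSourceMetadata.topics is not an array`);
    }
  }
  return problems;
}

// Illustrative output from two different crawlers.
const output = [
  { full_name: "org/one", _InnerSourceMetadata: { topics: ["describe"] } },
  { full_name: "org/two", topics: ["describe"] },
];
console.log(checkTopicsLocation(output).join("\n"));
// org/two: topics found at the root instead of _InnerSourceMetadata
```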

Looking forward to hearing your thoughts on this.

And btw this conversation here is certainly motivation to finish the documentation PR that I started in https://github.com/SAP/project-portal-for-innersource/pull/18, so that others will hopefully struggle less with these things.

JustinGOSSES commented 3 years ago

Thanks for sharing all this context, I definitely understand a bit more now!

Michadelic commented 3 years ago

Great insights @spier, I did not know about this preview feature. In general, I tried to keep the root structure as the plain GitHub response and put everything added manually or semi-automatically below _InnerSourceMetadata, e.g. also the participation stats that are loaded with a separate GitHub API call. It might make sense to have a separate property where additional content that the crawler loads from GitHub or other systems is stored.

That way we could separate manually specified metadata from automatically loaded content more easily. On the other hand, with everything below _InnerSourceMetadata, anyone can easily override any key or data in their local innersource.json. What do you think would be the best way forward? We can then update the portal specification accordingly.
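The override behavior mentioned above can be sketched as a merge in which keys from a repository's innersource.json win over automatically fetched values. This assumes a shallow merge; mergeMetadata and the sample values are hypothetical, and the crawlers may combine the data differently.

```javascript
// Hypothetical merge step: combine data the crawler fetched automatically with the
// manually maintained innersource.json of a repository. Keys from innersource.json
// take precedence, so maintainers can override any automatically loaded value.
function mergeMetadata(automatic, manual) {
  return { ...automatic, ...manual };
}

// Illustrative values only.
const fetchedByCrawler = {
  topics: ["legacy-name", "portal"],
  participation: [2, 5, 1],
};
const localInnerSourceJson = {
  topics: ["describe", "portal"],
  title: "Example Portal",
};

const merged = mergeMetadata(fetchedByCrawler, localInnerSourceJson);
console.log(merged.topics);        // [ 'describe', 'portal' ]
console.log(merged.participation); // [ 2, 5, 1 ]
```

A shallow merge replaces nested values wholesale; a deep merge would let innersource.json override single nested fields instead. If automatically loaded content moved to a separate property, the same precedence rule could still be applied when building the portal's view.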

PS: I also have the documentation topic on my list - maybe we can do it together. Let's chat in slack and get going :-)