ecosyste-ms / summary

An open API service for producing an overview of a list of open source projects.
https://summary.ecosyste.ms/
GNU Affero General Public License v3.0

Parse/expose publiccode.yml and codemeta.json data #306

Open bzg opened 7 months ago

bzg commented 7 months ago

codemeta.json and publiccode.yml are used to describe software projects.

Some public sector projects use publiccode.yml, such as demarches-simplifiees. codemeta.json seems to be used more for research projects.

Some of the data in these files is declarative. Parsing and exposing it would be a real plus when it comes to knowing the projects behind the repositories, e.g. in terms of maintenance.
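To illustrate the kind of declarative data meant here, a minimal Ruby sketch that reads the `maintenance` block of a publiccode.yml. The sample document is invented for the example, and `maintenance_summary` is a hypothetical helper, not part of any ecosyste.ms service. The `maintenance.type` and `maintenance.contacts` keys are taken from the publiccode.yml standard as I understand it.

```ruby
require "yaml"

# Invented sample; a real publiccode.yml carries many more keys.
SAMPLE_PUBLICCODE = <<~DOC
  publiccodeYmlVersion: "0.4"
  name: Example Project
  maintenance:
    type: community
    contacts:
      - name: Jane Doe
        email: jane@example.org
DOC

# Hypothetical helper: pull out the maintenance-related fields, if any.
def maintenance_summary(yaml_text)
  data = YAML.safe_load(yaml_text)
  data = {} unless data.is_a?(Hash)          # empty or non-mapping documents
  maintenance = data["maintenance"]
  maintenance = {} unless maintenance.is_a?(Hash)
  {
    type: maintenance["type"],
    contacts: Array(maintenance["contacts"]).map { |c| c["name"] }.compact
  }
end

p maintenance_summary(SAMPLE_PUBLICCODE)
```

A codemeta.json file could probably be summarised the same way with `JSON.parse`, reading whichever maintenance-related properties it declares.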

Is it something that would be relevant for ecosyste.ms?

Cc'ing @hjonin as we've discussed this in the past.

andrew commented 2 months ago

Yeah, I think it makes sense for ecosyste.ms. There are two parts: detecting the files, which has already been added to the repos service (https://github.com/ecosyste-ms/repos/blob/main/app/models/repository.rb#L335-L336).

And then the loading and parsing of those files, which could be quite easily added to the summary service?
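A rough sketch of what the loading-and-parsing step could look like in Ruby, assuming the file contents have already been fetched as strings. `parse_metadata_file` is a hypothetical name for illustration; this is not the summary service's actual code.

```ruby
require "json"
require "yaml"

# Hypothetical helper: parse a metadata file according to its name.
# Returns a Hash, or nil when the file is unknown or malformed.
def parse_metadata_file(filename, content)
  parsed =
    case File.basename(filename)
    when "codemeta.json"
      JSON.parse(content)
    when "publiccode.yml"
      # safe_load refuses arbitrary Ruby objects, which matters for untrusted repos.
      YAML.safe_load(content)
    end
  parsed.is_a?(Hash) ? parsed : nil
rescue JSON::ParserError, Psych::SyntaxError, Psych::Exception
  nil
end

p parse_metadata_file("codemeta.json", '{"name": "demo"}')
p parse_metadata_file("publiccode.yml", "name: demo\n")
p parse_metadata_file("README.md", "# hello")
```

Returning `nil` on malformed input keeps one broken file from failing the whole summary; whether that is the right trade-off depends on how errors should be surfaced.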

bzg commented 2 months ago

Yes, adding that information to the summary service would indeed be useful. @simkim WDYT?

(Also, perhaps metadata collected from publiccode.yml or codemeta.json could help refine the score?)

simkim commented 2 months ago

Yes, I could add a block of information. For the score, I have no idea how to influence it. I think that's more a job for @andrew, who knows what the intent is.

andrew commented 2 months ago

I've added support for fetching and parsing both publiccode.yml and codemeta.json to the summary app if they are detected in the repository.

The parsed data is displayed in both the HTML and the API (note to self: update the OpenAPI docs).

Examples:

codemeta:

publiccode:

bzg commented 2 months ago

Fantastic! Thank you very much for implementing this.

My understanding is that this is implemented in the summary service. Is it?

I believe it would make sense to implement this in the repos service, as this metadata is closely tied to the repo data. WDYT?

andrew commented 2 months ago

@bzg I've put it in the summary service because my deployment of the repos service can't handle storing the contents of files in the database at its current scale. It would fall over!

The repos service currently records the presence of those files and can produce a download url for each one with existing code, but we'd need it to be an optional feature that you would turn on for your instance.

bzg commented 2 months ago

> I've put it in the summary service as my deployment of the repos service can't handle storing the contents of files in the database at its current scale, it would fall over!

Fair enough. Another idea: make the parsing of publiccode.yml and codemeta.json optional in the repos service, turn this option off for your instance (after all, this wasn't a need of yours initially) and let other instances turn it on if they don't have the same database constraints.
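A minimal sketch of what such an opt-in switch could look like, assuming it is driven by an environment variable. `PARSE_METADATA_FILES`, `metadata_parsing_enabled?` and `metadata_to_store` are hypothetical names, not existing repos service settings.

```ruby
# Hypothetical opt-in flag: parsing stays off unless the instance enables it.
TRUTHY_VALUES = %w[1 true yes on].freeze

def metadata_parsing_enabled?(env = ENV)
  TRUTHY_VALUES.include?(env.fetch("PARSE_METADATA_FILES", "").strip.downcase)
end

# Only keep parsed file contents for storage when the instance opted in.
def metadata_to_store(parsed, env = ENV)
  metadata_parsing_enabled?(env) ? parsed : nil
end

# A plain Hash stands in for ENV here to show both settings.
p metadata_to_store({ "name" => "demo" }, { "PARSE_METADATA_FILES" => "true" })
p metadata_to_store({ "name" => "demo" }, {})
```

Defaulting to off would leave a large existing deployment unchanged, while smaller instances could switch the feature on.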

WDYT?

> The repos service currently records the presence of those files and can produce a download url [...]

Thanks. Since repos exposes the default branch, it's straightforward to compute the download url for raw files in the repo, so the added value is not much IMO.
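For instance, a sketch of that computation for a GitHub-hosted repository, using the `raw.githubusercontent.com` URL layout. `github_raw_url` is a hypothetical helper and the repository name is a placeholder; other forges use different layouts, and branch names with special characters would need URL escaping.

```ruby
# Hypothetical helper: build the raw-file URL for a repository hosted on GitHub.
# full_name is "owner/repo"; leading and trailing slashes are trimmed from each part.
def github_raw_url(full_name, default_branch, path)
  segments = [full_name, default_branch, path].map do |segment|
    segment.to_s.gsub(%r{\A/+|/+\z}, "")
  end
  "https://raw.githubusercontent.com/#{segments.join('/')}"
end

puts github_raw_url("example-org/example-repo", "main", "publiccode.yml")
puts github_raw_url("example-org/example-repo", "main", "codemeta.json")
```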

simkim commented 2 months ago

This could also be a related model storing the fields for repositories that have the files. As these files are not widely used, it would not generate a lot of additional data.

bzg commented 3 weeks ago

My understanding is that we can now close this issue.

@simkim can you confirm that publiccode.yml and codemeta.json data will be available in https://summary.data.code.gouv.fr once the configuration and deployment are done?