cernopendata / opendata.cern.ch

Source code for the CERN Open Data portal
http://opendata.cern.ch/
GNU General Public License v2.0

Depositing content to COD #2423

Open ArtemisLav opened 6 years ago

ArtemisLav commented 6 years ago

It would be very useful if users could provide metadata about new resources they want to publish in a more automated way. Currently this is done by processing GitHub issues and discussing with the content providers via comments. This method works, but it takes much more time than it needs to. This could be accomplished in a few ways:

Notes: This would be similar to the Zenodo upload form. It would require a lot of work.

Notes: This would be easier to implement, since the CAP forms are already in place. A concern about this method is that some of the content we have on COD wouldn't pass through CAP (e.g. collections such as Documentation), so it would still be necessary to process said content in a different way.

katilp commented 6 years ago

I agree with the CAP approach 👍

What @ArtemisLav says about Documentation records is true: the CAP approach is mainly valid for Software records.

But it could also be used, if needed, for cases where there is a Guide with connected Software examples (thinking of the existing Trigger guide and examples by @caredg, and those that are coming, i.e. MC generation and luminosity calculation). This would just need a place in CAP for an extended description (taken from a README on GitHub), which is probably a useful thing to have in CAP in any case.

katilp commented 6 years ago

@caredg @laramaktub, is this something to think about now, while restructuring the code repos on the CMS side? I agree with @caredg that we should not overdo it, but it would be really nice to see whether a piece of code ready to be released would be easy to input to CAP, and whether a CODP record could be automatically created from that.

caredg commented 5 years ago

I think the idea of using CAP as THE ingestion layer for COD records could work, provided that CAP is rethought not only as an Analysis Preservation repository but also as a repository for the preservation of "Tools Examples" (and maybe for Documentation as well, see below). In the Create drop-down menu, for instance, one could add "CMS Tools", with forms similar to and compatible with those for the Analyses. The difference is that a Tool is not really a full analysis but a method that may be used for many analyses.

If this crystallizes, we could then add the step of "submitting to CAP" to the "how to contribute" procedure (which we are finalizing) for the development of any legacy analysis or tool.

In any case, whether all this goes through CAP or not, I think it would be good to instruct the user (contributor to Legacy/Open Data) to add some sort of file that stores the metadata to their GitHub repo. Since we are planning, more or less, to take inspiration from the guidelines here, one could think of requiring something like the paper.md file described here for each CMS legacy/open-data repository on GitHub; something that . In fact, if the addition of a CAP layer is rejected for any reason, this would be a good (maybe simpler) proposal which we could follow.
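To make the metadata-file idea a bit more concrete, here is a minimal sketch of how a paper.md-style file with a front-matter header could be turned into a draft record. Everything specific in it is an assumption for illustration only: the field names (`title`, `authors`, `type`), the `---` delimiters, and the helper names `parse_front_matter` and `draft_record` are hypothetical and do not reflect the actual COD record schema or any CAP form.

```python
import json


def parse_front_matter(text):
    """Split a Markdown document into (metadata dict, body text).

    Assumes a simple 'key: value' header enclosed by '---' lines.
    This is a hypothetical layout, not an agreed format.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Everything after the closing delimiter is the description.
            body = "\n".join(lines[i + 1:]).strip()
            return meta, body
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    # No closing delimiter found: treat the file as having no header.
    return {}, text


def draft_record(text):
    """Build a draft record dict; the keys are illustrative only."""
    meta, body = parse_front_matter(text)
    authors = [a.strip() for a in meta.get("authors", "").split(",") if a.strip()]
    return {
        "title": meta.get("title", ""),
        "type": meta.get("type", "Software"),
        "authors": authors,
        "description": body,
    }


EXAMPLE = """---
title: Example trigger tool
authors: A. Author, B. Author
type: Software
---
Extended description taken from the repository README.
"""

if __name__ == "__main__":
    print(json.dumps(draft_record(EXAMPLE), indent=2))
```

The point of the sketch is only that a small, structured header in the contributor's repository would be enough for a script to pre-fill a record, whether or not CAP sits in between.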

As for pure Documentation records, I am less sure. Their COD records are maybe too simple to justify creating a legacy/open-data GitHub repo (after all, they do not have any code) and/or a CAP entry. Undoubtedly, to create a COD record for documentation, the contributor will have to create the .md file by hand, and since the .json file does not seem to have anything produced by a script, it is less clear to me whether it is worth going through the process of submitting it to a CMS legacy/open-data GitHub repository and CAP instead of typing it directly into the COD GitHub repo. However, to maintain the symmetry, the duplication might be worth it.

In summary, I see advantages and disadvantages of using CAP to automate COP record creation:

ADVANTAGES:

DISADVANTAGES:

katilp commented 5 years ago

Thanks @caredg: I would indeed be in favour of trying this for legacy/open-data Analyses and Tools (i.e. anything that can go as a software record on COP), but for pure documentation the source is already native in the opendata GitHub area in md, so I would not complicate that further. And I agree that it would be good to clearly separate the legacy tools/examples records from normal analysis preservation cases (currently the distinction comes from the fact that the former do not have a cadi entry).