Depositing content to COD

ArtemisLav commented 6 years ago

It would be very useful if it was possible for the users to provide metadata about new resources they want to publish in a more automated way. Currently this is being done through processing GitHub issues and discussing with the content providers via comments. This method works, but it takes much more time than it needs to. This could be accomplished in a few ways:

Developing deposit forms for COD where the users will fill out a form, which can then be processed to create a new record in the portal.

Notes: This would be similar to the Zenodo upload form. It would require a lot of work.

Using CAP to get metadata about resources that have passed through there. Since CAP already has fully-developed forms, a lot of the metadata needed for COD records are already submitted there. Then, if a user wishes to publish to COD, they could be presented with an additional page where they would fill in the rest of the metadata we display in COD that are not normally captured in CAP.

Notes: This would be easier to implement, since the CAP forms are already in place. A concern about this method is that some of the content we have on COD wouldn't pass through CAP (e.g. collections such as Documentation), so it would still be necessary to process said content in a different way.
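The "additional page" idea can be illustrated with a minimal sketch: start from whatever CAP already holds, overlay the extra fields the user provides, and report what is still missing for a COD record. All field names below and the `build_cod_record` helper are hypothetical, chosen only for illustration; the actual CAP and COD schemas are not reproduced here.

```python
# Hypothetical list of fields a COD record would need (an assumption,
# not the real COD schema).
REQUIRED_COD_FIELDS = ("title", "abstract", "license", "collections")


def build_cod_record(cap_metadata, extra_fields):
    """Merge CAP metadata with user-supplied extras; extras win on conflict.

    Returns the merged record and the list of required fields still empty.
    """
    record = {**cap_metadata, **extra_fields}
    missing = [name for name in REQUIRED_COD_FIELDS if not record.get(name)]
    return record, missing


# Metadata assumed to have been captured already by the CAP forms.
cap_metadata = {
    "title": "Trigger examples",
    "abstract": "How to access trigger information.",
}

record, missing = build_cod_record(cap_metadata, {"license": "GPL-3.0-only"})
print(missing)  # fields the additional page would still have to ask for
```

In this sketch the additional page would only need to ask for the fields returned in `missing`, rather than repeating what was already entered in CAP.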

katilp commented 6 years ago

Agree for the CAP approach 👍

True what @ArtemisLav says about Documentation records, the CAP approach is mainly valid for Software records.

But it can also be used, if needed, for cases where there's a Guide and connected Software examples (thinking of the existing Trigger guide and examples by @caredg, and those that are coming, i.e. MC generation and luminosity calculation). This would just need a place in CAP for an extended description (taken from a README on GitHub), which is probably a useful thing to have in CAP in any case.

katilp commented 6 years ago

@caredg @laramaktub something to think about now, while restructuring the code repos on the CMS side? I agree with @caredg that we should not overdo it, but it would be really nice to see whether a piece of code that is ready to be released could easily be entered into CAP, and whether a CODP record could be created automatically from that.

caredg commented 5 years ago

I think the idea of using CAP as THE ingestion layer for COD records could work, provided that CAP is rethought not only as an Analysis Preservation repository but also as a repository for the preservation of "Tools Examples" (and maybe for Documentation as well, see below). In the Create drop-down menu, for instance, one could add "CMS Tools", with forms similar to and compatible with those for the Analyses. The difference is that a Tool is not really a full analysis but a method that may be used in many analyses.

If this crystallizes, we could then add the step of "submitting to CAP" for the development of any legacy analysis or tool to the procedure of "how to contribute" (which we are finalizing).

In any case, whether all this goes through CAP or not, I think it would be good to instruct the user (contributor to Legacy/Open Data) to add some sort of file that stores the metadata to their GitHub repo. Since we are planning, more or less, to get inspiration from the guidelines here, one could think of requiring something like the paper.md file described here for each CMS legacy/open-data repository on GitHub. In fact, if the addition of a CAP layer is rejected for any reason, this would be a good (maybe simpler) proposal which we could follow.

As far as pure Documentation records are concerned, I am less sure. Their COD records are maybe too simple to even try to create a legacy/open-data GitHub repo (after all, they do not have any code) and/or create a CAP entry. Undoubtedly, to create a COD record for documentation, the contributor will have to create the .md file by hand, and since the .json file does not seem to have anything produced by a script, it is less clear to me whether it is worth going through the process of submitting it to a CMS legacy/open-data GitHub repository and CAP instead of directly typing it into the COD GitHub repo. However, to maintain the symmetry, the duplication might be worth it.

In summary, I see advantages and disadvantages of using CAP to automate COP record creation:

ADVANTAGES:

Putting everything (CMS legacy/open-data Analyses, Tools and Documentation) in CAP would give extra redundancy to Open Data efforts and the only redundancy for Legacy-only efforts (though the latter won't go to COP).
It would make the procedure of COP record ingestion consistent and hopefully automatic.
Making changes to the instructions/documentation of a certain analysis/tool or doc in CAP through its forms is probably easier for the user than making a bunch of "pull-requests" to change a COP record. Then, if a COP records bot can be run to get automatic updates, this would allow everything to be kept up to date.

DISADVANTAGES:

Inserting a layer of interaction between Git code repositories and COP may result in extra complications later on.
If anything needs to be changed in a contribution, the user most likely will have to make updates in two places (Github and CAP) instead of just one.
Maybe it gets unnecessarily complicated. Maybe just reading from a standardized metadata file, which is prepared by the user directly in her repository, is much simpler and less prone to breakage.
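To make the "standardized metadata file" alternative concrete, here is a minimal sketch of a reader that turns such a file into JSON for a record. It assumes a flat `key: value` header between two `---` lines at the top of the file, loosely in the spirit of a paper.md header; the field names (`title`, `type`, `keywords`) and both helper functions are hypothetical, and a real header with nested values would need a proper YAML parser.

```python
import json


def read_front_matter(text):
    """Parse flat 'key: value' lines between the first two '---' markers."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("metadata file must start with a '---' line")
    metadata = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return metadata  # closing marker reached
        if not line.strip():
            continue  # ignore blank lines inside the header
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        metadata[key.strip()] = value.strip()
    raise ValueError("closing '---' line not found")


def to_record_json(text):
    """Convert the header into JSON, splitting 'keywords' on commas."""
    metadata = read_front_matter(text)
    raw_keywords = metadata.get("keywords", "")
    metadata["keywords"] = [k.strip() for k in raw_keywords.split(",") if k.strip()]
    return json.dumps(metadata, indent=2, sort_keys=True)


# Hypothetical metadata file as it might sit in a contributor's repository.
EXAMPLE = """---
title: Trigger examples
type: Software
keywords: trigger, examples
---
Longer description taken from the README.
"""

print(to_record_json(EXAMPLE))
```

With this approach the contributor edits a single file in the repository, and a script on the portal side could regenerate the record from it, which avoids the two-place update noted above.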

katilp commented 5 years ago

Thanks @caredg: I would indeed be in favour of trying this for legacy/open-data Analyses and Tools (i.e. anything that can go as a software record on COP), but for pure documentation the source is already native in md in the opendata GitHub area, and I would not complicate that further. And I agree that it would be good to clearly separate the legacy tools/examples records from normal analysis preservation cases (currently the distinction comes from the fact that the former do not have a CADI entry).

cernopendata / opendata.cern.ch

Depositing content to COD #2423