codecheckers / discussion

General discussions and questions

Clarifying Zenodo upload / codecheck directory / codecheck repo #20

Open mstimberg opened 3 years ago

mstimberg commented 3 years ago

The CODECHECK bundle is described as:

The CODECHECK bundle includes all files that the codechecker used to conduct the CODECHECK. This may include a copy of the author’s files, and any additional files that the codechecker created to assist them in their codecheck.

I understand that this is intentionally not formally specified, but from the above description I'd expect the CODECHECK bundle to refer to the full repository cloned under the codecheckers org, since the code and figures provided by the authors are certainly part of the "files that the codechecker used to conduct the CODECHECK". On the other hand, the community guide rather seems to imply that the CODECHECK bundle only refers to things in the codecheck directory. Some Zenodo uploads only contain the codecheck.pdf file, some additionally have some contents of the codecheck directory, but I did not see any that actually contained the output directory, let alone all the code/figures provided by the original authors.

I feel that it would make sense to have more consistency in this. Choosing files by hand also makes it possible, in principle, to upload versions that are inconsistent with the ones in the repository. In my opinion, it would be best to either upload only the codecheck.pdf file to Zenodo (similar to the ReScience C approach), or use the GitHub-Zenodo connection and make a release of the full repository that then gets automatically uploaded to Zenodo (not sure whether this works with a reserved DOI, though).

nuest commented 3 years ago

@mstimberg Thanks - yes, consistency is key here.

The reservation of a DOI does not work with the GitHub-Zenodo-integration, unfortunately. So we need the manual record-creation-and-DOI-reservation step. What we could say is that we always zip the whole GitHub project and upload it to Zenodo (which is also partly automated in the R template).
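A minimal sketch of the "zip the whole project" step, assuming a local clone of the repository. This is not the R template's implementation; the function name `zip_project` and the choice to skip the `.git` directory are assumptions made for illustration.

```python
import zipfile
from pathlib import Path


def zip_project(project_dir, zip_path, skip_dirs=(".git",)):
    """Write every file under project_dir into zip_path.

    Directories named in skip_dirs (by default only .git) are left out.
    Both the function name and the skip list are hypothetical choices.
    """
    project_dir = Path(project_dir)
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # Sort so that the order of entries does not depend on the file system.
        for path in sorted(project_dir.rglob("*")):
            relative = path.relative_to(project_dir)
            if any(part in skip_dirs for part in relative.parts):
                continue
            # Do not add the archive to itself if it lives inside project_dir.
            if path.is_file() and path.resolve() != zip_path.resolve():
                archive.write(path, relative.as_posix())
    return zip_path
```

Calling `zip_project("path/to/clone", "bundle.zip")` would then produce one file to upload to the manually created Zenodo record.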

I think what is important for me is that we archive the extra files that the codechecker created, but these likely go into the codecheck directory anyway, and would be included if we archive the whole repository.

I don't really see the point about choosing files by hand and inconsistency, as the fork in the CODECHECK organisation should not be edited anymore after the check is completed.

Maybe we should add a step to "archive" the repository after the check is published so that it remains unchanged?

@sje30 Any thoughts re. what the "bundle" should include?

mstimberg commented 3 years ago

I don't really see the point about choosing files by hand and inconsistency, as the fork in the CODECHECK organisation should not be edited anymore after the check is completed.

Sorry if that wasn't clear. What I meant was that the upload to Zenodo is a manual step (I wasn't aware that the R template can do this via the API), so you might zip and upload files from your local machine that are not identical to the ones in the repository. Maybe making a release on GitHub and then uploading the zipped file from the release page would make this less of an issue?
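One way to check that the file uploaded to Zenodo is the same one downloaded from the release page is to compare checksums. The sketch below assumes both files are available locally; `file_digest` and `same_content` are hypothetical helper names, not part of any CODECHECK tooling.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True if both files have identical bytes (compared via SHA-256)."""
    return file_digest(path_a) == file_digest(path_b)
```

If the archive reports a checksum in a different algorithm, for example MD5, `file_digest(path, algorithm="md5")` gives a value that can be compared against it directly.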

nuest commented 3 years ago

Ah, I see. The release on GitHub is a great idea to "snapshot" and keep the GH and Zenodo copies in sync. I also noted that we should "archive" the GitHub repository at the end of the workflow, if only to keep ourselves from making any changes without good reason.
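The "archive" step could also be scripted rather than clicked through in the repository settings. A minimal sketch, assuming the GitHub REST API's update-repository endpoint and a personal access token with sufficient rights; the function name `build_archive_request` and the repository name in the usage note are hypothetical. The function only builds the request, so nothing is changed until it is sent.

```python
import json
import urllib.request


def build_archive_request(owner, repo, token):
    """Build (but do not send) a PATCH request that marks a repository as archived."""
    body = json.dumps({"archived": True}).encode("utf-8")
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{owner}/{repo}",
        data=body,
        method="PATCH",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Sending it would look like `urllib.request.urlopen(build_archive_request("codecheckers", "example-check", token))`, where `example-check` stands in for the actual check repository.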