Closed coreylawrence closed 5 years ago
Here are a few thoughts on how to achieve this. First, what I mentioned in our meeting yesterday is a feature of Git, not GitHub. Corey, please correct the second line of the text as follows: ... through Git's commit identifier
I assume we are using tags to mark release versions. If I run the command git describe --tags on the master branch, I get for the current version:
0.0.3-160-g568fdd2
where 0.0.3 is the most recent tag, 160 is the number of commits made since that tag, and 568fdd2 (after the g prefix) is the abbreviated hash of the latest commit. I think this is the complete identifier that should be used, because it contains both the latest stable release version number and the latest commit identifier.
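A minimal sketch of how that string could be split into its parts, in case it helps when printing the version somewhere. The function and field names are hypothetical; the only thing taken from Git is the tag-count-g-hash layout of the describe output.

```python
import re
from typing import NamedTuple


class DescribeInfo(NamedTuple):
    tag: str            # most recent tag reachable from the commit
    commits_since: int  # number of commits made after that tag
    short_hash: str     # abbreviated commit hash, without the "g" prefix


# Layout printed by git describe when the commit is not itself tagged.
_PATTERN = re.compile(r"^(?P<tag>.+)-(?P<n>\d+)-g(?P<hash>[0-9a-f]+)$")


def parse_describe(text: str) -> DescribeInfo:
    """Split a git describe string into tag, commit count, and short hash."""
    text = text.strip()
    match = _PATTERN.match(text)
    if match is None:
        # On a tagged commit, git describe prints the bare tag name.
        return DescribeInfo(text, 0, "")
    return DescribeInfo(match["tag"], int(match["n"]), match["hash"])


print(parse_describe("0.0.3-160-g568fdd2"))
```

A bare tag such as 0.0.3 is returned with a commit count of 0 and an empty hash, which would correspond to a stable release.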
If you want to see more detail about the latest commit, you can run the command git show --summary, which right now outputs:
commit 568fdd2ba1931591ebcc006f4168b8d422fc8e29 (HEAD -> master, origin/master, origin/HEAD)
Merge: bfad9dd d184ce1
Author: Jeff B <jbeemmil@gmail.com>
Date: Thu Jan 24 10:28:58 2019 +0100
Merge pull request #157 from AuHau/master
Correcting metadata for the XLSX viewer (#156)
The difficult part is how to print this information on the website and in the files that people download. Corey mentions that we could use the build function for this, but the problem is that the build is probably run before you make the latest commit. If you run the function, update the version, and commit, the printout from the build function would be one commit behind. So I don't think this would be a good solution.
We would probably need a different tool that adds this information after the commit. I honestly don't know how to do it, and it may be difficult. I would simply recommend that users who want to know which specific version they are using run the git commands describe and show, which would do the job.
Thanks Carlos, your response is extremely helpful. Based on your explanation, I agree that the best way forward for advanced users who access the Git repository directly is to give them the information they need to identify their database version with the git commands you mention.
In addition, it seems like it should be easy enough to include a function in the ISRaD-R package that returns the same information. That just leaves the users who download data directly from the web interface. For those folks, maybe there is a way to regularly update the HTML files with the output string from a git describe --tags call?
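One way the HTML update might look, as a rough sketch only. It assumes the download page contains a pair of marker comments around the version text; the marker names and the function name are made up for illustration and do not exist in the current site.

```python
import html
import re

# Hypothetical marker pair; the real page would have to include it once.
_MARKER = re.compile(r"(<!--version-->).*?(<!--/version-->)", re.DOTALL)


def stamp_version(page: str, version: str) -> str:
    """Replace whatever sits between the markers with the version string."""
    safe = html.escape(version)
    # A function is used as the replacement so that backslashes in the
    # version string are never interpreted as regex escapes.
    stamped, count = _MARKER.subn(lambda m: m.group(1) + safe + m.group(2), page)
    if count == 0:
        raise ValueError("version marker not found in page")
    return stamped


page = "<p>Database version: <!--version-->unknown<!--/version--></p>"
print(stamp_version(page, "0.0.3-160-g568fdd2"))
```

Because the markers stay in place, the same page can be re-stamped whenever the describe string changes.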
I agree with Carlos; I think automating this would be difficult, if not impossible. Someone will have to change the website each time, which is not that much work.
Alternative idea: I know that we decided on the commit version as the means of indicating the database version, but another easy solution is to just ask (require) users to record the date they accessed the data. This has been standard for a long time with web-based resources (i.e., when citing a website, you indicate the date accessed). Another example is GBIF.org, which is constantly growing. People just state the date they downloaded the data in their methods section.
@crlsierra can you link to some examples where people have used this approach for versioning datasets? I seem to remember you mentioned that it is common for certain journals. It would be helpful to see how they executed this.
This post explains why tagging the datasets with their most recent commit can't really be done. https://stackoverflow.com/questions/14208272/know-git-hash-before-committing
I agree with Grey and the post he shares. It is impossible to tag the datasets with the current commit. This can only be done one commit behind.
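To illustrate why the identifier cannot be written into the very commit it names: Git derives each object's identifier by hashing its contents, so writing the hash into a tracked file changes the hash. Below is a minimal sketch at the level of a single file (a Git "blob"), assuming SHA-1 object names like the ones in the output above. A commit hash covers the tree of such blobs plus metadata, so the same circularity applies. The file content is made up.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 object name Git assigns to a file with this content."""
    # Git hashes a short header ("blob <size>" plus a NUL byte) followed by the content.
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Made-up data file with a placeholder where the hash should go.
original = b"database version: UNKNOWN\n"
hash_before = git_blob_hash(original)

# Stamping the hash into the file changes the content ...
stamped = original.replace(b"UNKNOWN", hash_before[:7].encode())
hash_after = git_blob_hash(stamped)

# ... so the stamped file no longer has the hash that was written into it.
print(hash_before[:7], hash_after[:7], hash_before == hash_after)
```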
Here's a guide on how to prepare a release and get the DOI from Zenodo. An example of how to cite and link this information is this paper. Check the code availability section.
See https://international-soil-radiocarbon-database.github.io/ISRaD/database/ for information about versioning.
In response to discussions regarding versioning, particularly as it pertains to our description of the dataset in the in-prep manuscript, we have agreed to use at least two counting systems. First, official "stable" releases will be indexed with whole numbers and will correspond to the splitting of a new stable branch in the repository AND the issuing of a new dataset DOI. Second, interim dataset changes will be indexed using the GitHub repository commit identifier.
Here is how I have tentatively described this in the text of the manuscript:
To make this work as stated, we need to provide the version and commit identifier in several locations, including (1) a printout on the web interface page where files are downloaded, (2) within the files that are downloaded, and (3) within the data object in the Git repository.
My thought is that we mostly need to track the commit identifier associated with new builds of the database. So in other words, we would need to add code to the build function that reads the current commit identifier and stamps the appropriate files.
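As a starting point, here is a rough sketch of what "read the current commit identifier and stamp a file" could look like. It is written in Python purely for illustration and is not the actual build function; the function names and the VERSION.txt file name are hypothetical. It assumes git is installed and the build runs inside the repository. The identifier it reads is the commit that is checked out when the build runs.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence


def _run_git(args: Sequence[str]) -> str:
    """Run a git subcommand in the current directory and return its output."""
    return subprocess.check_output(["git", *args], text=True)


def current_version(run: Callable[[Sequence[str]], str] = _run_git) -> str:
    """Return the describe string for the checked-out commit."""
    # --always falls back to the abbreviated hash when no tag is reachable.
    return run(["describe", "--tags", "--always"]).strip()


def write_version_file(directory: Path, version: str) -> Path:
    """Write the version string to a (hypothetical) VERSION.txt in directory."""
    target = directory / "VERSION.txt"
    target.write_text(version + "\n", encoding="utf-8")
    return target
```

A build step could then call write_version_file(Path("."), current_version()). The run parameter exists only so the git call can be swapped out, for example when testing on a machine without the repository.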
This is a little beyond my capacity, so hopefully Grey or someone else can spend some time working on this in the near future. Any additional suggestions are also welcome.