data-lessons / librarycarpentry

Materials for Library Carpentry development

Figure out a workflow for getting DOIs for regular LibCarp releases {META} #5

Closed drjwbaker closed 6 years ago

drjwbaker commented 8 years ago

moved here because it makes more sense!

I like the idea of creating (say) an annual release of Library Carpentry, with a DOI, et cetera. And I've used the Zenodo-Github link to do this with projects in the past. Looking at my Zenodo account, I have been making a release for each lesson, so this is possible, though I'd rather do a big bundle, as per Programming Historian https://zenodo.org/record/30935#.V8VN9I78_6h. Does anyone have any thoughts on this? What I guess I'm proposing is that we do a "version x" release of all the Github repos combined (in some way...) on a semi-regular basis. This will need to be managed to ensure: 1) it is done at regular intervals, 2) everyone involved is credited correctly, 3) the metadata is put somewhere public for reference.

cc @jt14den

jt14den commented 8 years ago

This sounds reasonable and I'd be happy to help. I've only used Zenodo on a per repository basis via the GitHub integration. How did you bundle the PH site? We might want to use semantic versioning to give flexibility. Maybe a semiannual release would work?

tracykteal commented 8 years ago

This is a good idea. Software Carpentry has done DOIs for each lesson, and I think some of the motivation there was to be able to give authors credit for the lessons they've contributed to, and potentially more than one reference if they've contributed to multiple lessons. It is also easier to get a DOI for an individual repository. https://guides.github.com/activities/citable-code/

If you wanted to bundle them though, you could probably have an aggregating GitHub repo, like we do for the ecology lessons https://github.com/datacarpentry/ecology-workshop/ (although that particular formatting still needs some work).

drjwbaker commented 8 years ago

Thanks both. I'd rather a bundle as it is easier to manage. Credit for contributions, for now at least, feels to me best handled through other mechanisms (indeed, we are working on this at the moment! https://github.com/data-lessons/librarycarpentry/issues/8 )

My idea was to just collect the relevant repos - git pull et cetera - and zip them up, then add metadata on Zenodo. We lose some of the project that way (all the issues), but I'm not sure the Zenodo-GitHub repo versioning captures that anyway (?)
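A rough sketch of that "collect and zip" idea, assuming the lesson repos have already been cloned or pulled into local folders. The function name `bundle_lessons` and the archive layout (one top-level folder per lesson) are illustrative choices, not something agreed in this thread. It skips each repo's `.git` directory, so history is left out, and issues are not part of a clone in the first place.

```python
import zipfile
from pathlib import Path


def bundle_lessons(lesson_dirs, archive_path):
    """Zip several already-cloned lesson directories into one archive.

    Each lesson is stored under its own top-level folder. Anything inside
    a .git directory is skipped, so only working-tree files are bundled.
    """
    archive_path = Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for lesson_dir in map(Path, lesson_dirs):
            for path in sorted(lesson_dir.rglob("*")):
                relative = path.relative_to(lesson_dir)
                if path.is_file() and ".git" not in relative.parts:
                    # Store as <lesson-name>/<path inside the lesson>
                    arcname = (Path(lesson_dir.name) / relative).as_posix()
                    archive.write(path, arcname)
    return archive_path
```

The resulting zip could then be uploaded to Zenodo by hand, with the metadata added in the web form. Keep the archive itself outside the lesson folders so it is not swept into the bundle.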

jt14den commented 7 years ago

@drjwbaker and @weaverbel: @gvwilson popped into our Zoom on the afternoon of the second day (PDT) and we had a short chat about how well the sprint was going. We also talked about the value of making releases and getting DOIs after a sprint to capture contributions and give credit. I noticed that there's an active repo in SWC for semi-automating releases to Zenodo https://github.com/swcarpentry/swc-releases. How about we fork that work over and see if we can get it working for us? I'd be happy to help get this set up.

drjwbaker commented 7 years ago

I'm all for giving credit where credit is due, all for versioning with DOIs, and love the Github/Zenodo functionality. Let's do it!

jt14den commented 7 years ago

@drjwbaker I'm working on this -- want to include Library Carpentry in the swc-releases workflow. I pinged the repo to see how we can incorporate LC into that workflow.

drjwbaker commented 7 years ago

@jt14den Thanks Tim. Do report back as and when.

jt14den commented 7 years ago

@drjwbaker and @weaverbel: I've altered swc-releases for library carpentry and tested the workflow in my own fork and Zenodo account. The scripts work and do things in two stages:

  1. Creates a deposit per chosen lesson set, with authors, metadata from the repo, acquires a DOI and uploads a zip in Zenodo (these are currently in a draft state in my account, but all elements are there -- I will delete these in favor of recreating these in a librarycarpentry account in Zenodo). Releases are dated: so 2017.06 will be our first release.
  2. Creates branches, builds lessons and makes submodules of each lesson based on the deposited version - You can see what they look like here: http://www.tim-dennis.com/swc-releases/2017.06/ (notice the version and DOIs top-right at the lesson level -- don't worry they aren't registered, just pre-reserved). This approach lets us reference past lessons like SWC: https://software-carpentry.org/lessons/previous/

Next Steps (following the pattern established by SWC)

  1. I suggest we fork the swc-releases into https://github.com/librarycarpentry -- I'll need to get added as a member on that organization to do that.
  2. We need to create an account in Zenodo with a username of: librarycarpentry. I will need an application API key created for the script to function. I'm happy to set this up and share around the logins. We also can link the account to a Library Carpentry community (I'm currently sitting on that namespace), but can recreate under librarycarpentry.
  3. I need to know what lessons are ready for publishing! I tested on library-data-intro, library-openrefine, library-git, and library-shell.
  4. We'll need to rework the AUTHORS file to reflect the contributors to the lesson and what the script expects (firstname lastname). For my testing, I've done this in my account https://github.com/jt14den?tab=repositories - You can see what it looks like here: https://github.com/jt14den/library-data-intro/blob/gh-pages/AUTHORS (got this information from git shortlog -ns -- notice it also includes authors that worked on the template -- this is the same way it works on SWC). I can submit PRs for this.
  5. Once I run the script, it is suggested we actually publish them in Zenodo manually (all lessons will be in draft mode). This will give us the opportunity to make sure they look right, etc. We can also work iteratively -- correcting elements sourced from the script.
  6. After some period (6 mos.) we repeat! One good aspect of using the script is that @twitwi has been working on it and we can benefit from his work (I wonder how Zenodo supporting DOI versioning will change any of the publish assumptions).

What do you think? The published version will look something like: https://zenodo.org/record/278222#.WUhlJBMrJTY
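For step 4, a minimal sketch of turning `git shortlog -ns` output into AUTHORS-style lines (one "firstname lastname" per line). This is not the swc-releases code. The function names are hypothetical, and the `skip` argument is an assumed way to drop names that should not be listed (for example template-only contributors).

```python
import subprocess


def parse_shortlog(text):
    """Turn `git shortlog -ns` output into a list of (name, commits) pairs."""
    authors = []
    for line in text.splitlines():
        # Each line looks like "   42<TAB>Firstname Lastname"
        count, _, name = line.strip().partition("\t")
        if count.isdigit() and name.strip():
            authors.append((name.strip(), int(count)))
    return authors


def authors_file_lines(authors, skip=()):
    """Return one name per author, omitting any name listed in `skip`."""
    return [name for name, _ in authors if name not in skip]


def shortlog_for(repo_path):
    """Run `git shortlog -ns HEAD` in repo_path (requires git on PATH)."""
    result = subprocess.run(
        ["git", "-C", str(repo_path), "shortlog", "-ns", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return parse_shortlog(result.stdout)
```

Something like `"\n".join(authors_file_lines(shortlog_for("library-git")))` would give a first draft of an AUTHORS file for maintainers to review. The explicit `HEAD` is there because `git shortlog` reads from standard input when no revision is given and it is not attached to a terminal.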

gvwilson commented 7 years ago

:+1: - this is great.

drjwbaker commented 7 years ago

Marvelous work @jt14den. Thank you so much. I have added you as an owner. A few things:

  1. I think you mean https://github.com/data-lessons as that is where the lessons are (though I've added you to https://github.com/librarycarpentry so you can do website work if you want to :) )
  2. Okay.
  3. Core and Beta? http://librarycarpentry.github.io/
  4. I will ask all the lesson maintainers to check the automated output (if people didn't make edits it won't get captured by this, right?). This could be a bottleneck.
  5. Okay.
  6. Okay.

twitwi commented 7 years ago

About filling in the authors, there is a bash script in the swc-releases repository (authors.sh) that tries to help manage AUTHORS across many repositories. It enriches/relies on an obfuscated global mail-map file https://github.com/swcarpentry/swc-releases/blob/gh-pages/all-mailmap

The script is less documented than the rest of the process but it can prove helpful.

Also, I recently made it remove people that contributed only to the style repository https://github.com/swcarpentry/swc-releases/commit/6c7c1a7d62387d81bab59b3b43c8333b75a9e7b1 (this new feature has not been used for a release yet, I used it to experiment with generating a shorter bibtex).

drjwbaker commented 7 years ago

Thanks for your input @twitwi!

jt14den commented 7 years ago

@drjwbaker Thanks! I'm cycling back to this now. A couple of comments related to your numbered responses above:

  1. If we are keeping the website in https://github.com/librarycarpentry, I suggest we publish the lessons there. The releases won't need to be in the same GH organization as the lessons. The submodules will be only references to the branches that are the published version of the lessons in the data-lesson repos.
  2. cool
  3. good by me
  4. I'll check out @twitwi's script. I missed it totally. I'll also check out the version that removes the authors who only contributed to the styles repo. I can make PRs back to the repos if all works as anticipated -- unless maintainers have already updated their AUTHORS file.

Thanks @twitwi for the scripts!

I should be working on this over the weekend! Hope to have it done shortly.

drjwbaker commented 7 years ago

On 1., to be totally clear: doing this won't require us to move the lessons from https://github.com/data-lessons, correct? That would be a BIG job..

jt14den commented 7 years ago

@drjwbaker no need to move lessons. git submodules are a way to have external repositories show up as a sub-directory of another git repo. it's like a symbolic link in unix.
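To make the submodule idea concrete, here is a small sketch that only builds the `git submodule add` commands for a release and does not run them. The branch naming (a branch named after the release in each lesson repo) and the `<release>/<lesson>` folder layout are assumptions for illustration; the actual swc-releases scripts may do this differently.

```python
def submodule_commands(release, lessons, org_url="https://github.com/data-lessons"):
    """Build one `git submodule add` command per lesson for a dated release.

    Assumes (hypothetically) that each lesson repo has a branch named after
    the release, and that submodules are placed under <release>/<lesson>.
    """
    return [
        ["git", "submodule", "add", "-b", release,
         f"{org_url}/{lesson}.git", f"{release}/{lesson}"]
        for lesson in lessons
    ]


for command in submodule_commands("2017.07", ["library-git", "library-shell"]):
    print(" ".join(command))
```

Each command would be run inside the releases repo. The lesson repos themselves stay in data-lessons, and the releases repo only records which commit of each lesson it points to.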

Almost there with making things happen. Reconciling authors was a bit of a bear, but should be easier next release. The next step is to run the script to make branches in lesson repos and submodules. Maintainers will then need to review authors on that branch and give corrections (I'll ping them on the issues you created). Since the releases are dated/named based on yyyy-mm (2017-06), and looking at the calendar, I'm wondering if we should change the release to July (2017-07) and not June?

drjwbaker commented 7 years ago

Okay. Thanks for your hard work @jt14den. Yes, July release sounds better.

drjwbaker commented 7 years ago

Cycling back to this. I know you've been busy @jt14den. Is this in your plan for the Autumn?

tracykteal commented 7 years ago

SWC and DC do releases, and @twitwi has a system that should work for LC lessons too. Blog post on the last SWC release https://software-carpentry.org/blog/2017/08/release-2017.08.html

Also, before the first release of new lessons Data Carpentry has been doing an Issue Bonanza and Bug BBQ http://www.datacarpentry.org/blog/lesson-release/ so that could be something to consider, although I know LC has already had some lesson hackathons.

A LC lesson release is something we could discuss on the October calls.

jt14den commented 7 years ago

Sorry guys, work intervened on finishing this. I did use @twitwi's scripts and workflow to successfully create the zenodo entries and website on test. The issue was me dropping the ball on coordinating with maintainers to make sure authors were set up right. When I come back from SA I can work on this.

-- Tim Dennis Director, Social Sciences Data Archive UCLA Library https://orcid.org/0000-0001-6632-3812 Schedule a meeting: https://calendly.com/timdennis

drjwbaker commented 7 years ago

No worries, these things take time! Thanks for taking the lead @jt14den.

twitwi commented 7 years ago

If at some point you need help with how the authors script works, let me know.