OpenSourceMalaria / OSM_To_Do_List

Action Items in the Open Source Malaria Consortium
Using Github to Work on Papers, Files #536

Open mattodd opened 7 years ago

mattodd commented 7 years ago

We’re using Github to discuss things, share news, ask questions.

Github can do a lot more than this. We’ve just started using the wiki for Series 4 (#535). We can use Github to handle collaborative working on documents/papers (#532 #507 ).

If you don’t want to install any software in order to create documents or modify existing documents, you should be able to download files and upload new versions directly.

However, if you’re working on lots of files, you’ll need the Desktop client. You’ll also need it if you want to make use of Github’s very nice versioning functionality, i.e. working on a file where all the previous versions are saved, without the need to make up new filenames each time.

We tested this out today. I’ve made a how-to guide here, in a new Tech Ops repository that we can use to house guides on how to do common tasks in OSM.

I’d be delighted to receive any comments on how to make this clearer. Using Pull Requests etc might seem like a faff at first, but the very nice thing about working on documents in this way, besides the versioning, is that each modification has to be checked before it is integrated into the master version of something. This is a very good way to ensure quality control in the checking of data, for example. We plan to use it for the assembly of the Series 4 Paper 1 experimental section, about which more soon from @david1597 .

A snag: if you’re receiving this you are probably a core contributor to OSM, and receive email notifications of things like Issues and Comments. It’s likely that your default notification settings also include alerts to processes that are common in working on files (Pull Requests, Merges etc). To avoid overload, you might want to pare back your alerts a little.

Any issues, please comment below.

mattodd commented 7 years ago

Would value comments by @greglandrum and @miike on the guide if not too busy.

greglandrum commented 7 years ago

@mattodd : I think it's pretty clear and looks good, but it is written with (I believe) the implicit assumption that whoever is following the instructions will have commit access to the main repo. As long as you're ok with allowing this in the project everything is fine.

Otherwise the complication of having each user work with their own github fork needs to be added.

mattodd commented 7 years ago

Correct @greglandrum . I'd not considered forks. I need to understand the distinction between a branch and a fork. In simple terms, is the latter a major version of the former that may never be merged back into the Master?

greglandrum commented 7 years ago

My understanding, possibly wrong or oversimplified: A pull request (PR) into a branch in a github repo (like the master branch on the OSMSeries4Paper1 repo) needs to come from a branch in a github repo. The PR can come from either the same repo (from a different branch), or it can come from a completely different repo.

A standard way of doing this that requires no administrative overhead is:

1) Someone who wants to contribute makes a copy of the central repo within github (this is called "forking" in github's interface).
2) They make the changes they want to make in a branch on their fork (maybe the master branch, maybe something else).
3) They create a PR from their fork onto the master branch of the central repo.
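For anyone working from the command line rather than the Desktop client, the local part of the three steps above might look roughly like the sketch below. It only prints the git commands in order and does not run them. The username, branch name and commit message are placeholders I made up for illustration. Forking itself (step 1) and opening the PR (step 3) still happen in github's web interface.

```python
def fork_workflow_commands(user, repo, branch, message):
    """Return, in order, the git commands for contributing via a fork.

    Assumes the fork already exists at github.com/<user>/<repo>
    (created with the "Fork" button on the central repo's page).
    """
    fork_url = f"https://github.com/{user}/{repo}.git"
    return [
        f"git clone {fork_url}",       # copy your fork to your machine
        f"cd {repo}",
        f"git checkout -b {branch}",   # step 2: work in a branch on the fork
        "git add .",                   # stage the files you changed
        f'git commit -m "{message}"',
        f"git push origin {branch}",   # publish the branch to your fork
        # step 3: open the PR from this branch in the web interface
    ]


if __name__ == "__main__":
    # "your-username" and "add-experimental" are hypothetical placeholders.
    for command in fork_workflow_commands(
        "your-username", "OSMSeries4Paper1",
        "add-experimental", "Add experimental section",
    ):
        print(command)
```

Contributors with write access to the central repo can skip the fork and push a branch there directly; the rest of the sequence is the same.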

There are "subtleties" with keeping forks up to date and having people work locally that I unfortunately don't have time to cover right now (I'm at a multi-day workshop).

miike commented 7 years ago

To add to what @greglandrum has mentioned above.

Forks (exact copies of a repository at a certain point in time) are great for users who don't have write access to the main repository (i.e., they can't merge changes into master). If everyone contributing already has write access to the main repository you probably don't need to worry about forks.

Bitbucket has a quite detailed explanation of this here (the fork, branch, pull request model). These are all concepts related to git so you can substitute any references to 'Bitbucket' with 'Github'.

mcoster commented 7 years ago

@mattodd - will all contributors have write access? That seems like the biggest issue to me, regarding the instructions in your how-to guide (which is excellent, btw!). If a contributor doesn't have write access, github will force them to make a fork.

When they want to commit to the master branch of the central repo, they create a pull request, but that will have to be merged by someone who does have write access.

Caveat - still learning basics of git, so I might have some things wrong...

mattodd commented 7 years ago

Hi @mcoster - my bad, you were for some reason not a member of the "core contributing" team in OSM. Everyone therein has write access. Fixed, so you do too, now. Let me know of any other issues like this though. We might be getting to the point where we need more admins and owners of OSM. But with great power comes...

mcoster commented 7 years ago

Thanks @mattodd I wondered why I wasn't able to add labels or assign myself issues!

mcoster commented 7 years ago

Just been poking around in the Series4Paper1 area - regarding your earlier comments @mattodd about checking modifications before they are merged into the master. Since many of these files are e.g. .docx, how will this process work, since there won't be git diffs on binaries?

I've seen there are workarounds using pandoc to create .md files to push alongside the .docx, but this approach seems a bit technical. There's also Simul - a swish looking approach, but limitations on free use. I don't suppose we would all want to learn LaTeX? This maths textbook written by >60 contributors is impressive!
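As a rough illustration of what a text-level comparison of two .docx versions could look like, here is a minimal sketch. It is not the pandoc workaround: it uses only Python's standard library and relies on a .docx being a zip archive whose main text normally lives in `word/document.xml`. It ignores formatting, tables, images and tracked changes, so treat it as a quick check for a reviewer rather than a replacement for proper diffs. The helper `make_minimal_docx` is a made-up stand-in that exists only so the example runs without real files.

```python
import difflib
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(source):
    """Return the plain text of each paragraph in a .docx.

    `source` is a file path or a binary file-like object.
    Assumes the main text is stored in word/document.xml.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Join the text runs (<w:t>) inside this paragraph.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        paragraphs.append(text)
    return paragraphs


def docx_diff(old, new):
    """Return a unified diff, as a list of lines, of two .docx files' text."""
    return list(difflib.unified_diff(
        docx_paragraphs(old), docx_paragraphs(new),
        fromfile="old.docx", tofile="new.docx", lineterm="",
    ))


def make_minimal_docx(paragraphs):
    """Hypothetical helper: build a bare-bones in-memory .docx for the demo."""
    body = "".join(f"<w:p><w:r><w:t>{p}</w:t></w:r></w:p>" for p in paragraphs)
    xml = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/'
        'wordprocessingml/2006/main">'
        f"<w:body>{body}</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    old = make_minimal_docx(["Compound 1 was prepared.", "Yield 45%"])
    new = make_minimal_docx(["Compound 1 was prepared.", "Yield 54%"])
    for line in docx_diff(old, new):
        print(line)
```

With real files, `docx_diff("old_version.docx", "new_version.docx")` would be called with two paths (both filenames here are placeholders).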

Markdown for everything would be great - pity journals don't accept .md submissions.

mattodd commented 7 years ago

Right @mcoster yes, there's a fine line between a perfectly efficient work approach and how much we can expect of contributors. So. The paper itself will be assembled in the Google Doc (everyone's happy with this platform), with an imperfect system of approvals (someone can write and then accept changes without anyone being alerted, I think). For everything else - all the pictures and SI - these can be uploaded, modified, collated on Github, treating it like a fancy Dropbox. The experimental will be a sequence of little docx files. If someone submits these with pull requests, someone can "accept" them after checking, in which case that version is the currently-accepted version. At the end of this we'll need to zip all the .docx's back into a single file. Lo-tech. Am I answering your question?
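If "zip" is read literally, collecting the accepted .docx files into one archive could be sketched as below. This is an assumption about what the final step means: merging the pieces into a single Word document would need a different tool. The folder and archive names in the usage note are placeholders.

```python
import zipfile
from pathlib import Path


def bundle_docx(folder, archive_path):
    """Write every .docx directly inside `folder` into one zip archive.

    Files are added in sorted name order, so numbered filenames keep
    their sequence. Returns the list of names that were added.
    """
    files = sorted(Path(folder).glob("*.docx"))
    # Default (stored, uncompressed) mode: .docx files are already zipped.
    with zipfile.ZipFile(archive_path, "w") as archive:
        for path in files:
            archive.write(path, arcname=path.name)
    return [path.name for path in files]
```

For example, `bundle_docx("experimental", "experimental_all.zip")` would gather the files from a local checkout of the repository.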

Most of the effort behind the first OSM paper was in the correct assembly of the SI - the linking of statements to raw data and experiments. I'm expecting the same in this (much larger) paper.