OpenSourceMalaria / OSM_To_Do_List

Action Items in the Open Source Malaria Consortium

Time for Greater Migration of OSM to Github? #507

Open mattodd opened 7 years ago

mattodd commented 7 years ago

The time has come to consider a more significant migration of OSM’s core discussion/collaboration/writing activities to Github.

Github’s a powerful platform and many of us are already here. The current wiki isn’t really working (in the sense that people are not editing it), meaning our “story so far” is always badly out of date. We’re writing up our work in papers too slowly, and we still don’t have a good place to track suggested molecules on their path to being synthesized.

What I mean by suggesting this is that we need to use Github for discussion around things that need doing, as currently, but also as a place to maintain project status, draft papers, share files and other things. Github’s a super-powerful platform that is well supported and that we’re not really using properly.

For now I don’t imagine altering the Master List, which seems to be doing its job, or the various lab notebook platforms (Labtrove, Labarchives (e.g.), Luc’s system (#499)) – we can talk about ELNs some other time – an important discussion, but separate. Ditto for how we ensure data flow to other places, e.g. Pubchem.

Success here would be that, after migration: i) more people are involved in maintenance of project status, ii) more people are able to make independent contributions to paper writing, iii) we have a general solution to sharing files relevant to the project, iv) we have a more effective means to handle molecule suggestions and the progress towards the synthesis of those molecules.

In a more general sense, fewer things would go directly through me and it would be clearer/easier for people to manage their own project ideas. In short, and taking the broader view, I want it to be easier for someone else to start and run OSM Series 5 or 6 etc.

An outline structure is below. If we decide to move content over, we will first need help working out how it should be organized: how the hierarchy of the various repositories and component/daughter repositories would work. i.e. the first thing is to make sure that any proposed structure makes sense. When we start moving we’ll need help again in making sure we’re doing things correctly. This would be a learning curve for everyone, I think, but if you're reading this as a Github expert, please share with everyone how hard you think a migration like this would be. I suspect Power Users like @miike @greglandrum @lpatiny and others will encourage us.

We’d aim then to retire use of the OpenWetWare wiki platform.

The kind of structure I have in mind is this:

OSM Main

containing i) About OSM (on front page), ii) wiki describing how OSM works (the rules, and most general ideas), iii) the most important links, iv) FAQs (#415), v) links to all daughter repositories. Includes a To Do List (i.e. a new version of the Issue Tracker we already use) for all general OSM-wide discussion items.

A daughter of OSM Main is:

OSM Series.

Repositories for all series, i.e. a daughter of that again is

Series 1

Containing:

Publicity/Newsletters/Resources

Breaking Good

information about Uni/school crowdsourcing projects. Actual Projects could then have daughter repositories, which could be managed independently by enterprising schools/unis. Breaking Good is evolving, and it’s likely that this needs its own, separate structure and path (since it’s not a malaria-specific idea), but if we hosted malaria-relevant projects here, we could migrate those somewhere else later.

Tech Ops/How-To’s

How to do things (e.g. recommending Data Warrior, but also rules for compound numbering, etc), and consultations about how we improve the way that we do things.

Glossary

A possible section that de-jargonises everything in the project, and links in with interest people have shown in an online medchem course (#416). e.g. explanation (or links to e.g. @drc007 ’s articles) of terms like hERG, Volume of Distribution, different strains used etc. Jargon is a major barrier to community participation.

Why do all this?

Advantages of migrating:

  1. We’re all already on Github, and many people who might want to help (e.g. in building new capabilities that interface well with Github) are already here. The platform has had the best uptake of anything we’ve used.
  2. Combines writing and wiki and file sharing functions.
  3. Sharing images and files much easier than on wiki.
  4. Sophisticated paper writing abilities (e.g. this review in which, for example, contributions could be automatically tracked https://github.com/greenelab/deep-review/graphs/contributors) – paper writing is a major roadblock at the moment – see #443
  5. Has “milestone” function associated with Issues that we could use more. Alice pointed out that we could also use issues more strategically by using them as the basis for more automatically-generated newsletters (which is obviously useful but not happening #269).
  6. Already lots of help available for how to use the platform that we can point people to.
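
The newsletter idea in point 5 can be sketched roughly as below. This is only an illustration: the issue records are hypothetical placeholders (not real OSM issues), the `milestone` field is flattened to a plain string for simplicity, and `build_newsletter` is a made-up helper name rather than anything Github provides.

```python
from collections import defaultdict

# Hypothetical issue records; titles and milestone names are placeholders.
# The milestone is simplified to a plain string (an assumption for this sketch).
ISSUES = [
    {"number": 1, "title": "Draft introduction", "state": "open", "milestone": "Series 4 paper"},
    {"number": 2, "title": "Collect assay data", "state": "closed", "milestone": "Series 4 paper"},
    {"number": 3, "title": "Update FAQ", "state": "open", "milestone": None},
]


def build_newsletter(issues):
    """Group issues by milestone and return a plain-text digest."""
    by_milestone = defaultdict(list)
    for issue in issues:
        by_milestone[issue["milestone"] or "No milestone"].append(issue)

    lines = []
    for milestone in sorted(by_milestone):
        group = by_milestone[milestone]
        closed = sum(1 for i in group if i["state"] == "closed")
        lines.append(f"{milestone}: {closed}/{len(group)} closed")
        for issue in sorted(group, key=lambda i: i["number"]):
            mark = "x" if issue["state"] == "closed" else " "
            lines.append(f"  [{mark}] #{issue['number']} {issue['title']}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_newsletter(ISSUES))
```

A real version would need to pull the issue list from Github rather than use a hard-coded list; how best to do that is left open here.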

Disadvantages:

  1. Will involve work to migrate content. (this could be work for which we could seek volunteers, award helper badges etc)
  2. There will be some learning for people – e.g. how to clone repositories and submit pull requests. We'd need to make some targeted how-to's for common tasks such as paper editing.
  3. With more content, we will need to make sure that people can navigate a larger amount of content. e.g. people may submit Issues to the wrong list, or not understand where to go to find something.
  4. We’d break links on the Landing Page. But these could be fixed, and pretty much all the content on the Landing Page is in any case imported from primary content on Github.
  5. We’d get more alerts, and these would need managing. It is easy to follow only those repositories of interest, but I suspect there would be even more clamour for a daily digest function (#162).

Alternatives:

  1. The Open Science Framework. I don’t know it well enough yet. I’m going to try it out on a new open source project that we’ll start in a week or so on mycetoma (exciting, by the way!) but I’m hesitant to commit something as large as OSM to a platform we’ve not yet tried.
  2. We duct-tape over our current road blocks by using Google Docs for paper writing and a Google Drive/Dropbox solution for file sharing. I’ve always said that we go with what works for people. That’s true, but I see advantages in reducing the number of components we use where we can to avoid platform/password fatigue.

Please comment and advise. This would affect us all, but on balance a migration would I think lead to good solutions to current blockages. I've always said that it was a higher priority for OSM to execute projects rather than invest time in the platform. We've now published a major paper. It's time we deal with platform limitations.

cdsouthan commented 7 years ago

These plans seem cogent to me, with the caveat that I don't have much direct experience as a precedent. For the Guide to PHARMACOLOGY team we operate an ad hoc blend of BaseCamp, WordPress, DropBox, Google Docs and a local drive backed up by Edinburgh Uni. Definitely sub-optimal but we get by and obviously cogitate new stuff that seems to get community traction (e.g. Slack, Trello). Note though we do have a problem with a key management team member (who shall remain nameless) because folks in his neck of the Cambridge University IT system appear not to be able to do anything officially outside their firewall, upon pain of some significant sanction (ie no DropBox for them).

However, I am of the opinion that for the overdue Series 4 paper we should definitely bypass what would be a de facto roadblock of grappling with a nascent master system. AWAK this will need a long time for setting up, learning, bedding down and optimising. Thus in the interim I suggest we do this the old-fashioned Google Docs/Drive/DropBox way. Whatever the Journal of choice may be, bless 'em, they will probably want Word/PDF in the end anyway!

holeung commented 7 years ago

Yes, agree completely with @cdsouthan. Can't wait for the perfect solution.

mattodd commented 7 years ago

I agree that waiting until we've migrated would be unnecessary. It'll take a while. But I do think that we learn by doing. Writing up the Series 4 paper on this platform would be an excellent way to test the idea out while at the same time making progress on something we all want to do. I'll suggest something over at #502 .

drc007 commented 7 years ago

As an aside, if anyone thinks of aspects of drug discovery that they think would make a useful addition to the Drug Discovery Resources (http://www.cambridgemedchemconsulting.com/resources/) feel free to let me know.

mattodd commented 7 years ago

Quick question about the above for any Github power users @miike @greglandrum @chuckfitzpatricksf : the above sketch of how things would look involves "daughter" or "nested" repositories. e.g. a Repo for Series (a "Series" = a collection of molecules, e.g. Series 4) and then daughter repos, one for each series of molecules OSM looks at. This organises things well for OSM'ers. BUT is it possible to have this kind of structure? I don't see a way of creating daughter repos within a repo.

It seems like nested repos are a bad idea or one is recommended to use something else. If that's the case, how do we obtain the right kind of hierarchy here for a complex undertaking like OSM, where you need the kind of structure I mention above - a high-level "about/meta", then an area related to all the series, and then a place for each Series. Are we just forced to have one "level" - each project area is a repo, plain and simple? So we'd have repos for Series 1, 2, 3, 4 at the same level of hierarchy as "about" and "publicity" and "tech-ops". Could work. I guess within a repo for "Series 4" we could have a folder that contains everything for "Series 4 paper 1", so the folder structure becomes a kind of hierarchy, to organise everything?
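
To make the "one level of repos, folders inside" idea concrete, here is a rough sketch that builds such a layout in a temporary directory. Every repo, folder and file name in it is a hypothetical illustration rather than an agreed structure, and `create_layout` is just a helper invented for this sketch.

```python
import tempfile
from pathlib import Path

# Hypothetical layout: each key stands for one flat, top-level repository,
# and the paths beneath it are ordinary folders inside that repository.
LAYOUT = {
    "about": ["README.md"],
    "publicity": ["newsletters/README.md"],
    "tech-ops": ["how-to/README.md"],
    "series-4": ["README.md", "paper-1/manuscript.md", "paper-1/figures/.gitkeep"],
}


def create_layout(root, layout):
    """Create one top-level folder per repository, with nested folders inside it."""
    created = []
    for repo, files in layout.items():
        for relative in files:
            path = Path(root) / repo / relative
            path.parent.mkdir(parents=True, exist_ok=True)
            path.touch()  # empty placeholder file
            created.append(path.relative_to(root).as_posix())
    return sorted(created)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for entry in create_layout(root, LAYOUT):
            print(entry)
```

Here "series-4" sits at the same level as "about" and "tech-ops", and the "paper-1" folder supplies the extra level of hierarchy.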

I'm keen to try out a new repo for a series because OSM has a project student on exchange in Sydney who is interested in doing some writing, and could experiment with i) wiki writing and ii) paper writing.

miike commented 7 years ago

The closest thing (mentioned above) that Git has to this is submodules. Submodules are sort of 'nested' but do have the disadvantage of only referring to an exact commit/state at a point in time rather than just pointing to a repository.
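
The pinning behaviour described above can be illustrated with a toy model. This is not Git itself and calls no Git commands: `Repo`, `Parent` and the commit ids are all hypothetical stand-ins, used only to show that the parent keeps pointing at the commit it recorded until someone moves the pin.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Toy stand-in for a repository: just an ordered list of commit ids."""
    name: str
    commits: list = field(default_factory=list)

    def commit(self, commit_id):
        self.commits.append(commit_id)

    @property
    def head(self):
        return self.commits[-1]


@dataclass
class Parent:
    """Toy parent repo that records one pinned commit id per 'submodule'."""
    pins: dict = field(default_factory=dict)

    def add_submodule(self, repo):
        self.pins[repo.name] = repo.head  # pin to the commit current at add time

    def update_submodule(self, repo):
        self.pins[repo.name] = repo.head  # only an explicit step moves the pin


series4 = Repo("series-4")  # hypothetical daughter repo
series4.commit("c1")
main = Parent()
main.add_submodule(series4)

series4.commit("c2")           # new work lands in the daughter repo
stale = main.pins["series-4"]  # the parent still records the older commit

main.update_submodule(series4)
fresh = main.pins["series-4"]  # the pin moves only after the explicit update
```

In practice this means someone has to update the pin in the parent repository whenever a daughter repository moves on, which is extra upkeep to weigh against the tidier hierarchy.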

There's a few other options including: