NCEAS / open-science-codefest

Web site and planning materials for open science conference.
http://nceas.github.io/open-science-codefest

Taming Pathogens #46

Open bkatiemills opened 10 years ago

bkatiemills commented 10 years ago

Only very recently, an "adaptive" immune mechanism was discovered in bacteria, protecting them against the viruses that infect them (usually referred to as phages). It appears that bacteria keep a dynamic library of small pieces of phage genomes (spacers) to detect and neutralize phage attacks. This discovery may give some insight into strategies for combating the pathogens that threaten humans worldwide.

Simultaneously, tremendous databases of phage and bacteria genomes have been made openly available via web forms and APIs from NCBI and phagedb. There is a great need and opportunity to build a simple, automated pipeline to extract the relevant data from these databases and build specialized phage datasets to expedite research on understanding and controlling pathogens.
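As a rough illustration (not project code), one step of such a pipeline might pull a genome record through NCBI's E-utilities `efetch` endpoint. The sketch below builds the request URL and downloads the record as FASTA text; the accession shown is only a placeholder:

```python
import urllib.parse
import urllib.request

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def efetch_url(accession, db="nucleotide"):
    """Build an E-utilities efetch URL requesting a FASTA record."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urllib.parse.urlencode(params)

def fetch_fasta(accession):
    """Download one genome record as FASTA text (network access required)."""
    with urllib.request.urlopen(efetch_url(accession)) as resp:
        return resp.read().decode()

# Example (placeholder accession):
# fasta = fetch_fasta("NC_001416")
```

A real pipeline would loop over a list of accessions and rate-limit its requests, per NCBI's usage guidelines.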

Sidhartha Goyal from the University of Toronto has described a few simple goals for getting started on this project - find them in the issue tracker in our repo.

mbjones commented 10 years ago

@BillMills OK, tagged your project, which looks really interesting. Maybe you could list/link some of the issues you might want to tackle during codefest in this bug so people have an idea what the products might be?

bkatiemills commented 10 years ago

@mbjones sure, see below; more detailed descriptions are in our issue tracker, but here's a quick overview.

sckott commented 10 years ago

Sounds interesting @BillMills - I do love working with data via web APIs, so I'm interested to see if I can be of help

bkatiemills commented 10 years ago

Awesome, @sckott - we've got examples of how the APIs more or less work, and test data too, so we should be able to hit the ground running on this one.

svaksha commented 9 years ago

@BillMills, how can folks who are not physically present at the codefest participate? Do you have a developers mailing list?

bkatiemills commented 9 years ago

Hi @svaksha,

Thanks for getting in touch! The easiest thing for a remote contributor to jump on would be to attack the issues in the tracker: https://github.com/BillMills/phageParser/issues

If you have questions or comments, feel free to open issues there too - I'll have my eye on it for the rest of the conference.

We'll be using our etherpad to host conversation and notes, too: https://etherpad.mozilla.org/OSCF-pathogens

There's nothing on the etherpad yet, but once our session starts we'll all be using that to take notes. No need to wait for us though, jump into that issue tracker and let me know if there's anything you need!

svaksha commented 9 years ago

Hi @BillMills,

Thanks for the reply. I've cloned the repo and am trying to understand the requirements. For bug #1, https://github.com/BillMills/phageParser/issues/1, should the function grep the entire blast-phagesdb.txt file for all the Expect values between 0 and 1 or do you want separate functions for each of the Sequences producing significant alignments?

Also, I had a query about copyright. An MIT license is fine, but the LICENSE file says 'Copyright (c) 2014 Bill Mills' <-- Will each developer have to sign away the copyright to the code they wrote? https://github.com/NCEAS/open-science-codefest/wiki/Pathogens states that you are organizing it on behalf of the Mozilla Science Lab, so why not assign copyright to the MSL, Mozilla Foundation, or some other org? This part isn't clear, so please clarify.

Thanks, -SVAKSHA ॥ http://about.me/svaksha

bkatiemills commented 9 years ago

Hi @svaksha ,

Good catch! I apologize for the confusion about the LICENSE copyright - that line was autogenerated by GitHub along with the license, and I forgot about it. In any case, MSL does not generally claim rights to code contributed to third-party projects, so I'll pass on assigning them that way; for now, I'm assigning the rights to the PI, pending further discussion.

Re: your first question, the former. We just want to scrape the entire file and process some of the information from each match into CSV. There's a stab at this now, as discussed in that issue, but it currently doesn't filter for match quality and needs testing and validation.
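For anyone picking this up, here is a minimal sketch of that kind of scraper. It assumes the report follows the usual plain-text BLAST layout, with alignment headers starting with '>' followed by lines containing 'Expect = ...'; the actual layout of blast-phagesdb.txt may differ:

```python
import csv
import re

# Matches the E-value in lines like "Score = 50 bits, Expect = 1e-05".
EXPECT_RE = re.compile(r"Expect\s*=\s*([0-9.eE+-]+)")

def parse_matches(lines, max_expect=1.0):
    """Yield (sequence_name, expect) pairs with 0 <= expect <= max_expect."""
    name = None
    for line in lines:
        if line.startswith(">"):
            # New alignment header; remember the sequence name.
            name = line[1:].strip()
        elif name:
            m = EXPECT_RE.search(line)
            if m:
                e = float(m.group(1))
                if 0 <= e <= max_expect:
                    yield name, e

def write_csv(matches, path):
    """Write the filtered matches out as a two-column CSV file."""
    with open(path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["sequence", "expect"])
        w.writerows(matches)
```

This scans the whole file in one pass (rather than one function per alignment block) and keeps only matches whose E-value falls in [0, 1], per the filtering discussed above.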