ToniWestbrook / paladin

Protein Alignment and Detection Interface
MIT License

paladin prep? #3

Closed macmanes closed 8 years ago

macmanes commented 9 years ago

Should we make a paladin prep that downloads the reference and indexes it automatically? Most people will use UniProt. It could be paladin prep --reference uniprot.

It could be prep, setup, prepare, download, or something like that.

ToniWestbrook commented 9 years ago

Yeah, I like that idea a lot. It should be pretty easy to do too (I think). I will tackle that tonight/tomorrow.

ToniWestbrook commented 9 years ago

Okay, I created at least the first revision of this functionality, as the "prepare" command. I had added the unreviewed UniProt DB (TrEMBL) in there too, until I actually ran it and realized it was 10 GB, which would mean huge/slow BWT/SA files. Do you think people would ever use the unreviewed version of UniProt for this kind of work? Should I put it back in or keep it out?

Anyway, right now the only option is the Swiss-Prot version. The command downloads it, then runs the index command on it with the appropriate options. I'll commit the latest version to GitHub now with the fixes/additions made tonight.
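The download-then-index flow described above could be sketched roughly as follows. This is an illustration only, not PALADIN's actual implementation: the `prepare` function, its parameters, and the `SWISSPROT_URL` constant are assumptions, since the thread does not say which URL or index options the command uses. The index command is supplied by the caller for that reason.

```python
import subprocess
import urllib.request
from pathlib import Path

# Assumed for illustration: the thread does not state which URL prepare downloads from.
SWISSPROT_URL = (
    "https://ftp.uniprot.org/pub/databases/uniprot/current_release/"
    "knowledgebase/complete/uniprot_sprot.fasta.gz"
)


def prepare(url, index_cmd, workdir=".",
            fetch=urllib.request.urlretrieve, run=subprocess.run):
    """Download a reference FASTA if it is not already present, then index it.

    index_cmd is the indexing command as a list of arguments (hypothetical
    parameter); the path of the downloaded file is appended as the last argument.
    fetch and run are injectable so the flow can be exercised without network
    access or an installed indexer.
    """
    target = Path(workdir) / url.rsplit("/", 1)[-1]
    if not target.exists():
        # Skip the download when the file is already on disk.
        fetch(url, str(target))
    # Raise CalledProcessError if the indexer exits with a non-zero status.
    run([*index_cmd, str(target)], check=True)
    return target
```

Here `index_cmd` would be the PALADIN index command with whatever options the chosen reference needs. Skipping the download when the file already exists keeps repeated runs cheap, which matters for multi-gigabyte references.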

ToniWestbrook commented 9 years ago

Also, ignore the extra warnings during compilation. I will fix those soon.

macmanes commented 9 years ago

Why not make Swiss-Prot + TrEMBL -r1? If we wanted to be fancy, we could make a -r2 that took a URL for a FASTA file.
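One way the -r1/-r2 idea could be parsed is sketched below. It only illustrates the suggestion in this comment and is not the interface PALADIN ended up with; the program name `prepare-sketch`, the positional `url` argument, and the `parse` helper are all hypothetical.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="prepare-sketch")
    # -r1: a built-in UniProt reference; -r2: a FASTA file at a user-supplied URL.
    parser.add_argument("-r", dest="ref_type", type=int, choices=(1, 2),
                        required=True, help="reference type (1 or 2)")
    parser.add_argument("url", nargs="?", help="FASTA URL (only with -r2)")
    return parser


def parse(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Enforce the pairing between the reference type and the URL argument.
    if args.ref_type == 2 and args.url is None:
        parser.error("-r2 requires a URL")
    if args.ref_type == 1 and args.url is not None:
        parser.error("-r1 does not take a URL")
    return args
```

With this layout, `parse(["-r1"])` selects the built-in reference and `parse(["-r2", "http://example.org/proteins.fasta"])` selects a custom one, while a missing URL for -r2 is rejected with a usage error.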

wkthomas commented 9 years ago

I think people will want to use every protein they can get at some point.

ToniWestbrook commented 8 years ago

Update on this issue: after the last couple of months of testing, we've determined that UniProt's Swiss-Prot and UniRef90 are both effective references, depending on the work being performed. We've added UniRef90 as a standard option to the prepare command, and also added the option to prepare a reference after indexing, in case someone indexes manually and skips the prepare command.

Also, because preparing is tailored to each reference (some specific cleanup is needed depending on the reference type, e.g. UniRef90), we've kept it specific to these two reference databases for now. If anyone using PALADIN in the future finds a reference database that works well and would be a helpful addition for other users, please don't hesitate to let us know. Closing this for now.