Closed macmanes closed 8 years ago
Yeah, I like that idea a lot - it should be pretty easy to do too (I think) - will tackle that tonight/tomorrow
Okay, I created (at least a first revision of) this functionality, with the "prepare" command. I had added the unreviewed UniProt DB (TrEMBL) in there too, until I actually ran it and realized it was 10 GB (which would mean huge/slow BWT/SA files). Do you think people would ever use the unreviewed version of UniProt for this kind of work? Should I put it back in or keep it out?
Anyway, right now the only option is the Swiss-Prot version. It will download it, then run the index command on it with the appropriate options. I'll commit the latest version to Github now with the fixes/additions made tonight.
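For anyone following along, the download-then-index flow described above can be sketched roughly as below. This is only an illustration, not PALADIN's actual implementation: the `REFERENCES` table, the placeholder URL, and the `prepare`, `fetch`, and `index` names are all hypothetical, and the default indexer calls `paladin index` with its options deliberately left out.

```python
import subprocess
import urllib.request
from pathlib import Path

# Hypothetical registry of supported references. The URL is a placeholder,
# not the real UniProt download location.
REFERENCES = {"swissprot": "https://example.org/uniprot_sprot.fasta.gz"}


def default_fetch(url, target):
    """Download url to the local path target."""
    urllib.request.urlretrieve(url, str(target))


def default_index(path):
    """Run the indexer on path (the real command needs extra options, omitted here)."""
    subprocess.run(["paladin", "index", str(path)], check=True)


def prepare(reference, workdir, fetch=default_fetch, index=default_index):
    """Download the named reference into workdir if missing, then index it."""
    if reference not in REFERENCES:
        raise ValueError(f"unknown reference: {reference!r}")
    url = REFERENCES[reference]
    target = Path(workdir) / url.rsplit("/", 1)[-1]
    if not target.exists():  # skip the download when the file is already there
        fetch(url, target)
    index(target)
    return target
```

Passing `fetch` and `index` in as parameters keeps the sketch testable without network access or a PALADIN install.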
Also, ignore the extra warnings during compilation; I will fix those soon.
Why not make Swiss-Prot + TrEMBL `-r1`? If we wanted to be fancy, we could make a `-r2` that took a URL for a FASTA file.
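One way to picture a command that takes either a built-in reference or a URL is an argument parser with two mutually exclusive options. This is a rough sketch only: the `--reference` and `--url` option names, the `paladin-sketch` program name, and the list of choices are assumptions for illustration, not PALADIN's real interface (which is written in C and uses `-r`-style flags).

```python
import argparse


def build_parser():
    """Build a hypothetical CLI with a 'prepare' subcommand."""
    parser = argparse.ArgumentParser(prog="paladin-sketch")
    sub = parser.add_subparsers(dest="command", required=True)
    prep = sub.add_parser("prepare", help="download and index a reference")
    # Exactly one source must be given: a named database or a FASTA URL.
    source = prep.add_mutually_exclusive_group(required=True)
    source.add_argument("--reference", choices=["swissprot", "trembl"],
                        help="built-in reference database (names are assumptions)")
    source.add_argument("--url", help="URL of a FASTA file to download")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.command, args.reference or args.url)
```

Supplying both `--reference` and `--url`, or neither, makes the parser exit with a usage error.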
I think people will want to use every protein they can get at some point.
Update to this issue - after the last couple months of testing, we've determined that UniProt's SwissProt and UniRef90 are effective references depending on the work being performed. We've added UniRef90 as a standard option to the prepare command, and also added the option to prepare a reference post-indexing in case someone indexes manually and skips the prepare command.
Also, since preparing is a custom-tailored process per reference (some specific cleanup is needed depending on the reference type, e.g. UniRef90), we've kept it specific to these two reference databases for now. If you're using PALADIN in the future and find a reference database that works well and that you feel would be a helpful addition for other users, please don't hesitate to let us know. Closing this for now.
Should we make a `paladin prep` that downloads the reference and indexes it automatically? Most people will use UniProt. Could be `paladin prep --reference uniprot`. The name could be `prep`, `setup`, `prepare`, `download`, or something like that.