@@uniprot with Uniref

gundizalv commented 5 years ago

I ran into the following problem when running @@uniprot on a SAM file obtained from an alignment against a UniRef50 database:

paladin-plugins.py @@uniprot -i 1G.sam -c Organism "protein names" go comments ec "database(KEGG)" @@write report.txt
Gathering SAM data...
Fetching entries 0:5000 of 87309...
Fetching entries 5000:10000 of 87309...
Fetching entries 10000:15000 of 87309...
Fetching entries 15000:20000 of 87309...
Fetching entries 20000:25000 of 87309...
Fetching entries 25000:30000 of 87309...
Fetching entries 30000:35000 of 87309...
Fetching entries 35000:40000 of 87309...
Fetching entries 40000:45000 of 87309...
Fetching entries 45000:50000 of 87309...
Fetching entries 50000:55000 of 87309...
Fetching entries 55000:60000 of 87309...
Fetching entries 60000:65000 of 87309...
Fetching entries 65000:70000 of 87309...
Fetching entries 70000:75000 of 87309...
Fetching entries 75000:80000 of 87309...
Fetching entries 80000:85000 of 87309...
Fetching entries 85000:87309 of 87309...
{'': ['']}
Traceback (most recent call last):
  File "/usr/bin/paladin-plugins.py", line 118, in <module>
    core.main.exec_pipeline(pipeline)
  File "/opt/paladin-plugins/core/main.py", line 357, in exec_pipeline
    plugin.callback_main(args)
  File "/opt/paladin-plugins/plugins/uniprot.py", line 80, in uniprot_main
    headers = "Count\tAbundance\tQuality (Average)\tQuality (Max)\tUniProtKB\tID\t{0}".format("\t".join(uniprotdata["Entry name"][2:]))
KeyError: 'Entry name'

It seems that @@uniprot doesn't work with UniRef-formatted SAM files.
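
The {'': ['']} printed just before the traceback suggests (this is an inference from the log alone) that the UniProt lookup came back empty, so the parsed data never contained an "Entry name" key. As a hypothetical sketch, where parse_tab_response is an illustrative name and not the actual paladin-plugins code, a parser for a tab-separated response can check for the expected column and fail with an explicit message instead of a bare KeyError:

```python
from typing import Dict, List


def parse_tab_response(text: str, key_column: str = "Entry name") -> Dict[str, List[str]]:
    """Parse a tab-separated response into {value in key_column: row fields}.

    Hypothetical helper for illustration; not the paladin-plugins implementation.
    The header row itself is stored under the key_column name.
    """
    lines = text.rstrip("\n").split("\n")
    header = lines[0].split("\t")
    # An empty response splits into a single empty field, i.e. header == [''].
    if key_column not in header:
        raise ValueError(
            f"no {key_column!r} column in response header {header!r}; "
            "the lookup probably returned no entries"
        )
    idx = header.index(key_column)
    table: Dict[str, List[str]] = {key_column: header}
    for line in lines[1:]:
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) <= idx:
            continue  # skip truncated rows that lack the key column
        table[fields[idx]] = fields
    return table
```

With an empty body, the header splits into a single empty field, [''], which has the same shape as the {'': ['']} seen in the log.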

ToniWestbrook commented 5 years ago

Hi @gundizalv, just to verify: did you run prepare or index when you indexed UniRef50? If you ran index, you'll need to run prepare on the reference, then rerun the alignment and the UniProt download. If you did run prepare, would you be able to send me the SAM file in question? Also, if you did run the prepare command, PALADIN retrieves the fields you requested above automatically when you use the -o option, so there's no need to run PALADIN-plugins.
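
The order of operations described here (prepare the reference, then rerun the alignment with -o so that PALADIN writes the UniProt-annotated output itself) can be sketched as a dry run that only builds and prints the commands. This is a hedged illustration: it uses only the flags quoted in this thread (prepare -r1 -f and align -o), it assumes a BWA-style "align [options] <index> <reads>" argument order, and paladin_workflow, the index name, the reads file and the output prefix are all hypothetical placeholders.

```python
import shlex
from typing import List


def paladin_workflow(reference_fasta: str, index_base: str,
                     reads: str, out_prefix: str) -> List[List[str]]:
    """Return the PALADIN commands, in order, as argument lists (dry run only).

    Hypothetical helper. index_base is a placeholder: check which index name
    prepare actually writes on your system before running align.
    """
    # Step 1: prepare (not index) the reference, as recommended in the thread.
    prepare = ["paladin", "prepare", "-r1", "-f", reference_fasta]
    # Step 2: rerun the alignment; -o asks PALADIN for its own annotated output.
    align = ["paladin", "align", "-o", out_prefix, index_base, reads]
    return [prepare, align]


# Print the commands instead of executing them.
for cmd in paladin_workflow("uniref50.fasta", "uniref50.fasta", "reads.fq", "1G"):
    print(shlex.join(cmd))
```

Building argument lists rather than shell strings keeps the step order explicit and makes it easy to hand each list to subprocess.run once the file names have been checked.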

Also (unrelated to this issue), as you're probably already aware, aligning against UniRef50 will give very low resolution for many of the results (unless you then use the declustering plugin to refine them).

gundizalv commented 5 years ago

@twestbrookunh I ran paladin prepare -r1 -f uniref50.fasta to create the indexed database, following your earlier explanation of how to use prepare and index. I used a UniRef50 file that I downloaded myself, since as far as I could see PALADIN doesn't offer it as an option. I use UniRef50 because it is medium-sized and my samples are poorly represented in reference databases. However, if preparing UniRef90 would be computationally easier, I would use that instead.

Every time I've used UniRef, I couldn't get a proper .tsv: it only contains UniRef IDs and nothing else. I did use the -o parameter to produce the output. Maybe the problem is specific to UniRef50, and the story is different with UniRef90. Could you give me a link to a properly prepared UniRef90, if you have one?
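
Since the output described here contains only UniRef identifiers, it may help to recall how they are built: a UniRef cluster ID is UniRef100_, UniRef90_ or UniRef50_ followed by the identifier of the cluster's representative member, which is either a UniProtKB accession or a UniParc ID (starting with UPI). A minimal sketch for splitting such names, assuming they appear unmodified as reference names; parse_uniref_id and UniRefId are hypothetical names. Note that the representative's annotation need not describe every member of a cluster, especially at 50% identity.

```python
import re
from typing import NamedTuple, Optional

# UniRef cluster IDs: "UniRef<identity>_<representative identifier>".
UNIREF_ID = re.compile(r"^UniRef(100|90|50)_(\S+)$")


class UniRefId(NamedTuple):
    identity: int         # clustering threshold in percent: 100, 90 or 50
    representative: str   # identifier of the cluster's representative member
    is_uniparc: bool      # True when the representative is a UniParc ID (UPI...)


def parse_uniref_id(name: str) -> Optional[UniRefId]:
    """Split a UniRef cluster ID such as 'UniRef50_Q6GZX4' into its parts.

    Returns None for names that are not UniRef cluster IDs.
    """
    match = UNIREF_ID.match(name)
    if match is None:
        return None
    rep = match.group(2)
    return UniRefId(int(match.group(1)), rep, rep.startswith("UPI"))
```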

Here is my .sam file:

https://www.dropbox.com/s/p548i7xpyrkg03b/1G.sam?dl=0
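
To check which identifiers a SAM file like this one actually carries, one can tally the reference name (RNAME, the third of the 11 mandatory tab-separated SAM fields) of each mapped alignment. A minimal sketch, assuming a plain-text SAM file rather than BAM; count_references is a hypothetical helper, not part of PALADIN or paladin-plugins:

```python
from collections import Counter
from typing import Iterable


def count_references(sam_lines: Iterable[str]) -> Counter:
    """Count mapped alignment lines per reference name (RNAME, SAM field 3)."""
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines (@HD, @SQ, @PG, ...)
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # every alignment line has 11 mandatory fields
        flag, rname = int(fields[1]), fields[2]
        if flag & 0x4 or rname == "*":
            continue  # FLAG bit 0x4 (or RNAME '*') marks an unmapped read
        counts[rname] += 1
    return counts
```

For example, opening 1G.sam and calling count_references(sam).most_common(10) would list the ten most frequently hit reference names, which shows at a glance whether they are UniRef cluster IDs or UniProtKB entries.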

ToniWestbrook / paladin-plugins

@@uniprot with Uniref #2