ToniWestbrook / paladin

Protein Alignment and Detection Interface
MIT License

Paladin align error: Reporting can only be used on prepared indices. #43

fnew closed this issue 4 years ago

fnew commented 4 years ago

Hi, thank you for this tool, it seems to be just what I have been looking for. I am trying to align metagenomic reads to a custom FASTA file of protein sequences (genes). Following the manual's instructions, I ran 'paladin index -r3 derep_clust90_proteins.faa', then tried to run 'paladin align' and received this error:

[M::command_align] Loading the index for reference 'derep_clust90_proteins.faa'...
[M::index_load_from_disk] Read 0 ALT contigs
[E::command_align] Reporting can only be used on prepared indices.

My understanding of the manual is that running 'paladin index' on my custom reference is enough, but this error suggests I need to use 'paladin prepare' after my index step. According to the help for 'paladin prepare', my input reference database should be either Swiss-Prot or UniRef90, but my reference data are neither. How should I proceed?

ToniWestbrook commented 4 years ago

Hi, glad the tool looks like it might help out! PALADIN can perform two levels of analysis. The first simply aligns your reads to a protein database (which can be in any format). The second goes a step further: it downloads data from UniProt and creates a tab-delimited report.

If you're only interested in alignment, you can just use the "index" command with your custom reference. When you align, do not use the '-o' output option; instead, redirect stdout to a SAM file. This method won't contact UniProt and doesn't require a UniProt-compatible reference. You can also still get a simple version of the tab-delimited report, which reports abundance, by using '-o' together with '-u 0'. That combination doesn't contact UniProt either.
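For anyone scripting this, here is a minimal Python sketch of the two UniProt-free invocations described above. The helper names (`build_align_cmd`, `run_alignment_only`) and the file names are made up for illustration, and the ordering (options first, then reference, then reads) is an assumption, so check `paladin align` help output before relying on it.

```python
import subprocess


def build_align_cmd(reference, reads, report_prefix=None):
    """Build an argument list for 'paladin align' (hypothetical helper).

    Without report_prefix: plain alignment, SAM is written to stdout.
    With report_prefix: adds '-o <prefix>' and '-u 0', the combination
    described in the thread as producing the simple abundance report
    without contacting UniProt.
    Assumption: options come first, followed by reference and reads.
    """
    cmd = ["paladin", "align"]
    if report_prefix is not None:
        cmd += ["-o", report_prefix, "-u", "0"]
    cmd += [reference, reads]
    return cmd


def run_alignment_only(reference, reads, sam_path):
    """Run the alignment and redirect stdout into a SAM file."""
    with open(sam_path, "w") as sam:
        subprocess.run(build_align_cmd(reference, reads), stdout=sam, check=True)
```

For example, `run_alignment_only("derep_clust90_proteins.faa", "reads.fq.gz", "reads.sam")` would be the scripted equivalent of redirecting stdout to a SAM file; `reads.fq.gz` and `reads.sam` are placeholder names.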

If you do need UniProt data, then you would need to (if possible) rewrite the headers of your reference so they begin with the UniProt style ">sp/tr|AccID|KBID " (that is, either "sp" or "tr", followed by the Accession ID and KBID, both of which must be valid IDs from the UniProt database), and then use the "prepare" command. But again, you'd only need that if you want the full info (like gene name, GO terms, taxonomy, etc). Hope that helps.
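If it helps, the header rewrite could be sketched in Python roughly as follows. This is not part of PALADIN: the function names and the `id_map` structure are hypothetical, and the mapping from your own sequence IDs to real UniProt accessions and KBIDs has to come from you (for example, from a prior homology search). Headers without a mapping are left unchanged.

```python
def to_uniprot_header(header, id_map):
    """Rewrite one FASTA header to the '>sp|AccID|KBID ' or '>tr|AccID|KBID ' style.

    id_map (hypothetical structure) maps an original sequence ID to a
    (db, accession, kbid) tuple, where db is 'sp' or 'tr'.
    """
    header = header.rstrip("\n")
    parts = header[1:].split(None, 1)
    seq_id = parts[0] if parts else ""
    rest = parts[1] if len(parts) > 1 else ""
    if seq_id not in id_map:
        return header  # no UniProt mapping known: keep the header as-is
    db, accession, kbid = id_map[seq_id]
    if db not in ("sp", "tr"):
        raise ValueError(f"db must be 'sp' or 'tr', got {db!r}")
    # Keep the original ID and description after the UniProt-style prefix.
    new_header = f">{db}|{accession}|{kbid} {seq_id}"
    if rest:
        new_header += " " + rest
    return new_header


def rewrite_fasta(lines, id_map):
    """Yield FASTA lines with mapped headers rewritten; sequence lines pass through."""
    for line in lines:
        line = line.rstrip("\n")
        yield to_uniprot_header(line, id_map) if line.startswith(">") else line
```

You would read your reference line by line, pass it through `rewrite_fasta`, write the result to a new file, and then run "prepare" on that file. Entries whose IDs aren't in the mapping keep their original headers, so they wouldn't pick up UniProt annotations.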

fnew commented 4 years ago

Thank you! This worked!