RobertsLab / resources

https://robertslab.github.io/resources/
18 stars 10 forks source link

translating assembled metagenome contigs in NGLess #526

Closed emmats closed 5 years ago

emmats commented 5 years ago

I've contacted the developer of NGLess with this question, but it has been about a week and he hasn't responded so I'm hoping someone here can help! I have managed to successfully assemble metagenome sequencing reads using NGLess, but I need them translated into protein sequences. This is feasible, but I'm having trouble figuring out how to incorporate the command into the script. I am using NGLess 0.10.0: https://ngless.readthedocs.io/en/latest/

Here is my script:

ngless "0.10"
input = fastq('/net/nunn/vol1/emmats/sequencing/geo_metaG/Library_Geoduck_MG_1_$
output = preprocess(input, keep_singles=False) using |read|:
   read = substrim(read, min_quality=25)
   if len(read) < 45:
     discard
contigs = assemble(input)
write(contigs, ofile='contigs.fa')
orfs = orf_find(contigs, is_metagenome=True)
write(contigs, ofile='orf.fna’)

And here is what Luis has told me about including a protein translation step in the script: If you already have the nucleotides, maybe it's easiest to translate those, but from orf_find, you can specify a file "prots_out="

http://ngless.embl.de/Functions.html#orf_find

emmats commented 5 years ago

Yes, he just responded (so glad we all spent so much time on this):

I was finally looking at this today (because of the holidays, I didn't find the time for it until now).

It was indeed a bug in ngless (I think that some changes we did a while back broke this and we didn't have a test to catch it so it slipped through. Now there is a test).

I did check though and the prots_out is basically the translation of the output (there is a marking on whether there is a stop codon at the end, but that you can also get from the DNA file in the header). Thus, translating the nucleotid output will give you the same thing as prots_out is supposed to.