Closed emmats closed 5 years ago
If you have a FastA, I'd just use TransDecoder (it's different software).
However, I'll look into your specific issue later today and see if I can figure something out.
I'm familiar with transdecoder, but I think I have read in the past that it should not be used with metagenomes.
Maybe change the orfs= line. Based on the documentation, I might have used select.
ngless "0.10"
input = fastq('/net/nunn/vol1/emmats/sequencing/geo_metaG/Library_Geoduck_MG_1_$
output = preprocess(input, keep_singles=False) using |read|:
    read = substrim(read, min_quality=25)
    if len(read) < 45:
        discard
contigs = assemble(input)
write(contigs, ofile='contigs.fa')
orfs = select(contigs, is_metagenome=True)
write(contigs, ofile='orf.fna')
I've tried the following and neither worked.
orfs = orf_find(contigs, is_metagenome=True, prots_out=proteins.fa)
write(contigs, ofile='orf.fna')

orfs = orf_find(contigs, is_metagenome=True, prots_out=True)
write(contigs, ofile='orf.fna')
Do you get an error message or does nothing happen at all?
Yes, but neither of these are what I suggested :)
@sr320 I was hoping you would clarify your suggestion...
@kubu4 I get a fatal error and it does not run
@emmats It's in the code I posted above... and below.
ngless "0.10"
input = fastq('/net/nunn/vol1/emmats/sequencing/geo_metaG/Library_Geoduck_MG_1_$
output = preprocess(input, keep_singles=False) using |read|:
    read = substrim(read, min_quality=25)
    if len(read) < 45:
        discard
contigs = assemble(input)
write(contigs, ofile='contigs.fa')
orfs = select(contigs, is_metagenome=True)
write(contigs, ofile='orf.fna')
But that doesn't include the prots_out argument that Luis mentioned and that is in the manual. It seems that the only thing that is different is that you changed orf_find to select.
The error message just says, "Fatal error"?
Exiting after fatal error while loading and running script
Script Error
Error in type-checking (line 11): Bad argument type in function 'orf_f$
[Wed 19-12-2018 14:04:52]: # Configuration
[Wed 19-12-2018 14:04:52]: download base URL: http://ngless.embl.de/res$
[Wed 19-12-2018 14:04:52]: global data directory: /net/gs/vol3/software$
[Wed 19-12-2018 14:04:52]: user directory: /net/maccoss/vol5/home/emmat$
[Wed 19-12-2018 14:04:52]: user data directory: /net/maccoss/vol5/home/$
[Wed 19-12-2018 14:04:52]: temporary directory: /data/scratch/ssd
[Wed 19-12-2018 14:04:52]: keep temporary files: True
[Wed 19-12-2018 14:04:52]: create report: True
[Wed 19-12-2018 14:04:52]: report directory: /net/nunn/vol1/emmats/sequ$
[Wed 19-12-2018 14:04:52]: color setting: AutoColor
[Wed 19-12-2018 14:04:52]: print header: True
[Wed 19-12-2018 14:04:52]: subsample: False
[Wed 19-12-2018 14:04:52]: verbosity: Normal
[Wed 19-12-2018 14:04:52]: search path:
[Wed 19-12-2018 14:04:52]: Loading modules...
Correct, that is all I changed. Does that still give an error?
You can try:
orfs = select(contigs, is_metagenome=True, prots_out=/path/to/file)
That still gives an error:
Exiting after fatal error while loading and running script
Script Error
Error in type-checking (line 11): Bad argument 'is_metagenome' for function 'select'. This function takes the following arguments: keep_if drop_if paired __oname
Error in type-checking (line 11): Bad argument 'prots_out' for function 'select'. This function takes the following arguments: keep_if drop_if paired __oname
Error in type-checking (line 11): Bad type in function call (function 'select' expects NGLMappedReadSet got NGLString).
Just guessing now... but what error does this produce?
ngless "0.10"
input = fastq('/net/nunn/vol1/emmats/sequencing/geo_metaG/Library_Geoduck_MG_1_$
output = preprocess(input, keep_singles=False) using |read|:
    read = substrim(read, min_quality=25)
    if len(read) < 45:
        discard
contigs = assemble(input)
write(contigs, ofile='contigs.fna')
orfs = orf_find(contigs)
write(contigs, ofile='orfs.fna')
It looks like your error messages are cut off. Can you please post the full message from your original script?
Also, have you posted the full script? The error message indicates a problem at line 11, but your original script only has 10 lines.
It may have been the wrong error messages. I'm being lazy and not saving all of them for each attempt.
@sr320 why leave out the is_metagenome argument? Here is the error for that:
gcc/4.9.1(9):ERROR:151: Module 'gcc/4.9.1' depends on one of the module(s) 'gmp/5.0.2'
gcc/4.9.1(9):ERROR:102: Tcl command execution failed: prereq gmp/5.0.2
gcc/8.1.0(5):ERROR:150: Module 'gcc/8.1.0' conflicts with the currently loaded module(s) 'mpc/1.1.0'
gcc/8.1.0(5):ERROR:102: Tcl command execution failed: conflict mpc
Exiting after fatal error while loading and running script
Script Error
Error on line 11: Function orf_find requires argument is_metagenome.
[Mon 31-12-2018 11:15:45]: # Configuration
[Mon 31-12-2018 11:15:45]: download base URL: http://ngless.embl.de/resources/
[Mon 31-12-2018 11:15:45]: global data directory: /net/gs/vol3/software/modules-sw/ngless/0.10.0/Linux/RHEL6/x86_64/bin/../share/n$
[Mon 31-12-2018 11:15:45]: user directory: /net/maccoss/vol5/home/emmats/.local/share/ngless
[Mon 31-12-2018 11:15:45]: user data directory: /net/maccoss/vol5/home/emmats/.local/share/ngless/data
[Mon 31-12-2018 11:15:45]: temporary directory: /data/scratch/ssd
[Mon 31-12-2018 11:15:45]: keep temporary files: True
[Mon 31-12-2018 11:15:45]: create report: True
[Mon 31-12-2018 11:15:45]: report directory: /net/nunn/vol1/emmats/sequencing/geo_metaG/test.ngl2.output_ngless
[Mon 31-12-2018 11:15:45]: color setting: AutoColor
[Mon 31-12-2018 11:15:45]: print header: True
[Mon 31-12-2018 11:15:45]: subsample: False
[Mon 31-12-2018 11:15:45]: verbosity: Normal
[Mon 31-12-2018 11:15:45]: search path:
[Mon 31-12-2018 11:15:45]: Loading modules...
I just pieced this together from the online docs.
What is the error for...
ngless "0.10"
input = fastq('/net/nunn/vol1/emmats/sequencing/geo_metaG/Library_Geoduck_MG_1_$
output = preprocess(input, keep_singles=False) using |read|:
    read = substrim(read, min_quality=25)
    if len(read) < 45:
        discard
contigs = assemble(input)
write(contigs, ofile='contigs.fna')
orfs = orf_find(contigs, is_metagenome=True)
write(contigs, ofile='orfs.fna')
That appears to be my original script. There is no error and it runs fine, it just doesn't give me a protein file, only nucleotides.
But it gives you orf.fna?
I get orf.fna and contigs.fa
Here is the beginning of the orf.fna file
>k141_1 flag=1 multi=202.0000 len=317
GACTTTAAAGAGTATACGAAAGTCTTAGCTTCTAAAGACATCAAAATTTATAAAGAAATTTTTAAAATTCAGAATAAGCCTATTAAAAATAAAAAAGCTCGAGATTGGAAAAAAATTGATGTCCTTATCAAAAAATTGGATAATAAAATATTATTGGGTAACGTCTATGCTGAGAGATATTTGCATCCAACGGGTTGGAGAAGTTCTTATAAAGACTTAAAAATATGGCTCGACAAATATAATGATCACCCAGACGCTACAAGAATTTCCAGAATAGCATTAAAAAGAAAACCTAACAATGTAAGGAGTCCTAAAGC
>k141_2 flag=1 multi=4.0000 len=380
AATATCATCTAATAAACCTAAAGGCATATCTTGACTAACACCACCTGGACGTATATATGCTGCATGCATCCGAGCACCGGATACACGTTCATAAAATTCCATTAATTTTTCACGTTCTTCAAATACCCAAAGTAATGGTGTTGTTGCACCAACATCTAAGGCATGTGTTGAAACAGCAAGTAAATGGTTTAAAATACGTGTTATTTCACAAAATAAGACTCTAATATACTGAGCACGTTTTGGGACTGAACAATTTAATAATTTTTCAACCGCAAGAGAATACGCATGCTCTTGTGCCATCATAGAAACGTAATCAAGCCTATCAAAATAAGGCAATGCTTGTATATAGTTTTTATGTTCAATTAATTTTTCAGTACCTC
>k141_3 flag=1 multi=1809.0000 len=308
GATAGGAGTCCGTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTTTTTGCTGCGATTGCGGCGGGATTTGCGACTTTACTGACGCCTTGCGTGTTCCCAATGATTCCGGTAACTATTTCGTATTTCACGAAACGCGCTGAATCAGGTAAAGGCACGCCGTTAGGTAATGCTACGGCTTATGCGAGCGGTATTGTCTTTACATTTGCCGGCATTGGAGTTGGCGCAGCACTAGCGCTGTCTCTTATACACATCTCCGAGCCCACGAGACGGACTCCTATCTCGTATGCCGTCTTCTGCTTGAAAAAA
>k141_4 flag=1 multi=3.0000 len=315
GGTCCGCGACATGCTCGACAACGCCTATCGCTACCTGATCGACCACGATCGACTGGTGTTCTTCGGCCCCAACTCGATGGTGCTGGGCGGTTTTCGCGACGCCAGCTGACTCGGCCAGTGTTCATCGGGGCGATCAGCGACTCCGGTTGTCCCTCCTGGGCGCGGTACTCGAGATGTCGGCCGCAGCGCGGCTTGCATGATGCAGGCAGGATCGCAGGCTGTGCTTTCGCTGTTGGAAGCGATCCAGCCACCGGGGATGGATCGCCACCACGCGCACCGGAACGCTGGCCAATCCCAGCAATTGCACGATGGCCA
>k141_5 flag=1 multi=1184.0000 len=300
GAGATAGGAGTCCGTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCTTTACAGCTTCAGCTAGAATATCTACACCTTCTAACATTTGAGCACGTGCATTTGCACCAAATTTGACTTCTTTAGCAGCCATTTTATCGTCCTTTCAAATTGTAACGTAATTTTTTAAATAAAAGAATAAATTAACCCATTATTCCCATGATATCGCTTTCTTTCATGATCAATAGATCTTTTCCATCAAGCTTAACTTCAGTACCGGACCATTTTCCAAAAAGTACAACATCACCTTCTTTAACATCTA
how about
ngless "0.10"
input = fastq('/net/nunn/vol1/emmats/sequencing/geo_metaG/Library_Geoduck_MG_1_$
output = preprocess(input, keep_singles=False) using |read|:
    read = substrim(read, min_quality=25)
    if len(read) < 45:
        discard
contigs = assemble(input)
write(contigs, ofile='contigs.fna')
orfs = orf_find(contigs, is_metagenome=True, prots_out='prot.fa')
write(contigs, ofile='orfs.fna')
If you have the ORF FASTA (which you do), it should just be a straight translation, a la https://www.ebi.ac.uk/Tools/st/emboss_transeq/ using frame 1 and the bacterial codon table.
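For anyone who would rather script that "straight translation" than use the web form, here is a minimal Python sketch. It assumes the input FASTA really contains ORFs that start in frame 1; the function names and the file names in the usage note are hypothetical and come from neither NGLess nor EMBOSS. It relies on the fact that the bacterial genetic code (NCBI table 11) assigns the same amino acids as the standard code and differs only in which codons may act as starts, so an alternative start codon such as GTG or TTG is written here as V or L rather than M.

```python
# Standard genetic code laid out in TCAG order. NCBI table 11 (bacterial)
# uses the same codon-to-amino-acid assignments; only start codons differ.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_frame1(nucleotides):
    """Translate a nucleotide string in frame 1; unknown codons become 'X'."""
    seq = nucleotides.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def translate_fasta(in_path, out_path):
    """Write a frame-1 protein translation of every record in in_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for header, seq in read_fasta(src):
            dst.write(f">{header}\n{translate_frame1(seq)}\n")
```

A call such as translate_fasta('orfs.fna', 'proteins.faa') would then produce the protein file in one extra step after the NGLess run. Running it on a file of whole contigs rather than ORFs would give meaningless translations.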
I know, but wouldn't it be nice to have it all in the same pipeline so that when I am processing all the files I only have to run the command once?
Here is the error for your most recent script suggestion:
gcc/4.9.1(9):ERROR:151: Module 'gcc/4.9.1' depends on one of the module(s) 'gmp/5.0.2'
gcc/4.9.1(9):ERROR:102: Tcl command execution failed: prereq gmp/5.0.2
gcc/8.1.0(5):ERROR:150: Module 'gcc/8.1.0' conflicts with the currently loaded module(s) 'mpc/1.1.0'
gcc/8.1.0(5):ERROR:102: Tcl command execution failed: conflict mpc
Exiting after fatal error while loading and running script
Script Error
Error in type-checking (line 11): Bad argument type in function 'orf_find', variable "prots_out". Expected NGLFilename got NGLString.
[Mon 31-12-2018 11:43:15]: # Configuration
[Mon 31-12-2018 11:43:15]: download base URL: http://ngless.embl.de/resources/
[Mon 31-12-2018 11:43:15]: global data directory: /net/gs/vol3/software/modules-sw/ngless/0.10.0/Linux/RHEL6/x86_64/bin/../share/ngless/data
[Mon 31-12-2018 11:43:15]: user directory: /net/maccoss/vol5/home/emmats/.local/share/ngless
[Mon 31-12-2018 11:43:15]: user data directory: /net/maccoss/vol5/home/emmats/.local/share/ngless/data
[Mon 31-12-2018 11:43:15]: temporary directory: /data/scratch/ssd
[Mon 31-12-2018 11:43:15]: keep temporary files: True
[Mon 31-12-2018 11:43:15]: create report: True
[Mon 31-12-2018 11:43:15]: report directory: /net/nunn/vol1/emmats/sequencing/geo_metaG/test.ngl2.output_ngless
[Mon 31-12-2018 11:43:15]: color setting: AutoColor
[Mon 31-12-2018 11:43:15]: print header: True
[Mon 31-12-2018 11:43:15]: subsample: False
[Mon 31-12-2018 11:43:15]: verbosity: Normal
[Mon 31-12-2018 11:43:15]: search path:
[Mon 31-12-2018 11:43:15]: Loading modules...
This is generally bothersome:
gcc/4.9.1(9):ERROR:151: Module 'gcc/4.9.1' depends on one of the module(s) 'gmp/5.0.2'
gcc/4.9.1(9):ERROR:102: Tcl command execution failed: prereq gmp/5.0.2
gcc/8.1.0(5):ERROR:150: Module 'gcc/8.1.0' conflicts with the currently loaded module(s) 'mpc/1.1.0'
gcc/8.1.0(5):ERROR:102: Tcl command execution failed: conflict mpc
and might or might not be related.
One problem is that it does not like 'prot.fa', as per:
Script Error
Error in type-checking (line 11): Bad argument type in function 'orf_find', variable "prots_out". Expected NGLFilename got NGLString.
'prot.fa' must be an NGLString.
I am not sure how to make it an NGLFilename (this is part of the coding).
You could try 'prot.faa'.
I got those errors even for the one that ran to completion (without a protein translation argument).
I got the same error when I tried prot.faa
I guess I can just give up and use a tool outside of the NGL pipeline...
I finally had a chance to sit down and look at this. This is your issue:
contigs = assemble(input)
write(contigs, ofile='contigs.fa')
orfs = orf_find(contigs, is_metagenome=True)
write(contigs, ofile='orf.fna')
The second write command is writing the nucleotide data (contigs) to your designated output file (orf.fna).
You need to write the variable orfs to the output file, like so:
write(orfs, ofile='orf.fna')
That should get you your protein translations in your orf.fna file.
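One quick way to test this diagnosis outside NGLess is to compare the record headers of the two output files: if they match exactly, the "ORF" file is just a second copy of the contigs. The sketch below is illustrative only; the function names are hypothetical and it assumes ordinary FASTA files whose header lines start with ">". The orf.fna excerpt posted earlier in the thread shows contig-style names (k141_1, k141_2, ...), which is consistent with this explanation.

```python
def fasta_headers(lines):
    """Return the header text (without '>') of each FASTA record, in order."""
    return [line[1:].strip() for line in lines if line.startswith(">")]


def same_headers(lines_a, lines_b):
    """True when both inputs contain exactly the same headers in the same order."""
    return fasta_headers(lines_a) == fasta_headers(lines_b)


def files_have_same_headers(path_a, path_b):
    """Convenience wrapper that compares two FASTA files on disk."""
    with open(path_a) as a, open(path_b) as b:
        return same_headers(a, b)
```

For the files discussed here the call would be files_have_same_headers('contigs.fa', 'orf.fna'). A result of True suggests the second write call received the contigs variable rather than the ORFs.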
Nope, that still yields just a nucleotide sequence file. I think "prots_out" needs to be in there somewhere. You did better than Steven, though, because it did run.
Fix your original script:
orfs = orf_find(contigs, is_metagenome=True, prots_out=True)
write(orfs, ofile='orf.fna')
I have tried that before, but will try again.
It failed immediately.
Exiting after fatal error while loading and running script
Script Error
Error in type-checking (line 11): Bad argument type in function 'orf_find', variable "prots_out". Expected NGLFilename got NGLBool.
Sorry, I took that prots_out=True from one of your examples above. I just looked at the documentation and you need a file path (as indicated in the error message). Please try:
orfs = orf_find(contigs, is_metagenome=True, prots_out=proteins.fna)
write(orfs, ofile='orf.fna')
Here is the error for that suggestion:
Exiting after fatal error while loading and running script
Script Error
Parsing error on file '/net/nunn/vol1/emmats/sequencing/geo_metaG/test.ngl2' on line 11 (column 6)
write(contigs, ofile='contigs.fna')
orfs = orf_find(contigs, is_metagenome=True, prots_out=proteins.fna)
--------------^
write(orfs, ofile='orf.fna')
"/net/nunn/vol1/emmats/sequencing/geo_metaG/test.ngl2" (line 11, column 6): unexpected ("/net/nunn/vol1/emmats/sequencing/geo_metaG/test.ngl2" (line 11, column 6),TOperator '=') expecting if (reserved word), discard (reserved word), continue (reserved word), variable, len (reserved word), operator -, not (reserved word), operator (, function call, operator [ or end of input
I tried the same thing but with quotes around 'proteins.fna' and got this error:
Exiting after fatal error while loading and running script
Script Error
Error in type-checking (line 11): Bad argument type in function 'orf_find', variable "prots_out". Expected NGLFilename got NGLString.
Can you post full script so we can see what line 11 of the script is?
Actually, never mind. The error message is referencing the contents of your file. Can you post a snippet of your file (/net/nunn/vol1/emmats/sequencing/geo_metaG/test.ngl2)?
Please post snippet through line 11 and column 6, so we can look at what's generating the error message.
ngless "0.10"
input = fastq('/net/nunn/vol1/emmats/sequencing/geo_metaG/Library_Geoduck_MG_1_$
output = preprocess(input, keep_singles=False) using |read|:
    read = substrim(read, min_quality=25)
    if len(read) < 45:
        discard

contigs = assemble(input)
write(contigs, ofile='contigs.fna')

orfs = orf_find(contigs, is_metagenome=True, prots_out=proteins.fna)
write(orfs, ofile='orf.fna')
Can you please post the full script (line 2 appears to be truncated)?
Sorry, didn't catch that. Nano is stupid.
ngless "0.10"
input = fastq('/net/nunn/vol1/emmats/sequencing/geo_metaG/Library_Geoduck_MG_1_S3_L002_R1_001.fastq')
output = preprocess(input, keep_singles=False) using |read|:
    read = substrim(read, min_quality=25)
    if len(read) < 45:
        discard

contigs = assemble(input)
write(contigs, ofile='contigs.fna')

orfs = orf_find(contigs, is_metagenome=True, prots_out=proteins.fna)
write(orfs, ofile='orf.fna')
Nano is stupid
Use cat to print the contents of a file.
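Since the parser reports positions as a line and a column, it can also help to print the script with line numbers and mark the reported spot. The following is a small Python sketch of that idea, assuming 1-based line and column numbers as in the messages above; show_position is a hypothetical helper and not part of NGLess. Blank lines are counted like any other line, which is one way a script with ten visible statements can produce an error on line 11.

```python
def show_position(script_text, line_no, column, context=2):
    """Return a numbered excerpt with a caret under (line_no, column), both 1-based."""
    lines = script_text.splitlines()
    first = max(1, line_no - context)
    out = []
    # Print the target line plus up to `context` lines before it.
    for number in range(first, min(line_no, len(lines)) + 1):
        out.append(f"{number:>3} | {lines[number - 1]}")
    # The "NNN | " prefix is six characters wide, so shift the caret by that much.
    if 1 <= line_no <= len(lines):
        out.append(" " * (6 + column - 1) + "^")
    return "\n".join(out)
```

Reading the script file into a string and calling print(show_position(text, 11, 6)) would display lines 9 to 11 with a caret under column 6 of line 11.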
Well, you prefer the easy option. I like to make things hard on myself.
ngless "0.10"
input = fastq('/net/nunn/vol1/emmats/sequencing/geo_metaG/Library_Geoduck_MG_1_S3_L002_R1_001.fastq')
output = preprocess(input, keep_singles=False) using |read|:
    read = substrim(read, min_quality=25)
    if len(read) < 45:
        discard

contigs = assemble(input)
write(contigs, ofile='contigs.fna')

orfs = orf_find(contigs, is_metagenome=True, prots_out=proteins.fna)
Wait. Is /net/nunn/vol1/emmats/sequencing/geo_metaG/test.ngl2 your script?
yes. The contents of which I have now posted twice.
It wasn't clear what your script was called, since you had never previously referred to it by name.
Anyway, since it requires a file path, please try this (maybe with single and/or double quotes around the path, too):
orfs = orf_find(contigs, is_metagenome=True, prots_out=/net/nunn/vol1/emmats/sequencing/geo_metaG/proteins.fna)
Below are all the errors I got for altering the script as suggested. I'm going to start harassing Luis some more.
No quotes:
Exiting after fatal error while loading and running script
Script Error
"/net/nunn/vol1/emmats/sequencing/geo_metaG/test.ngl2" (line 11, column 56): unexpected '/' expecting "#", "//", "/*", "{", "'", "\"", digit, "0x", "_", letter, "!=", "==", "</>", "<=", "<", ">=", ">", "+", "*", " ", tab, ";", "\r", "\n" or end of input

Single quotes:
Exiting after fatal error while loading and running script
Script Error
Error in type-checking (line 11): Bad argument type in function 'orf_find', variable "prots_out". Expected NGLFilename got NGLString.

Double quotes:
Exiting after fatal error while loading and running script
Script Error
Error in type-checking (line 11): Bad argument type in function 'orf_find', variable "prots_out". Expected NGLFilename got NGLString.
Alrighty, give this a shot (I feel good about this one):
ngless "0.10"
input = fastq('/net/nunn/vol1/emmats/sequencing/geo_metaG/Library_Geoduck_MG_1_S3_L002_R1_001.fastq')
output = preprocess(input, keep_singles=False) using |read|:
    read = substrim(read, min_quality=25)
    if len(read) < 45:
        discard
contigs = assemble(input)
write(contigs, ofile='contigs.fna')
prots_out = proteins.fna
orfs = orf_find(contigs, is_metagenome=True, prots_out)
Still didn't work...
Exiting after fatal error while loading and running script
Script Error
Parsing error on file '/net/nunn/vol1/emmats/sequencing/geo_metaG/test.ngl2' on line 10 (column 21)
contigs = assemble(input)
write(contigs, ofile='contigs.fna')
prots_out = proteins.fna
-----------------------------^
orfs = orf_find(contigs, is_metagenome=True, prots_out)
"/net/nunn/vol1/emmats/sequencing/geo_metaG/test.ngl2" (line 10, column 21): unexpected ("/net/nunn/vol1/emmats/sequencing/geo_metaG/test.ngl2" (line 10, column 21),TOperator '.') expecting if (reserved word), discard (reserved word), continue (reserved word), variable, len (reserved word), operator -, not (reserved word), operator (, function call, operator [ or end of input
I also tried it with single quotes around 'proteins.fna':
Exiting after fatal error while loading and running script
Script Error
Parsing error on file '/net/nunn/vol1/emmats/sequencing/geo_metaG/test.ngl2' on line 11 (column 6)
write(contigs, ofile='contigs.fna')
prots_out = 'proteins.fna'
orfs = orf_find(contigs, is_metagenome=True, prots_out)
--------------^
"/net/nunn/vol1/emmats/sequencing/geo_metaG/test.ngl2" (line 11, column 6): unexpected ("/net/nunn/vol1/emmats/sequencing/geo_metaG/test.ngl2" (line 11, column 6),TOperator '=') expecting if (reserved word), discard (reserved word), continue (reserved word), variable, len (reserved word), operator -, not (reserved word), operator (, function call, operator [ or end of input
This problem has been fixed by the developer:
https://github.com/ngless-toolkit/ngless/issues/97
but you'll have to install the updated version:
https://github.com/ngless-toolkit/ngless/commit/e8adcc2bb170bd44489e569e4d521c8ca9195958
I've contacted the developer of NGLess with this question, but it has been about a week and he hasn't responded so I'm hoping someone here can help! I have managed to successfully assemble metagenome sequencing reads using NGLess, but I need them translated into protein sequences. This is feasible, but I'm having trouble figuring out how to incorporate the command into the script. I am using NGLess 0.10.0: https://ngless.readthedocs.io/en/latest/
Here is my script:
And here is what Luis has told me about including a protein translation step in the script: If you already have the nucleotides, maybe it's easiest to translate those, but from orf_find, you can specify a file "prots_out="
http://ngless.embl.de/Functions.html#orf_find