Closed jorvis closed 8 years ago
Indeed, it looks as though there is too much information in the transcriptome to fit inside the SNAP index with the seed size transrate has chosen.
It looks like there's some weird stuff in this assembly - your longest contig is 166,145 bases which is unfeasibly long. And there are 2.2 billion bases, which is the size of a medium-large genome and much larger than any real transcriptome I've ever seen.
The fix for the SNAP error is probably for us to detect this error inside transrate and then increase the seed size until it works. I'll see if I can get this into the next release.
However, this won't fix the issues with the assembly - how was this generated?
Yes, it's large. I'm in the process of evaluating different merge/reduction tools for transcriptomics data. That file is the combination of several different Trinity assemblies with Velvet/Oases ones with different parameters. Trying to compare here with other reduction tools like TGICL, EvidentialGene, etc.
I'll just note here that (after fixing a kink in the 64-bit index) Salmon was able to process this giant transcriptome without too much trouble ;) --- yay fish!
@jorvis you might take a look at https://github.com/cboursnell/transfuse for merging/reduction - except transrate will need to be able to run on this transcriptome before transfuse can process it.
@Blahah, thanks, I'd be happy to do that. Let me know when I should try a transrate update. Is the snap issue with the entire transcriptome size or would filtering out the silly-long transcripts perhaps fix the issue?
@jorvis it's probably to do with the very large number of transcripts, and what must be a huge amount of redundancy in the file. Quite likely you have many many copies of most transcripts, which means a read might have a very large number of equally likely candidate locations.
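A toy illustration of the redundancy point above, not a description of how SNAP's index works internally: if the same contig appears several times, every seed drawn from it occurs at that many locations. The function name `seed_hits` and the example sequences are made up for this sketch.

```python
from collections import Counter


def seed_hits(contigs, seed_size):
    """Count how many positions each seed (k-mer) occupies across all contigs."""
    counts = Counter()
    for seq in contigs:
        # Slide a window of seed_size bases along the contig.
        for i in range(len(seq) - seed_size + 1):
            counts[seq[i:i + seed_size]] += 1
    return counts


unique = ["ACGTACGGTCA"]
redundant = unique * 5  # five identical copies of the same contig

# Highest number of locations any single seed maps to in each set.
max_unique = max(seed_hits(unique, 8).values())
max_redundant = max(seed_hits(redundant, 8).values())
print(max_unique, max_redundant)
```

In the redundant set, a read matching that contig has as many equally good candidate locations as there are copies.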
The 166,145 base sequence is certainly not a transcript - possibly it's a plastid genome (chloroplast?) or a contaminant. Or it's an artefact. Either way I'd say it's safe to remove it before transrating. Same for anything under the length of two reads put together - transrate ignores these anyway, but they do slow down the aligner. So removing those would help. However, it's the 618,723 contigs over 1k that are causing the major issue - one way forward would be to de-duplicate these before continuing. You could use CD-HIT-EST with a 100% ID cutoff, or VSEARCH with the same.
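A minimal sketch of the clean-up described above: drop contigs outside a length window, then drop exact duplicates. This is plain exact-match de-duplication, a simpler stand-in for CD-HIT-EST or VSEARCH clustering, which do more than this. The function names and the default thresholds (200 and 100,000 bases) are assumptions for illustration; pick the lower bound from your own read length.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(records, min_len=200, max_len=100_000):
    """Keep contigs within [min_len, max_len], skipping exact duplicates."""
    seen = set()
    for header, seq in records:
        if not (min_len <= len(seq) <= max_len):
            continue
        key = seq.upper()  # treat soft-masked copies as the same sequence
        if key in seen:
            continue
        seen.add(key)
        yield header, seq
```

Usage would be along the lines of `filter_contigs(read_fasta(open("assembly.fasta")))`, writing each kept record back out as FASTA. The `seen` set holds every kept sequence in memory, so for an assembly of this size a dedicated tool is likely the better choice.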
How long are the reads?
OK, I'll look into de-duplication first. The reads are just around 100bp (have been processed with Trimmomatic)
An update. I removed all transcripts under 200bp and over 100,000bp and then ran CD-HIT-EST to remove duplicates. This reduced the 2,027,284 transcripts to 1,796,079. Transrate failed on it again with the same "[ERROR] 2016-04-05 22:31:36 : Failed to build Snap index" message.
So I looked into the code and found this file where the snap index creation was happening:
transrate/lib/app/lib/transrate/snap.rb
I looked at the parameter options there, and found that the command line snap-aligner invocation was something like this:
transrate/bin/snap-aligner index A1_trinity_oases_merged.sizefiltered.nodups.fasta foo_snap_index -s 23 -t16 -bSpace -locationSize 4
I set my own thread and index names here, ran it, and got the same error. Then I increased locationSize from 4 to 5 and this time snap successfully built the index. It took about 10 minutes and used a max of 52GB of RAM while doing it, but it built successfully. The ruby script looks like it is meant to try each locationSize between 4..8, but it doesn't actually do so. The loop only moves on to a higher value if either the directory doesn't exist or the error matches one specific case, which doesn't seem to be the error I'm getting here.
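The retry behaviour described above can be sketched roughly as follows. This is not transrate's Ruby code: `build` is a hypothetical callable that runs the indexer for one location size and returns `(ok, output)`, and the retryable message is the SNAP overflow message quoted elsewhere in this thread. Whether that is the exact text transrate checks for is an assumption.

```python
import shutil
from pathlib import Path

# Messages that mean "try a larger location size" (assumed, see lead-in).
RETRYABLE = ("Trying to use too many overflow entries",)


def build_index_with_retry(build, index_dir, sizes=range(4, 9)):
    """Try each location size in turn until `build` succeeds.

    `build(location_size)` is a hypothetical callable returning (ok, output).
    Returns the location size that worked.
    """
    for size in sizes:
        ok, output = build(size)
        if ok:
            return size
        # Remove the partial index, contents included, before the next attempt.
        shutil.rmtree(Path(index_dir), ignore_errors=True)
        if not any(msg in output for msg in RETRYABLE):
            raise RuntimeError(f"index build failed: {output}")
    raise RuntimeError("index build failed for every location size")
```

The point of the design is that an unrecognised error stops the loop immediately, while a recognised one moves on to the next size after cleaning up.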
I've created a pull request with a possible fix here which checks for the error message text I actually got.
I don't see an option in transrate to use a pre-existing snap index, so I'm going to try manually building on the merged_assemblies file and making sure the name matches the expected convention so re-indexing is skipped when I re-run.
Index creation seems to work fine now, but then it fails during the snap step. Going to sleep and think on it.
[ INFO] 2016-04-06 02:02:18 : Contig metrics done in 715 seconds
[ INFO] 2016-04-06 02:02:18 : Calculating read diagnostics...
[ERROR] 2016-04-06 02:15:31 : Snap failed
Welcome to SNAP version 1.0beta.18.
BigAllocator: allocating too much memory, 291281808 > 291281748
SNAP exited with exit code 1 from line 489 of file SNAPLib/BigAlloc.cpp
Where is this SNAPLib path? I don't find it within the release.
That file is part of SNAP itself. It's a C++ source file, so it's already compiled into the binary in the release. It's here on the official SNAP repo.
Did you look at memory usage when running? Could you have run out of RAM?
The latest release of transrate v1.0.3 now includes the latest version of SNAP and Salmon, as well as your fix :).
Please update and try your analysis again. Hopefully this will solve the problem - if not please re-open the issue. Many thanks for your patience :)
OK, so I just tried again with the latest version v1.0.3, and this was the output:
[ INFO] 2016-08-31 23:16:18 : Calculating contig metrics...
[ INFO] 2016-08-31 23:27:36 : Contig metrics:
[ INFO] 2016-08-31 23:27:36 : -----------------------------------
[ INFO] 2016-08-31 23:27:36 : n seqs 1790973
[ INFO] 2016-08-31 23:27:36 : smallest 100
[ INFO] 2016-08-31 23:27:36 : largest 166145
[ INFO] 2016-08-31 23:27:36 : n bases 2238426230
[ INFO] 2016-08-31 23:27:36 : mean len 1249.06
[ INFO] 2016-08-31 23:27:36 : n under 200 7858
[ INFO] 2016-08-31 23:27:36 : n over 1k 618723
[ INFO] 2016-08-31 23:27:36 : n over 10k 7289
[ INFO] 2016-08-31 23:27:36 : n with orf 315711
[ INFO] 2016-08-31 23:27:36 : mean orf percent 29.3
[ INFO] 2016-08-31 23:27:36 : n90 467
[ INFO] 2016-08-31 23:27:36 : n70 1306
[ INFO] 2016-08-31 23:27:36 : n50 2478
[ INFO] 2016-08-31 23:27:36 : n30 4073
[ INFO] 2016-08-31 23:27:36 : n10 7376
[ INFO] 2016-08-31 23:27:36 : gc 0.43
[ INFO] 2016-08-31 23:27:36 : bases n 0
[ INFO] 2016-08-31 23:27:36 : proportion n 0.0
[ INFO] 2016-08-31 23:27:36 : Contig metrics done in 678 seconds
[ INFO] 2016-08-31 23:27:36 : Calculating read diagnostics...
[ WARN] 2016-08-31 23:31:49 : Snap index build failed with n = 4 , increasing +1
/local/scratch/aplysia/transrate/tool/lib/app/lib/transrate/snap.rb:144:in `delete': Directory not empty @ dir_s_rmdir - transrate.merged.assemblies (Errno::ENOTEMPTY)
from /local/scratch/aplysia/transrate/tool/lib/app/lib/transrate/snap.rb:144:in `block in build_index'
from /local/scratch/aplysia/transrate/tool/lib/app/lib/transrate/snap.rb:125:in `loop'
from /local/scratch/aplysia/transrate/tool/lib/app/lib/transrate/snap.rb:125:in `build_index'
from /local/scratch/aplysia/transrate/tool/lib/app/lib/transrate/read_metrics.rb:52:in `run'
from /local/scratch/aplysia/transrate/tool/lib/app/lib/transrate/transrater.rb:98:in `read_metrics'
from /local/scratch/aplysia/transrate/tool/lib/app/lib/transrate/cmdline.rb:508:in `read_metrics'
from /local/scratch/aplysia/transrate/tool/lib/app/lib/transrate/cmdline.rb:404:in `block in analyse_assembly'
from /local/scratch/aplysia/transrate/tool/lib/app/lib/transrate/cmdline.rb:400:in `chdir'
from /local/scratch/aplysia/transrate/tool/lib/app/lib/transrate/cmdline.rb:400:in `analyse_assembly'
from /local/scratch/aplysia/transrate/tool/lib/app/lib/transrate/cmdline.rb:38:in `block (2 levels) in run'
from /local/scratch/aplysia/transrate/tool/lib/app/lib/transrate/cmdline.rb:37:in `zip'
from /local/scratch/aplysia/transrate/tool/lib/app/lib/transrate/cmdline.rb:37:in `block in run'
from /local/scratch/aplysia/transrate/tool/lib/app/lib/transrate/cmdline.rb:32:in `chdir'
from /local/scratch/aplysia/transrate/tool/lib/app/lib/transrate/cmdline.rb:32:in `run'
from /local/scratch/aplysia/transrate/tool/lib/app/bin/transrate:23:in `<main>'
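The trace above shows Ruby's `Dir.delete` raising `Errno::ENOTEMPTY`, which is what happens when a plain directory removal is attempted on a directory that still contains files (here, presumably a partially written index). The same failure mode and one way around it, sketched in Python rather than transrate's Ruby; the function name is made up, and whether recursive removal is the right fix for transrate is an assumption.

```python
import errno
import os
import shutil


def remove_index_dir(path):
    """Remove a directory whether or not it is empty.

    Returns which strategy was needed, to make the difference visible.
    """
    try:
        os.rmdir(path)  # succeeds only on an empty directory
        return "rmdir"
    except OSError as exc:
        # POSIX allows either errno for a non-empty directory.
        if exc.errno not in (errno.ENOTEMPTY, errno.EEXIST):
            raise
        shutil.rmtree(path)  # recursive removal, contents included
        return "rmtree"
```

A practical workaround in the same spirit is to delete the leftover index directory by hand before re-running.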
Did you find a fix for this issue @jorvis? I'm getting this same error.
No I didn't, and I stopped trying to use the program. If a solution is forthcoming I might give it another shot.
I'm refreshing this because I'm getting the same issue. I realize it is with SNAP, but do you have any insights @blahah? RAM should not be an issue here, since it was running on a 3 TB node. The number of bases does admittedly seem excessive, but this is a number of assemblies merged together, across a large number of experimental treatments, to see how various assembly methods influence downstream inferences.
[ INFO] 2017-08-29 10:39:07 : Loading assembly: /pylon5/mc3bg6p/astuck/rerun/orthofuse/all-fastas-mergedassembly/merged.fasta
[ INFO] 2017-08-29 10:54:15 : Analysing assembly: /pylon5/mc3bg6p/astuck/rerun/orthofuse/all-fastas-mergedassembly/merged.fasta
[ INFO] 2017-08-29 10:54:15 : Results will be saved in /pylon5/mc3bg6p/astuck/rerun/orthofuse/all-fastas-mergedassembly/merged/merged
[ INFO] 2017-08-29 10:54:15 : Calculating contig metrics...
[ INFO] 2017-08-29 11:16:34 : Contig metrics:
[ INFO] 2017-08-29 11:16:34 : -----------------------------------
[ INFO] 2017-08-29 11:16:34 : n seqs 2368436
[ INFO] 2017-08-29 11:16:34 : smallest 201
[ INFO] 2017-08-29 11:16:34 : largest 18766
[ INFO] 2017-08-29 11:16:34 : n bases 2172789003
[ INFO] 2017-08-29 11:16:34 : mean len 917.39
[ INFO] 2017-08-29 11:16:34 : n under 200 0
[ INFO] 2017-08-29 11:16:34 : n over 1k 651133
[ INFO] 2017-08-29 11:16:34 : n over 10k 937
[ INFO] 2017-08-29 11:16:34 : n with orf 701643
[ INFO] 2017-08-29 11:16:34 : mean orf percent 57.12
[ INFO] 2017-08-29 11:16:34 : n90 342
[ INFO] 2017-08-29 11:16:34 : n70 884
[ INFO] 2017-08-29 11:16:34 : n50 1659
[ INFO] 2017-08-29 11:16:34 : n30 2651
[ INFO] 2017-08-29 11:16:34 : n10 4686
[ INFO] 2017-08-29 11:16:34 : gc 0.45
[ INFO] 2017-08-29 11:16:34 : bases n 1219873
[ INFO] 2017-08-29 11:16:34 : proportion n 0.0
[ INFO] 2017-08-29 11:16:34 : Contig metrics done in 1339 seconds
[ INFO] 2017-08-29 11:16:34 : Calculating read diagnostics...
[ WARN] 2017-08-29 11:19:59 : Snap index build failed with n = 4 , increasing +1
/pylon2/mc3bg6p/macmanes/software/transrate-1.0.3-linux-x86_64/lib/app/lib/transrate/snap.rb:145:in `delete': Directory not empty @ dir_s_rmdir - merged (Errno::ENOTEMPTY)
from /pylon2/mc3bg6p/macmanes/software/transrate-1.0.3-linux-x86_64/lib/app/lib/transrate/snap.rb:145:in `block in build_index'
from /pylon2/mc3bg6p/macmanes/software/transrate-1.0.3-linux-x86_64/lib/app/lib/transrate/snap.rb:126:in `loop'
from /pylon2/mc3bg6p/macmanes/software/transrate-1.0.3-linux-x86_64/lib/app/lib/transrate/snap.rb:126:in `build_index'
from /pylon2/mc3bg6p/macmanes/software/transrate-1.0.3-linux-x86_64/lib/app/lib/transrate/read_metrics.rb:52:in `run'
from /pylon2/mc3bg6p/macmanes/software/transrate-1.0.3-linux-x86_64/lib/app/lib/transrate/transrater.rb:98:in `read_metrics'
from /pylon2/mc3bg6p/macmanes/software/transrate-1.0.3-linux-x86_64/lib/app/lib/transrate/cmdline.rb:508:in `read_metrics'
from /pylon2/mc3bg6p/macmanes/software/transrate-1.0.3-linux-x86_64/lib/app/lib/transrate/cmdline.rb:404:in `block in analyse_assembly'
from /pylon2/mc3bg6p/macmanes/software/transrate-1.0.3-linux-x86_64/lib/app/lib/transrate/cmdline.rb:400:in `chdir'
from /pylon2/mc3bg6p/macmanes/software/transrate-1.0.3-linux-x86_64/lib/app/lib/transrate/cmdline.rb:400:in `analyse_assembly'
from /pylon2/mc3bg6p/macmanes/software/transrate-1.0.3-linux-x86_64/lib/app/lib/transrate/cmdline.rb:38:in `block (2 levels) in run'
from /pylon2/mc3bg6p/macmanes/software/transrate-1.0.3-linux-x86_64/lib/app/lib/transrate/cmdline.rb:37:in `zip'
from /pylon2/mc3bg6p/macmanes/software/transrate-1.0.3-linux-x86_64/lib/app/lib/transrate/cmdline.rb:37:in `block in run'
from /pylon2/mc3bg6p/macmanes/software/transrate-1.0.3-linux-x86_64/lib/app/lib/transrate/cmdline.rb:32:in `chdir'
from /pylon2/mc3bg6p/macmanes/software/transrate-1.0.3-linux-x86_64/lib/app/lib/transrate/cmdline.rb:32:in `run'
from /pylon2/mc3bg6p/macmanes/software/transrate-1.0.3-linux-x86_64/lib/app/bin/transrate:23:in `<main>'
make: *** [/pylon5/mc3bg6p/astuck/rerun/orthofuse/all-fastas-mergedassembly/orthotransrate.done] Error 1
Did anyone find an answer to this problem?
Hi, I'm trying to run Transrate_v1.0.3 (the latest version as far as I know) on my transcriptome assembly and I'm getting the same issue as @AdamStuckert and @jorvis. Does anyone know how to fix this?
[ INFO] 2019-11-29 15:48:45 : Loading assembly: /media/raid/raperez/transcriptomes151/data-rafaela/trinityOUT-RMR/Trinity.fasta
[ INFO] 2019-11-29 16:10:24 : Analysing assembly: /media/raid/raperez/transcriptomes151/data-rafaela/trinityOUT-RMR/Trinity.fasta
[ INFO] 2019-11-29 16:10:24 : Results will be saved in /media/raid/raperez/transcriptomes151/data-rafaela/transrate_RMR/Trinity
[ INFO] 2019-11-29 16:10:24 : Calculating contig metrics...
[ INFO] 2019-11-29 16:41:26 : Contig metrics:
[ INFO] 2019-11-29 16:41:26 : -----------------------------------
[ INFO] 2019-11-29 16:41:26 : n seqs 2747791
[ INFO] 2019-11-29 16:41:26 : smallest 165
[ INFO] 2019-11-29 16:41:26 : largest 101616
[ INFO] 2019-11-29 16:41:26 : n bases 1681598978
[ INFO] 2019-11-29 16:41:26 : mean len 611.91
[ INFO] 2019-11-29 16:41:26 : n under 200 1061
[ INFO] 2019-11-29 16:41:26 : n over 1k 375423
[ INFO] 2019-11-29 16:41:26 : n over 10k 1083
[ INFO] 2019-11-29 16:41:26 : n with orf 136230
[ INFO] 2019-11-29 16:41:26 : mean orf percent 41.29
[ INFO] 2019-11-29 16:41:26 : n90 265
[ INFO] 2019-11-29 16:41:26 : n70 463
[ INFO] 2019-11-29 16:41:26 : n50 833
[ INFO] 2019-11-29 16:41:26 : n30 1503
[ INFO] 2019-11-29 16:41:26 : n10 3629
[ INFO] 2019-11-29 16:41:26 : gc 0.44
[ INFO] 2019-11-29 16:41:26 : bases n 0
[ INFO] 2019-11-29 16:41:26 : proportion n 0.0
[ INFO] 2019-11-29 16:41:26 : Contig metrics done in 1862 seconds
[ INFO] 2019-11-29 16:41:26 : Calculating read diagnostics...
[ WARN] 2019-11-29 16:52:29 : Snap index build failed with n = 4 , increasing +1
/home/raperez/sw/TRANSRATE_v1.0.3/transrate-1.0.3-linux-x86_64/lib/app/lib/transrate/snap.rb:144:in `delete': Directory not empty @ dir_s_rmdir - Trinity (Errno::ENOTEMPTY)
from /home/raperez/sw/TRANSRATE_v1.0.3/transrate-1.0.3-linux-x86_64/lib/app/lib/transrate/snap.rb:144:in `block in build_index'
from /home/raperez/sw/TRANSRATE_v1.0.3/transrate-1.0.3-linux-x86_64/lib/app/lib/transrate/snap.rb:125:in `loop'
from /home/raperez/sw/TRANSRATE_v1.0.3/transrate-1.0.3-linux-x86_64/lib/app/lib/transrate/snap.rb:125:in `build_index'
from /home/raperez/sw/TRANSRATE_v1.0.3/transrate-1.0.3-linux-x86_64/lib/app/lib/transrate/read_metrics.rb:52:in `run'
from /home/raperez/sw/TRANSRATE_v1.0.3/transrate-1.0.3-linux-x86_64/lib/app/lib/transrate/transrater.rb:98:in `read_metrics'
from /home/raperez/sw/TRANSRATE_v1.0.3/transrate-1.0.3-linux-x86_64/lib/app/lib/transrate/cmdline.rb:508:in `read_metrics'
from /home/raperez/sw/TRANSRATE_v1.0.3/transrate-1.0.3-linux-x86_64/lib/app/lib/transrate/cmdline.rb:404:in `block in analyse_assembly'
from /home/raperez/sw/TRANSRATE_v1.0.3/transrate-1.0.3-linux-x86_64/lib/app/lib/transrate/cmdline.rb:400:in `chdir'
from /home/raperez/sw/TRANSRATE_v1.0.3/transrate-1.0.3-linux-x86_64/lib/app/lib/transrate/cmdline.rb:400:in `analyse_assembly'
from /home/raperez/sw/TRANSRATE_v1.0.3/transrate-1.0.3-linux-x86_64/lib/app/lib/transrate/cmdline.rb:38:in `block (2 levels) in run'
from /home/raperez/sw/TRANSRATE_v1.0.3/transrate-1.0.3-linux-x86_64/lib/app/lib/transrate/cmdline.rb:37:in `zip'
from /home/raperez/sw/TRANSRATE_v1.0.3/transrate-1.0.3-linux-x86_64/lib/app/lib/transrate/cmdline.rb:37:in `block in run'
from /home/raperez/sw/TRANSRATE_v1.0.3/transrate-1.0.3-linux-x86_64/lib/app/lib/transrate/cmdline.rb:32:in `chdir'
from /home/raperez/sw/TRANSRATE_v1.0.3/transrate-1.0.3-linux-x86_64/lib/app/lib/transrate/cmdline.rb:32:in `run'
from /home/raperez/sw/TRANSRATE_v1.0.3/transrate-1.0.3-linux-x86_64/lib/app/bin/transrate:23:in
Thank you in advance!
I never was able to "fix" this per se @rafinhacp. That said, two suggestions.
Thanks @AdamStuckert. I'll rethink how to go about it, given that rerunning is not working.
Hi, did anyone get a fix for this problem? I have the same issue as @rafinhacp for one sample. Thanks.
My first run of transrate completed, but with errors in the snap portion like this:
Trying to use too many overflow entries. To index this genome, you either need a larger seed size or a larger location size.
Attaching the full run command and output. transrate.txt