faircloth-lab / phyluce

software for UCE (and general) phylogenomics
http://phyluce.readthedocs.org/

Assembly not working with tutorial data #214

Closed leonvarhan closed 3 years ago

leonvarhan commented 3 years ago

Hello, I am working through the tutorial with the phyluce tutorial data, but I am not able to get the assembly to work. I have tried Trinity, SPAdes, and ABySS. Could this be related to available memory? Any suggestions would be appreciated! Thank you very much.

########## TRINITY ###########
2021-02-03 15:30:29,264 - phyluce_assembly_assemblo_trinity - INFO ---Processing alligator_mississippiensis ----
2021-02-03 15:30:29,264 - phyluce_assembly_assemblo_trinity - INFO - Finding fastq/fasta files
2021-02-03 15:30:29,266 - phyluce_assembly_assemblo_trinity - INFO - File type is fastq
2021-02-03 15:30:29,267 - phyluce_assembly_assemblo_trinity - INFO - Copying raw read data to /storage/uce-tutorial/trinity-assemblies/alligator_mississippiensis_trinity
2021-02-03 15:30:29,529 - phyluce_assembly_assemblo_trinity - INFO - Combining singleton reads with R1 data
2021-02-03 15:30:29,538 - phyluce_assembly_assemblo_trinity - INFO - Running Trinity.pl for PE data
2021-02-03 15:30:34,596 - phyluce_assembly_assemblo_trinity - WARNING - Did not clean all fastq/fasta files from /storage/uce-tutorial/trinity-assemblies/alligator_mississippiensis_trinity
2021-02-03 15:30:34,596 - phyluce_assembly_assemblo_trinity - INFO - Removing extraneous Trinity files
Traceback (most recent call last):
  File "/opt/miniconda3/envs/phyluce/bin/phyluce_assembly_assemblo_trinity", line 362, in <module>
    main()
  File "/opt/miniconda3/envs/phyluce/bin/phyluce_assembly_assemblo_trinity", line 341, in main
    cleanup_trinity_assembly_folder(output, log)
  File "/opt/miniconda3/envs/phyluce/bin/phyluce_assembly_assemblo_trinity", line 283, in cleanup_trinity_assembly_folder
    raise IOError("Neither Trinity.fasta nor trinity.log were found in output.")
IOError: Neither Trinity.fasta nor trinity.log were found in output.

########## SPADES ###########
2021-02-03 16:01:15,797 - phyluce_assembly_assemblo_spades - INFO - -------------------- Processing gallus_gallus -------------------
2021-02-03 16:01:15,797 - phyluce_assembly_assemblo_spades - INFO - Finding fastq/fasta files
2021-02-03 16:01:15,798 - phyluce_assembly_assemblo_spades - INFO - File type is fastq
2021-02-03 16:01:15,798 - phyluce_assembly_assemblo_spades - INFO - Running SPAdes for PE data
2021-02-03 16:01:36,324 - phyluce_assembly_assemblo_spades - WARNING - Did not clean all fastq/fasta files from /storage/uce-tutorial/spades-assemblies/gallus_gallus_spades
2021-02-03 16:01:36,324 - phyluce_assembly_assemblo_spades - INFO - Symlinking assembled contigs into /storage/uce-tutorial/spades-assemblies/contigs
2021-02-03 16:01:36,324 - phyluce_assembly_assemblo_spades - INFO - -------------------- Processing mus_musculus --------------------
2021-02-03 16:01:36,324 - phyluce_assembly_assemblo_spades - INFO - Finding fastq/fasta files
2021-02-03 16:01:36,325 - phyluce_assembly_assemblo_spades - INFO - File type is fastq
2021-02-03 16:01:36,325 - phyluce_assembly_assemblo_spades - INFO - Running SPAdes for PE data
2021-02-03 16:01:47,241 - phyluce_assembly_assemblo_spades - **WARNING - Did not clean all fastq/fasta files from /storage/uce-tutorial/spades-assemblies/mus_musculus_spades**
2021-02-03 16:01:47,242 - phyluce_assembly_assemblo_spades - INFO - Symlinking assembled contigs into /storage/uce-tutorial/spades-assemblies/contigs
2021-02-03 16:01:47,242 - phyluce_assembly_assemblo_spades - INFO - =====Completed phyluce_assembly_assemblo_spades =======

########## ABYSS ###########
2021-02-03 16:13:10,052 - phyluce_assembly_assemblo_abyss - INFO - ============ Starting phyluce_assembly_assemblo_abyss ===========
2021-02-03 16:13:10,053 - phyluce_assembly_assemblo_abyss - INFO - Version: git fatal: not a git repository: '/opt/miniconda3/envs/phyluce/lib/python2.7/site-packages/.git'
2021-02-03 16:13:10,053 - phyluce_assembly_assemblo_abyss - INFO - Argument --abyss_se: False
2021-02-03 16:13:10,053 - phyluce_assembly_assemblo_abyss - INFO - Argument --clean: False
2021-02-03 16:13:10,053 - phyluce_assembly_assemblo_abyss - INFO - Argument --config: /storage/uce-tutorial/assembly.conf
2021-02-03 16:13:10,053 - phyluce_assembly_assemblo_abyss - INFO - Argument --cores: 12
2021-02-03 16:13:10,053 - phyluce_assembly_assemblo_abyss - INFO - Argument --dir: None
2021-02-03 16:13:10,053 - phyluce_assembly_assemblo_abyss - INFO - Argument --kmer: 31
2021-02-03 16:13:10,053 - phyluce_assembly_assemblo_abyss - INFO - Argument --log_path: None
2021-02-03 16:13:10,053 - phyluce_assembly_assemblo_abyss - INFO - Argument --output: /storage/uce-tutorial/abyss-assemblies
2021-02-03 16:13:10,053 - phyluce_assembly_assemblo_abyss - INFO - Argument --subfolder:
2021-02-03 16:13:10,054 - phyluce_assembly_assemblo_abyss - INFO - Argument --verbosity: INFO
2021-02-03 16:13:10,054 - phyluce_assembly_assemblo_abyss - INFO - Getting input filenames and creating output directories
2021-02-03 16:13:10,054 - phyluce_assembly_assemblo_abyss - INFO - ------------- Processing alligator_mississippiensis -------------
2021-02-03 16:13:10,054 - phyluce_assembly_assemblo_abyss - INFO - Finding fastq/fasta files
2021-02-03 16:13:10,057 - phyluce_assembly_assemblo_abyss - INFO - File type is fastq
2021-02-03 16:13:10,057 - phyluce_assembly_assemblo_abyss - INFO - Running abyss-pe against data
Traceback (most recent call last):
  File "/opt/miniconda3/envs/phyluce/bin/phyluce_assembly_assemblo_abyss", line 323, in <module>
    main()
  File "/opt/miniconda3/envs/phyluce/bin/phyluce_assembly_assemblo_abyss", line 314, in main
    contigs_file = convert_abyss_contigs_to_velvet(contigs_file)
  File "/opt/miniconda3/envs/phyluce/bin/phyluce_assembly_assemblo_abyss", line 246, in convert_abyss_contigs_to_velvet
    seqstring = seq.seq.tostring()
AttributeError: 'Seq' object has no attribute 'tostring'

brantfaircloth commented 3 years ago

The ABySS error results from a change to biopython (the Seq object no longer has a tostring() method), but it is strange that nothing works at all. That error, in particular, means you need to change line 246 of phyluce_assembly_assemblo_abyss from:

seqstring = seq.seq.tostring()

to

seqstring = str(seq.seq)
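A minimal sketch of why this replacement works, assuming only that the sequence object supports str(), as Biopython's Seq does. StandInSeq, StandInRecord, and seq_to_string are hypothetical names used so the snippet runs without Biopython installed; they are not part of phyluce or Biopython.

```python
class StandInSeq(object):
    """Hypothetical stand-in for a Seq object that has no tostring() method."""

    def __init__(self, data):
        self._data = data

    def __str__(self):
        return self._data


class StandInRecord(object):
    """Hypothetical stand-in for a sequence record with a .seq attribute."""

    def __init__(self, seq):
        self.seq = seq


def seq_to_string(record):
    """Return the record's sequence as a plain string.

    str() does not depend on the removed tostring() method, so it avoids
    the AttributeError shown in the traceback.
    """
    return str(record.seq)


record = StandInRecord(StandInSeq("ACGT"))

# The old call fails on objects without tostring(), as in the ABySS traceback.
try:
    record.seq.tostring()
    old_call_failed = False
except AttributeError:
    old_call_failed = True

seqstring = seq_to_string(record)
```

With a real Biopython record, the same str(seq.seq) call would replace seq.seq.tostring() on the line named in the traceback.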

More generally, if nothing is working (I would try SPAdes first), it is possible that RAM limits are causing the problem, although those test data are not gigantic. There are VERY small test data files here, if you want to try those:

https://github.com/faircloth-lab/phyluce/tree/master/phyluce/tests/test-data

leonvarhan commented 3 years ago

Thanks for the quick response. I tried SPAdes (the second block in my message) and the run appeared to complete, but the fasta files in the contigs directory are empty. I will change line 246 for ABySS, try again, and report back. Thank you!
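A run that logs "Completed" can still leave empty output, so it may help to check the contigs directory directly. The following is a rough sketch, not part of phyluce; find_empty_contigs is a hypothetical helper name, and it assumes the contigs directory holds files or symlinks to the assembled fasta files.

```python
import os


def find_empty_contigs(contigs_dir):
    """Return sorted names of entries in contigs_dir that hold no data.

    An entry counts as empty when it is a zero-byte file, a symlink to a
    zero-byte file, or a dangling symlink. Subdirectories are skipped.
    """
    empty = []
    for name in sorted(os.listdir(contigs_dir)):
        path = os.path.join(contigs_dir, name)
        if os.path.isdir(path):
            continue
        try:
            # getsize follows symlinks, so the size of the link target is used
            size = os.path.getsize(path)
        except OSError:
            # a dangling symlink: the assembler never wrote the target file
            size = 0
        if size == 0:
            empty.append(name)
    return empty
```

For the run above, this could be called on /storage/uce-tutorial/spades-assemblies/contigs; any names it returns point to assemblies worth checking in the assembler's own log.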

leonvarhan commented 3 years ago

I figured out the error in SPADES. I looked at the "spades.phyluce.log" in the spades-assemblies directory and it said:

INFO General (memory_limit.cpp : 49) Memory limit set to 2 Gb
72M / 844M ERROR K-mer Counting (kmer_data.cpp : 353) The reads contain too many k-mers to fit into available memory. You need approx. 2.77342GB of free RAM to assemble your dataset
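The two numbers that matter in this log are the configured limit (2 Gb) and the amount SPAdes says it needs (about 2.77 GB). As a sketch, they can be pulled out of the log text with regular expressions; diagnose_spades_log is a hypothetical helper, and the patterns assume the message wording shown above, which may differ between SPAdes versions.

```python
import math
import re

# Patterns based on the wording of the two log messages quoted in this thread.
LIMIT_PATTERN = re.compile(r"Memory limit set to (\d+) Gb")
NEEDED_PATTERN = re.compile(r"You need approx\. ([0-9.]+)\s*GB of free RAM")


def diagnose_spades_log(log_text):
    """Extract the memory limit and the required RAM from SPAdes log text.

    Returns a dict with 'limit_gb', 'needed_gb', and 'suggested_gb' (the
    required RAM rounded up to a whole number). Values are None when the
    corresponding message is absent.
    """
    limit = LIMIT_PATTERN.search(log_text)
    needed = NEEDED_PATTERN.search(log_text)
    needed_gb = float(needed.group(1)) if needed else None
    return {
        "limit_gb": int(limit.group(1)) if limit else None,
        "needed_gb": needed_gb,
        "suggested_gb": int(math.ceil(needed_gb)) if needed_gb is not None else None,
    }
```

If needed_gb exceeds limit_gb, as it does here, the memory limit rather than the read data is the likely cause of the empty contigs.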

I looked at the phyluce.conf file at home/.conda/envs/py27/config/phyluce.conf and it said:

----------------
Advanced
----------------

[headers]
trinity:comp\d+_c\d+_seq\d+|c\d+_g\d+_i\d+|TR\d+|c\d+_g\d+_i\d+|TRINITY_DN\d+_c\d+_g\d+i\d+
velvet:node\d+
abyss:node\d+
idba:contig-\d+\d+
spades:NODE_\d+length\d+cov\d+.\d+

[trinity]
max_memory:8G
kmer_coverage:2

[spades]
max_memory:2
cov_cutoff:5

I increased max_memory in the [spades] section from 2 to 30 and it worked! Unfortunately, the same change did not fix Trinity, but I can move forward now. Thanks.
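The same edit can be scripted. This is a minimal sketch that assumes the key:value layout shown in the config excerpt above; set_section_value is a hypothetical helper, not a phyluce function. It edits the text line by line so that comments and the other sections are left untouched.

```python
def set_section_value(conf_text, section, key, value):
    """Return conf_text with `key` in `[section]` set to `value`.

    Only lines of the form 'key:...' inside the named section are changed;
    every other line is copied through as-is.
    """
    out = []
    current = None
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # entering a new section such as [spades]
            current = stripped[1:-1]
        elif current == section and ":" in stripped:
            name = stripped.split(":", 1)[0].strip()
            if name == key:
                line = "{}:{}".format(key, value)
        out.append(line)
    return "\n".join(out) + "\n"


example = (
    "[trinity]\nmax_memory:8G\nkmer_coverage:2\n\n"
    "[spades]\nmax_memory:2\ncov_cutoff:5\n"
)
updated = set_section_value(example, "spades", "max_memory", 30)
```

Here only the [spades] max_memory line changes; the [trinity] max_memory entry keeps its 8G value because the section name is tracked while scanning.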

brantfaircloth commented 3 years ago

Right on. I'm working on updating phyluce and will change the default SPAdes max_memory to 4 in this section of the config. Trinity is going away in the next version, so that failure won't matter.