faircloth-lab / phyluce

software for UCE (and general) phylogenomics
http://phyluce.readthedocs.org/

--clean sometimes fails on Trinity2 assemblies (split from #41) #45

Closed brantfaircloth closed 3 years ago

brantfaircloth commented 8 years ago

Carrying over separate issue from #41.

JETitus commented 8 years ago

Hi Dr. Faircloth,

Thank you for the super quick reply. I am running through the tutorial before jumping into my data.

I am assuming that the --output directories are the ones I specified for each run, illumiprocessor and phyluce_assembly_assemblo_trinity. Below is the ls -alh output for each.

illumiprocessor (directory clean_reads [labelled --output clean-fastq] in the tutorial):

ls -alh
total 24
drwxr-xr-x 7 jamestmcquillan staff 238B Dec 15 14:16 .
drwxr-xr-x 10 jamestmcquillan staff 340B Dec 15 14:38 ..
-rw-r--r--@ 1 jamestmcquillan staff 8.0K Dec 15 14:34 .DS_Store
drwxr-xr-x 7 jamestmcquillan staff 238B Dec 15 14:25 alligator_mississippiensis
drwxr-xr-x 6 jamestmcquillan staff 204B Dec 15 14:13 anolis_carolinensis
drwxr-xr-x 6 jamestmcquillan staff 204B Dec 15 14:13 gallus_gallus
drwxr-xr-x 6 jamestmcquillan staff 204B Dec 15 14:13 mus_musculus

phyluce_assembly_assemblo_trinity (directory trinity_assemblies [labelled --output trinity_assemblies]):

ls -alh
total 16
drwxr-xr-x 5 jamestmcquillan staff 170B Dec 15 15:10 .
drwxr-xr-x 10 jamestmcquillan staff 340B Dec 15 14:38 ..
-rw-r--r--@ 1 jamestmcquillan staff 6.0K Dec 15 15:10 .DS_Store
drwxr-xr-x 9 jamestmcquillan staff 306B Dec 15 14:38 alligator_mississippiensis_trinity
drwxr-xr-x 2 jamestmcquillan staff 68B Dec 15 14:38 contigs

Thank you for your time.

Best, James

brantfaircloth commented 8 years ago

hi james,

can I also see the ls -alh output from alligator_mississippiensis_trinity?

JETitus commented 8 years ago

Sure here is that output.

ls -alh
total 40
drwxr-xr-x 9 jamestmcquillan staff 306B Dec 15 14:38 .
drwxr-xr-x 5 jamestmcquillan staff 170B Dec 15 15:10 ..
-rw-r--r-- 1 jamestmcquillan staff 559B Dec 15 14:38 Trinity.timing
-rw-r--r-- 1 jamestmcquillan staff 192B Dec 15 14:38 alligator_mississippiensis-READ1.fastq.readcount
-rw-r--r-- 1 jamestmcquillan staff 192B Dec 15 14:38 alligator_mississippiensis-READ2.fastq.readcount
drwxr-xr-x 2 jamestmcquillan staff 68B Dec 15 14:38 chrysalis
-rw-r--r-- 1 jamestmcquillan staff 0B Dec 15 14:38 left.fa
-rw-r--r-- 1 jamestmcquillan staff 0B Dec 15 14:38 right.fa
-rw-r--r-- 1 jamestmcquillan staff 4.6K Dec 15 14:38 trinity.log

brantfaircloth commented 8 years ago

ok, and now the output from trinity.log? It doesn't look like the assembly is completing (which is why you have no files to clean up when you run --clean).

JETitus commented 8 years ago

trinity.log.txt

Here is the trinity log file.

The assembly is not completing. It is erroring out, error below.

$ phyluce_assembly_assemblo_trinity \
    --conf assembly.conf \
    --output trinity_assemblies \
    --clean \
    --cores 8
2015-12-15 14:38:53,236 - phyluce_assembly_assemblo_trinity - INFO - =========== Starting phyluce_assembly_assemblo_trinity ==========
2015-12-15 14:38:53,236 - phyluce_assembly_assemblo_trinity - INFO - Version: 1.5.0
2015-12-15 14:38:53,236 - phyluce_assembly_assemblo_trinity - INFO - Argument --clean: True
2015-12-15 14:38:53,236 - phyluce_assembly_assemblo_trinity - INFO - Argument --config: /Users/jamestmcquillan/Desktop/uce-Tutorial/assembly.conf
2015-12-15 14:38:53,236 - phyluce_assembly_assemblo_trinity - INFO - Argument --cores: 8
2015-12-15 14:38:53,236 - phyluce_assembly_assemblo_trinity - INFO - Argument --dir: None
2015-12-15 14:38:53,236 - phyluce_assembly_assemblo_trinity - INFO - Argument --log_path: None
2015-12-15 14:38:53,236 - phyluce_assembly_assemblo_trinity - INFO - Argument --min_kmer_coverage: 2
2015-12-15 14:38:53,236 - phyluce_assembly_assemblo_trinity - INFO - Argument --output: /Users/jamestmcquillan/Desktop/uce-Tutorial/trinity_assemblies
2015-12-15 14:38:53,236 - phyluce_assembly_assemblo_trinity - INFO - Argument --subfolder:
2015-12-15 14:38:53,237 - phyluce_assembly_assemblo_trinity - INFO - Argument --verbosity: INFO
2015-12-15 14:38:53,237 - phyluce_assembly_assemblo_trinity - INFO - Getting input filenames and creating output directories
2015-12-15 14:38:53,238 - phyluce_assembly_assemblo_trinity - INFO - ------------- Processing alligator_mississippiensis -------------
2015-12-15 14:38:53,239 - phyluce_assembly_assemblo_trinity - INFO - Finding fastq/fasta files
2015-12-15 14:38:53,240 - phyluce_assembly_assemblo_trinity - INFO - File type is fastq
2015-12-15 14:38:53,241 - phyluce_assembly_assemblo_trinity - INFO - Copying raw read data to /Users/jamestmcquillan/Desktop/uce-Tutorial/trinity_assemblies/alligator_mississippiensis_trinity
2015-12-15 14:38:53,717 - phyluce_assembly_assemblo_trinity - INFO - Combining singleton reads with R1 data
2015-12-15 14:38:53,750 - phyluce_assembly_assemblo_trinity - INFO - Running Trinity.pl for PE data
2015-12-15 14:38:57,446 - phyluce_assembly_assemblo_trinity - INFO - Removing extraneous Trinity files
Traceback (most recent call last):
  File "/Users/jamestmcquillan/anaconda/bin/phyluce_assembly_assemblo_trinity", line 347, in <module>
    main()
  File "/Users/jamestmcquillan/anaconda/bin/phyluce_assembly_assemblo_trinity", line 326, in main
    cleanup_trinity_assembly_folder(output, log)
  File "/Users/jamestmcquillan/anaconda/bin/phyluce_assembly_assemblo_trinity", line 276, in cleanup_trinity_assembly_folder
    raise IOError("Neither Trinity.fasta nor trinity.log were found in output.")
IOError: Neither Trinity.fasta nor trinity.log were found in output.

brantfaircloth commented 8 years ago

Ok, it looks like the fastq files are not being converted to fasta files correctly (this explains why you have left.fa and right.fa files of 0 bytes in size). Are you sure that your assembly.conf file is set up correctly? For example, does the file exist at:

/Users/jamestmcquillan/Desktop/uce-Tutorial/trinity_assemblies/alligator_mississippiensis_trinity/alligator_mississippiensis-READ1.fastq.gz

Also, it looks like you are using OSX. Can you tell me which version and how much RAM you have?

-b

JETitus commented 8 years ago

assembly.conf.txt

The assembly.conf exists at the base directory of all this, uce-Tutorial.

I am using OSX El Capitan ver. 10.11.1 (15B42) with 16 GB 1600 MHz DDR3 of ram on this computer.

Best, James

brantfaircloth commented 8 years ago

I am not sure what's going on. Basically, the problem lies in fastool running against your data to convert the fastq files into fasta files (which is one of the first trinity steps). That fails, as you can see in your trinity.log, with:

Tuesday, December 15, 2015: 14:38:56    CMD: /Users/jamestmcquillan/anaconda/bin/trinity-2.0.6/trinity-plugins/fastool/fastool --illumina-trinity --to-fasta /Users/jamestmcquillan/Desktop/uce-Tutorial/trinity_assemblies/alligator_mississippiensis_trinity/alligator_mississippiensis-READ2.fastq >> right.fa 2> /Users/jamestmcquillan/Desktop/uce-Tutorial/trinity_assemblies/alligator_mississippiensis_trinity/alligator_mississippiensis-READ2.fastq.readcount 

bash: line 1:  5004 Trace/BPT trap: 5       /Users/jamestmcquillan/anaconda/bin/trinity-2.0.6/trinity-plugins/fastool/fastool --illumina-trinity --to-fasta /Users/jamestmcquillan/Desktop/uce-Tutorial/trinity_assemblies/alligator_mississippiensis_trinity/alligator_mississippiensis-READ1.fastq >> left.fa 2> /Users/jamestmcquillan/Desktop/uce-Tutorial/trinity_assemblies/alligator_mississippiensis_trinity/alligator_mississippiensis-READ1.fastq.readcount

Thread 1 terminated abnormally: Error, cmd: /Users/jamestmcquillan/anaconda/bin/trinity-2.0.6/trinity-plugins/fastool/fastool --illumina-trinity --to-fasta /Users/jamestmcquillan/Desktop/uce-Tutorial/trinity_assemblies/alligator_mississippiensis_trinity/alligator_mississippiensis-READ1.fastq >> left.fa 2> /Users/jamestmcquillan/Desktop/uce-Tutorial/trinity_assemblies/alligator_mississippiensis_trinity/alligator_mississippiensis-READ1.fastq.readcount  died with ret 34048 at /Users/jamestmcquillan/anaconda/bin/Trinity line 2116.
Use of uninitialized value in array dereference at /Users/jamestmcquillan/anaconda/bin/Trinity line 1211.

Because that fails, everything else fails, which explains your 0 size left.fa and right.fa files as well as the subsequent error messages.
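
For what it is worth, the "ret 34048" in the log above is consistent with the "Trace/BPT trap: 5" line. A minimal sketch of the arithmetic, assuming the value is a raw wait status of the kind Perl's system() returns (exit code in the high byte) and that the intermediate shell used the common 128 + signal-number convention; the function name is hypothetical:

```python
import signal


def decode_wait_status(ret):
    """Split a raw wait status into (exit_code, signal_number or None)."""
    exit_code = ret >> 8  # high byte holds the exit code of the child shell
    # Shells conventionally report "child killed by signal N" as 128 + N.
    sig = exit_code - 128 if exit_code > 128 else None
    return exit_code, sig


exit_code, sig = decode_wait_status(34048)
print(exit_code, sig)            # 133 5
print(signal.Signals(sig).name)  # SIGTRAP on POSIX systems
```

Under those assumptions fastool was killed by a trap signal rather than exiting with an ordinary error code, which points at the binary crashing and not at a bad command line.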

I am not sure what is causing this problem, but RAM issues are one possibility. Running El Capitan is another (phyluce is tested to run on 10.10 but not 10.11, yet). Right now, this seems to be a trinity issue but could also be the result of running the code on OSX 10.11.

That said, check your fastq.gz files in the split-adapter-quality-trimmed folder, as well, to make sure they have content (e.g. they have some file size and a number of reads in them).
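
One way to run that check across every sample is sketched below. It assumes standard 4-line FASTQ records, and the function names are hypothetical (they are not part of phyluce):

```python
import gzip
from pathlib import Path


def count_fastq_reads(path):
    """Count reads in a gzipped FASTQ file, assuming 4 lines per record."""
    with gzip.open(path, "rb") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def summarize_clean_reads(root):
    """Return (path, size_in_bytes, n_reads) for every *.fastq.gz under root."""
    rows = []
    for fq in sorted(Path(root).rglob("*.fastq.gz")):
        rows.append((str(fq), fq.stat().st_size, count_fastq_reads(fq)))
    return rows


# Example call; the directory name is an assumption, so point it at your
# own illumiprocessor output folder:
# for path, size, reads in summarize_clean_reads("clean-fastq"):
#     print(path, size, reads)
```

A file with zero size or zero reads would explain an empty left.fa or right.fa before Trinity is involved at all.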

JETitus commented 8 years ago

Thank you Dr. Faircloth,

I appreciate all of your time. I will play around more with what you said and get back to you if I find a solution on this Mac.

Cheers, James

shahanderkarabetian commented 8 years ago

Hello Brant and/or James,

We are having the exact same problem here when trying to assemble with trinity. I am wondering if a solution was ever found for this?

It is the same error, on OSX 10.11 using 32 GB RAM. The fastq files in the split-adapter-quality-trimmed folder have content, and assemblies worked with velvet.

Thanks, Shahan

brantfaircloth commented 8 years ago

If it's the same error in trinity.log, that may be a RAM issue. One solution is to switch versions of Trinity (e.g. get an older version and compile it yourself). I actually use an older version, and have updated my ~/.phyluce.conf to use that version:

[trinity]
trinity:$HOME/src/trinityrnaseq_r2013-02-25/Trinity.pl
kmer_coverage:2
jellyfish_memory:24G
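
To confirm which Trinity path such a file actually points at, the section can be read back with Python's standard configparser. This is only a sketch: it parses the example text above rather than a real ~/.phyluce.conf, it is not necessarily how phyluce itself reads the file, and the function name is hypothetical:

```python
import configparser
import os

EXAMPLE = """\
[trinity]
trinity:$HOME/src/trinityrnaseq_r2013-02-25/Trinity.pl
kmer_coverage:2
jellyfish_memory:24G
"""


def read_trinity_section(text):
    """Parse the [trinity] section and expand $HOME in the program path."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)  # ':' is accepted as a key/value delimiter
    section = dict(parser["trinity"])
    section["trinity"] = os.path.expandvars(section["trinity"])
    return section


settings = read_trinity_section(EXAMPLE)
print(settings["kmer_coverage"], settings["jellyfish_memory"])  # 2 24G
```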

I still don't get any problems when running the "new" version of trinity with the tutorial data... which is what makes me think the issue is with too little RAM on some computers (I have 48-64 GB on most of mine).

shahanderkarabetian commented 8 years ago

Thanks Brant. I tried it with a previous version of Trinity, and it seems to get past that first error. However, I still can't actually run the assembly because it seems trinity requires linux to compile. I probably should have checked that first! We will probably just have to assemble with trinity on our server, then bring the files back into the phyluce pipeline.

drwhitehouse commented 6 years ago

I'm using the version of Trinity that you are packaging in the new build in bioconda. I see similar problems and not much in the way of output in trinity-assemblies/alligator_mississippiensis_trinity.

I do note the following in trinity.log :

Trinity version: v2.1.1
** NOTE: Latest version of Trinity is Trinity-v2.6.6, and can be obtained at:
        https://github.com/trinityrnaseq/trinityrnaseq/releases

which: no bowtie in (/usr/local/miniconda2/opt/trinity-2.1.1/trinity-plugins/BIN:/usr/local/miniconda2/bin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin)
which: no bowtie-build in (/usr/local/miniconda2/opt/trinity-2.1.1/trinity-plugins/BIN:/usr/local/miniconda2/bin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin)
Error, cannot find path to bowtie () or bowtie-build (), which is now needed as part of Chrysalis' readscaffolding step.  If you should choose to not run bowtie, include the --no_bowtie in your Trinity command.

brantfaircloth commented 6 years ago

Try to conda install bowtie and rerun to see if that fixes the error. If so, I'll adjust the package for phyluce. Technically, the Trinity package should also include bowtie, and I'll suggest a change for that package, too.

drwhitehouse commented 6 years ago

It appears that bowtie certainly isn't being pulled in correctly by Trinity, so thanks for that.

With bowtie installed, things get a bit further and now I get an error message which seems to indicate that processing stopped because of a mismatch between gunzip | wc -l and fastool. Please let me know if you think I should raise a separate issue for that. In the meantime I will double check and make sure I've been following all the instructions correctly.

brantfaircloth commented 6 years ago

No, just leave things here. The issue is really in the build of Trinity from bioconda (so a bioconda issue). We may be able to fix by switching to a more recent version of Trinity than the one pulled by default. Can you send me the text of the last issue and/or the log?

drwhitehouse commented 6 years ago

I get this in the Trinity log:

----------------------------------------------------------------------------------
-------------- Trinity Phase 1: Clustering of RNA-Seq Reads  ---------------------
----------------------------------------------------------------------------------

Converting input files. (in parallel)
Wednesday, June 20, 2018: 14:46:40      CMD: gunzip -c /opt/tmp/test_phyluce/uce-tutorial/trinity-assemblies/alligator_mississippiensis_trinity/alligator_mississippiensis-READ1.fastq.gz | fastool --illumina-trinity --to-fasta >> left.fa 2> /opt/tmp/test_phyluce/uce-tutorial/trinity-assemblies/alligator_mississippiensis_trinity/alligator_mississippiensis-READ1.fastq.gz.readcount
Wednesday, June 20, 2018: 14:46:40      CMD: gunzip -c /opt/tmp/test_phyluce/uce-tutorial/trinity-assemblies/alligator_mississippiensis_trinity/alligator_mississippiensis-READ2.fastq.gz | fastool --illumina-trinity --to-fasta >> right.fa 2> /opt/tmp/test_phyluce/uce-tutorial/trinity-assemblies/alligator_mississippiensis_trinity/alligator_mississippiensis-READ2.fastq.gz.readcount

gzip: stdout: Broken pipe
Thread 1 terminated abnormally: Error, counts of reads in FQ: 1705959 (as per gunzip -c /opt/tmp/test_phyluce/uce-tutorial/trinity-assemblies/alligator_mississippiensis_trinity/alligator_mississippiensis-READ1.fastq.gz | wc -l) doesn't match fastool's report of FA records: 1573739  at /usr/local/miniconda2/bin/Trinity line 3060 thread 1.
        main::ensure_complete_FQtoFA_conversion("gunzip -c /opt/tmp/test_phyluce/uce-tutorial/trinity-assembli"..., "/opt/tmp/test_phyluce/uce-tutorial/trinity-assemblies/alligat"...) called at /usr/local/miniconda2/bin/Trinity line 2099 thread 1
        main::prep_seqs(ARRAY(0x1292500), "fq", "left", undef) called at /usr/local/miniconda2/bin/Trinity line 1310 thread 1
        eval {...} called at /usr/local/miniconda2/bin/Trinity line 1310 thread 1
-conversion of 1573403 from FQ to FA format succeeded.
Trinity run failed. Must investigate error above.

I'm trying to work through the docs here:

http://phyluce.readthedocs.io/en/latest/tutorial-one.html so that is where I got the data from.

brantfaircloth commented 6 years ago

Something appears off with gzip/the config file and/or the reads going into the assembly - basically what's happening is that Trinity is dying because wc -l reports a different number of reads in the READ1 file from that reported by fastool. It looks like this might be occurring because gzip dies before getting all the way through the read file. I am not sure why that's happening (are you out of disk space? RAM?).

However, I'm also not seeing this error with fresh installs of phyluce in either centos 6 or centos 7 (and Trinity 2.1.1).

drwhitehouse commented 6 years ago

Thanks again for your prompt reply. I will investigate. My build environment is slightly odd in that I am trying to install Phyluce in a Singularity container for use on an HPC cluster. I'm actually trying to implement the tutorial as a kind of test script so that I can be fairly sure that everything is working reliably. I would not be at all surprised if this is the cause.

brantfaircloth commented 6 years ago

One thing to check might be to run the commands in the log independent of Trinity to compare results. That's one of the things I checked: the output of gunzip -c file.fastq.gz | wc -l (divided by 4) versus the number of sequences fastool reports for the same read files.

On another front, unittests for all the major scripts are coming as part of the Python 3 porting that I am doing - that should also help ensure more reliable (and consistent) operation.

drwhitehouse commented 6 years ago

As far as I can tell those agree with each other. I can't find any gz files under "trinity-assemblies" (I guess these are extracted / removed by the processing itself), but if I run fastool against the file in "split-adapter-quality-trimmed" I get: Sequences parsed: 1573403

gunzip -c alligator_mississippiensis-READ1.fastq.gz | wc -l yields 6293612 (4 * 1573403).
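
The same comparison written out as a small check, using the two numbers quoted above and assuming standard 4-line FASTQ records (the function name is hypothetical):

```python
def reads_from_line_count(n_lines):
    """Convert a `wc -l` line count for a FASTQ file into a read count."""
    if n_lines % 4 != 0:
        raise ValueError("line count is not a multiple of 4; file may be truncated")
    return n_lines // 4


wc_lines = 6293612        # from gunzip -c ... | wc -l
fastool_parsed = 1573403  # "Sequences parsed" reported by fastool
print(reads_from_line_count(wc_lines) == fastool_parsed)  # True
```

Raising on a line count that is not a multiple of 4 is a deliberate choice here, since a truncated stream is one of the failure modes suspected in this thread.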

Don't worry too much about this though Dr. Faircloth - I'll track your progress on the re-write (at least until my user gets a bit more impatient!).

ghost commented 6 years ago

I am getting the same "Neither Trinity.fasta nor trinity.log were found in output." error as well with the tutorial data, so it is a RAM issue with the new Trinity?

brantfaircloth commented 6 years ago

possibly - have a look within the folder where data were being assembled (e.g. navigate into your output folder). There should be some files, and one of those should be output from the critter it was working on when the problem arose. That file may provide clues.
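
A quick way to spot the symptom seen earlier in this thread (left.fa and right.fa at 0 bytes) is to list the zero-byte files in the per-sample assembly folder. A minimal sketch with a hypothetical function name:

```python
from pathlib import Path


def find_empty_files(folder):
    """Return sorted names of zero-byte regular files directly inside folder."""
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and p.stat().st_size == 0
    )


# Example call; the path follows the tutorial layout and is an assumption:
# print(find_empty_files("trinity_assemblies/alligator_mississippiensis_trinity"))
```

If both left.fa and right.fa show up, the FASTQ to FASTA conversion failed and trinity.log in the same folder is the place to look next.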

ghost commented 6 years ago

Hi Dr. Faircloth, my trinity.log error is Java:

Error, Trinity requires access to Java version 1.7. Currently installed version is: openjdk version "1.8.0_121"
OpenJDK Runtime Environment (Zulu 8.20.0.5-macosx) (build 1.8.0_121-b15)
OpenJDK 64-Bit Server VM (Zulu 8.20.0.5-macosx) (build 25.121-b15, mixed mode)

Should I downgrade Java?

Thanks

brantfaircloth commented 6 years ago

the correct java should be installed as part of the current phyluce package. you could also downgrade, or install the older version and switch to it.

ghost commented 6 years ago

I installed the new version of phyluce but trinity is no longer supported on Mac OS. Will find ways to get around this, thank you for your help!

AlejandraPanzera commented 6 years ago

Hello Dr. Faircloth,

I am encountering the same problems as described in previous posts: both left.fa and right.fa files are empty. I am using a Mac with 32 GB of RAM (macOS 10.13.6). The phyluce version I am using is 1.5.0 and the Trinity version is 2.0.6. This is the error that I am getting:

Traceback (most recent call last):
  File "/Users/alejandrapanzera/miniconda2/bin/phyluce_assembly_assemblo_trinity", line 347, in <module>
    main()
  File "/Users/alejandrapanzera/miniconda2/bin/phyluce_assembly_assemblo_trinity", line 326, in main
    cleanup_trinity_assembly_folder(output, log)
  File "/Users/alejandrapanzera/miniconda2/bin/phyluce_assembly_assemblo_trinity", line 276, in cleanup_trinity_assembly_folder
    raise IOError("Neither Trinity.fasta nor trinity.log were found in output.")
IOError: Neither Trinity.fasta nor trinity.log were found in output.

which is the same as JETitus's.

wireless-10-104-12-225:WA01_Batx_trinity alejandrapanzera$ ls -alh
total 56
drwxr-xr-x 10 alejandrapanzera staff 320B Sep 6 14:39 .
drwxr-xr-x 5 alejandrapanzera staff 160B Sep 6 14:38 ..
-rw-r--r--@ 1 alejandrapanzera staff 8.0K Sep 6 14:39 .DS_Store
-rw-r--r--@ 1 alejandrapanzera staff 487B Sep 6 14:38 Trinity.timing
-rw-r--r-- 1 alejandrapanzera staff 195B Sep 6 14:38 WA01_Batx-READ1.fastq.readcount
-rw-r--r--@ 1 alejandrapanzera staff 195B Sep 6 14:38 WA01_Batx-READ2.fastq.readcount
drwxr-xr-x 2 alejandrapanzera staff 64B Sep 6 14:38 chrysalis
-rw-r--r-- 1 alejandrapanzera staff 0B Sep 6 14:38 left.fa
-rw-r--r-- 1 alejandrapanzera staff 0B Sep 6 14:38 right.fa
-rw-r--r--@ 1 alejandrapanzera staff 3.7K Sep 6 14:38 trinity.log

Does anybody have any idea how (or whether) this can be fixed? I read on another thread something about not recognizing the "_trinity" part of the folder name with the output (in my case, "WA01_Batx_trinity"). Maybe that is one of the problems?

Thank you very much in advance.

Alejandra

brantfaircloth commented 6 years ago

For whatever reason, Trinity is not functioning correctly. The answer may be in the trinity.log file in the output you show above (you could be out of RAM). That said, Trinity is no longer supported on the Mac because it is so hard to deal with. I would suggest trying an alternate assembler on the mac (spades) or trying to get the assemblies to run on linux.

AlejandraPanzera commented 6 years ago

Hello Dr. Faircloth, and thank you very much for the rapid response.

I am attaching my trinity.log file. I would really appreciate it if you could take a look at it, as I don't recognize the errors, even when I look them up in the Trinity file.

If you believe there is no hope, I will follow your advice and use another assembler.

Thank you again!

Alejandra

trinity.log

brantfaircloth commented 6 years ago

fastool, which is one of the programs trinity uses, appears to be failing. this may be because you have too little RAM on your computer for the size of the files you are trying to assemble. it is hard to say. you can run the command individually:

fastool --illumina-trinity --to-fasta /Users/alejandrapanzera/Desktop/RapidGenomics_UCE/Trinity_assemblies/WA01_Batx_trinity/WA01_Batx-READ1.fastq >> left.fa 2> /Users/alejandrapanzera/Desktop/RapidGenomics_UCE/Trinity_assemblies/WA01_Batx_trinity/WA01_Batx-READ1.fastq.readcount 

to see if it fails for you when running manually (it should). To determine whether the number of reads may be causing the problem, you could try to run the same command as above, but substitute a smaller file to see if it will run successfully. If the smaller file runs successfully, RAM is probably the issue.
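
A smaller test file can be made by copying the first few thousand reads out of the original. A minimal sketch, assuming an uncompressed FASTQ with 4 lines per record; the function name and the output file name in the example are hypothetical:

```python
from itertools import islice


def write_fastq_subset(src, dst, n_reads):
    """Copy the first n_reads records (4 lines each) from src to dst.

    Returns the number of lines written.
    """
    written = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for line in islice(fin, 4 * n_reads):
            fout.write(line)
            written += 1
    return written


# Example call (file names are assumptions):
# write_fastq_subset("WA01_Batx-READ1.fastq", "small-READ1.fastq", 10000)
```

Pointing the fastool command above at the smaller file then separates "too many reads" from "fastool fails regardless of input size".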

AlejandraPanzera commented 6 years ago

Hello,

I tried with smaller files and get the exact same errors. I will be using another assembler I guess. Thank you for your time and help!

marekborowiec commented 5 years ago

I encountered the same problem in Phyluce 1.6.7 with Trinity 2.1.1 when trying to run tutorial examples and for me this was not a memory issue.

The error is also referenced here: https://github.com/trinityrnaseq/trinityrnaseq/issues/139. It seemed to only be a problem with gzipped files, so someone proposed unzipping files first as a solution. This was apparently addressed by the Trinity team more permanently by moving away from fastool to seqtk in newer versions, which are not available with a conda build of phyluce.

What is bizarre is that the error occurs about 50% of the time for me and I noticed that it does not happen when using a single core. The solution that worked for me was to just unzip all fastq files prior to running phyluce_assembly_assemblo_trinity. One can do this quickly by running a loop in your clean reads directory:

for d in `find . -name "*split*"`; do gunzip $d/*fastq.gz; done
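
For anyone who prefers to stay in Python, the loop above can be approximated as follows. This is a sketch rather than a drop-in replacement: the function name is hypothetical, and like gunzip it deletes each .gz file after writing the decompressed copy.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_split_dirs(root="."):
    """Decompress every *fastq.gz inside directories whose name contains 'split'."""
    done = []
    for d in sorted(Path(root).rglob("*split*")):
        if not d.is_dir():
            continue
        for gz in sorted(d.glob("*fastq.gz")):
            out = gz.with_suffix("")  # drop the trailing .gz
            with gzip.open(gz, "rb") as fin, open(out, "wb") as fout:
                shutil.copyfileobj(fin, fout)
            gz.unlink()  # mirror gunzip, which removes the compressed file
            done.append(out)
    return done
```

Calling gunzip_split_dirs() from the clean reads directory should leave plain .fastq files in each split-adapter-quality-trimmed folder.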

brantfaircloth commented 3 years ago

Trinity is about to be removed entirely from phyluce (in 1.7.0), so closing this.