faircloth-lab / phyluce

software for UCE (and general) phylogenomics
http://phyluce.readthedocs.org/

Problem w\ phyluce_assembly_match_contigs_to_probes #203

Closed ClayAssis closed 3 years ago

ClayAssis commented 3 years ago

Hi there, I'm having this problem when I match the contigs against the probes. The assembly was made with spades, but with velvet and trinity it didn't work. The files and directories were created correctly in the assembly. I tried different ways of writing the file names (Cichlo_leuco65, Cichloleuco65, leuco65) and I still get this error.

Any thoughts?

(base) claydson@claydson:~/Documents$ phyluce_assembly_match_contigs_to_probes --contigs ~/Documents/assembly/contigs --probes ~/Documents/uce-2.5k-probes.fasta --output ~/Documents/match_probes_contigs --log-path ~/Documents/log
2020-10-06 22:19:46,747 - phyluce_assembly_match_contigs_to_probes - INFO - ======= Starting phyluce_assembly_match_contigs_to_probes =======
2020-10-06 22:19:46,748 - phyluce_assembly_match_contigs_to_probes - INFO - Version: git fatal: not a git repository: '/home/claydson/anaconda2/lib/python2.7/site-packages/.git'
2020-10-06 22:19:46,748 - phyluce_assembly_match_contigs_to_probes - INFO - Argument --contigs: /home/claydson/Documents/assembly/contigs
2020-10-06 22:19:46,748 - phyluce_assembly_match_contigs_to_probes - INFO - Argument --dupefile: None
2020-10-06 22:19:46,748 - phyluce_assembly_match_contigs_to_probes - INFO - Argument --keep_duplicates: None
2020-10-06 22:19:46,748 - phyluce_assembly_match_contigs_to_probes - INFO - Argument --log_path: /home/claydson/Documents/log
2020-10-06 22:19:46,748 - phyluce_assembly_match_contigs_to_probes - INFO - Argument --min_coverage: 80
2020-10-06 22:19:46,748 - phyluce_assembly_match_contigs_to_probes - INFO - Argument --min_identity: 80
2020-10-06 22:19:46,748 - phyluce_assembly_match_contigs_to_probes - INFO - Argument --output: /home/claydson/Documents/match_probes_contigs
2020-10-06 22:19:46,748 - phyluce_assembly_match_contigs_to_probes - INFO - Argument --probes: /home/claydson/Documents/uce-2.5k-probes.fasta
2020-10-06 22:19:46,748 - phyluce_assembly_match_contigs_to_probes - INFO - Argument --regex: ^(uce-\d+)(?:_p\d+.*)
2020-10-06 22:19:46,748 - phyluce_assembly_match_contigs_to_probes - INFO - Argument --verbosity: INFO
2020-10-06 22:19:46,781 - phyluce_assembly_match_contigs_to_probes - INFO - Creating the UCE-match database
2020-10-06 22:19:47,099 - phyluce_assembly_match_contigs_to_probes - INFO - Processing contig data
2020-10-06 22:19:47,099 - phyluce_assembly_match_contigs_to_probes - INFO - -----------------------------------------------------------------
Traceback (most recent call last):
  File "/home/claydson/anaconda2/bin/phyluce_assembly_match_contigs_to_probes", line 342, in <module>
    main()
  File "/home/claydson/anaconda2/bin/phyluce_assembly_match_contigs_to_probes", line 271, in main
    contigs = contig_count(contig)
  File "/home/claydson/anaconda2/bin/phyluce_assembly_match_contigs_to_probes", line 176, in contig_count
    return sum([1 for line in open(contig, 'rU').readlines() if line.startswith('>')])
IOError: [Errno 2] No such file or directory: '/home/claydson/Documents/assembly/contigs/Cichlo_leuco_65.contigs.fasta'

brantfaircloth commented 3 years ago

The error is suggesting that there is not a file at /home/claydson/Documents/assembly/contigs/Cichlo_leuco_65.contigs.fasta (which also suggests the assembly did not work as expected or something else is wrong - like a broken symlink, etc.).
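One way to check the broken-symlink possibility is to list every entry in the contigs directory that is a symlink whose target no longer exists. The following is a minimal sketch in Python 3, not part of phyluce; the function name `find_broken_symlinks` is hypothetical.

```python
import os


def find_broken_symlinks(directory):
    """Return sorted paths of symlinks in `directory` whose targets are missing."""
    broken = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # islink() is True for any symlink, broken or not; exists() follows
        # the link and returns False when the target is missing.
        if os.path.islink(path) and not os.path.exists(path):
            broken.append(path)
    return broken
```

Calling it on the directory passed to `--contigs` (here `/home/claydson/Documents/assembly/contigs`) returns the links that would trigger the `IOError` above; `os.readlink(path)` then shows where each one points.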

ClayAssis commented 3 years ago

In fact, the problem is a broken symlink. I reinstalled phyluce and still have this problem. Any idea what it might be?

brantfaircloth commented 3 years ago

You just need to fix the symlink - so create a new symlink with the desired name pointing to the correct location (instead of the incorrect location).
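Re-pointing a symlink can be sketched as follows, assuming Python 3 and that the real contigs file already exists somewhere on disk. The helper name `repoint_symlink` is hypothetical and is not a phyluce function.

```python
import os


def repoint_symlink(link_path, new_target):
    """Replace the symlink at `link_path` with one pointing to `new_target`."""
    if not os.path.exists(new_target):
        raise IOError("target does not exist: {}".format(new_target))
    # lexists() is True even for a broken symlink, unlike exists().
    if os.path.lexists(link_path):
        if not os.path.islink(link_path):
            raise IOError("refusing to replace a non-symlink: {}".format(link_path))
        os.remove(link_path)
    # Use an absolute target so the link does not depend on the working directory.
    os.symlink(os.path.abspath(new_target), link_path)
```

The function refuses to overwrite anything that is not a symlink, so it cannot delete a real contigs file by accident.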

ClayAssis commented 3 years ago

The problem is not the symlink per se. The assembly is not creating the "contigs.fasta" file within the directory for each sample. It seems that the assembly is not working. Any idea what it might be? Sorry for the confusion!

brantfaircloth commented 3 years ago

It may be that the assembly program is running out of RAM and dying when you try to run it. You could look at the log files for whichever assembler you tried to see if there is some clue. Alternatively, you can try and run the assemblies against the test data, as in the tutorial, to see if that gives you a problem (if the tutorial data run fine, the issue has something to do with your data).
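To look through the assembler logs for clues, a small script can walk the assembly directory and print lines that mention likely failure keywords. This is a rough sketch in Python 3; the keyword list is an assumption rather than a catalogue of actual assembler messages, and `scan_logs` is a hypothetical name.

```python
import os
import re

# Assumed keywords; adjust them to whatever the assembler actually reports.
PATTERN = re.compile(r"error|killed|memory|exception", re.IGNORECASE)


def scan_logs(root, pattern=PATTERN):
    """Return (path, line_number, line) for matching lines in *.log files under `root`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".log"):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" avoids crashing on stray non-UTF-8 bytes.
            with open(path, "r", errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    if pattern.search(line):
                        hits.append((path, lineno, line.rstrip("\n")))
    return hits
```

Printing each hit as `path:line_number: text` gives a quick overview of which samples failed and at which step.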

ClayAssis commented 3 years ago

The test data work fine, so the problem is with my data. The illumiprocessor works fine (phyluce_assembly_get_fasta_lengths = Cichlo_leuco_65-READ-singleton.fastq.gz,823062,81831434,99.4231710369,0.00850322063774,40,101,101.0). The strange thing is that none of my data are working, even data sets that I've run before. Any guess what it might be or what I can try to do?

brantfaircloth commented 3 years ago

Look for the spades.log file within your assembly folder for a particular sample/taxon - that should give you some pointers. Alternatively, you can try to assemble one of your samples "by hand" using spades and watch to see what's going wrong - usually there are errors during read correction because of RAM use.

spades.py -t <how many cpu cores> -m <your ram in Gb - so `2` if you have 2 Gb> -1 read1.fastq.gz  -2 read2.fastq.gz -o spades-out --careful

If spades dies during read correction, it's because of the amount of data you have. You can either (1) find a computer with more RAM [if that's possible] or (2) try to downsample your reads before trying to assemble them - see here for some advice on how you might do that.
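As an illustration of option (2), paired reads can be downsampled with reservoir sampling so that both files keep the same read pairs. This is a sketch in Python 3 under stated assumptions: the inputs are gzipped, 4-line-per-record FASTQ files with mates in the same order, and the function names are hypothetical. It is not necessarily the approach described in the advice linked above.

```python
import gzip
import random


def read_fastq(handle):
    """Yield FASTQ records as 4-tuples of lines (assumes 4 lines per record)."""
    while True:
        record = tuple(handle.readline() for _ in range(4))
        if not record[0]:
            return
        yield record


def downsample_pairs(r1_path, r2_path, out1_path, out2_path, n_pairs, seed=0):
    """Write a random subset of at most `n_pairs` read pairs; return the count kept."""
    rng = random.Random(seed)
    reservoir = []
    with gzip.open(r1_path, "rt") as r1, gzip.open(r2_path, "rt") as r2:
        for i, pair in enumerate(zip(read_fastq(r1), read_fastq(r2))):
            if i < n_pairs:
                reservoir.append(pair)
            else:
                # Reservoir sampling: pair i replaces a kept pair
                # with probability n_pairs / (i + 1).
                j = rng.randint(0, i)
                if j < n_pairs:
                    reservoir[j] = pair
    with gzip.open(out1_path, "wt") as o1, gzip.open(out2_path, "wt") as o2:
        for rec1, rec2 in reservoir:
            o1.writelines(rec1)
            o2.writelines(rec2)
    return len(reservoir)
```

Only the sampled pairs are held in memory, so `n_pairs` should be chosen to fit comfortably in RAM. Sampling both files in one pass keeps mates together, which paired-end assembly requires.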

ClayAssis commented 3 years ago

Cool! Many thanks. I'll try here!