faircloth-lab / phyluce

software for UCE (and general) phylogenomics
http://phyluce.readthedocs.org/

Internal trimming step problem #188

Closed: Ofsm closed this issue 3 years ago.

Ofsm commented 4 years ago

Hi, I'm working with my own data, but I've reached a point where I'm stuck. Does anyone have any idea what causes this error?

(phyluce) oscarsaenz@oscarpc:~/uce-tutorial_subsample/taxon-sets/all$ phyluce_align_seqcap_align \
    --fasta all-taxa-incomplete.fasta \
    --output mafft-nexus-internal-trimmed \
    --taxa 20 \
    --aligner mafft \
    --cores 2 \
    --incomplete-matrix \
    --output-format fasta \
    --no-trim \
    --log-path log

2020-03-12 07:42:26,835 - phyluce_align_seqcap_align - INFO - ============== Starting phyluce_align_seqcap_align ============== 2020-03-12 07:42:26,835 - phyluce_align_seqcap_align - INFO - Version: 1.6.8 2020-03-12 07:42:26,835 - phyluce_align_seqcap_align - INFO - Argument --aligner: mafft 2020-03-12 07:42:26,836 - phyluce_align_seqcap_align - INFO - Argument --ambiguous: False 2020-03-12 07:42:26,836 - phyluce_align_seqcap_align - INFO - Argument --cores: 2 2020-03-12 07:42:26,836 - phyluce_align_seqcap_align - INFO - Argument --fasta: /home/oscarsaenz/uce-tutorial_subsample/taxon-sets/all/all-taxa-incomplete.fasta 2020-03-12 07:42:26,836 - phyluce_align_seqcap_align - INFO - Argument --log_path: /home/oscarsaenz/uce-tutorial_subsample/taxon-sets/all/log 2020-03-12 07:42:26,836 - phyluce_align_seqcap_align - INFO - Argument --max_divergence: 0.2 2020-03-12 07:42:26,836 - phyluce_align_seqcap_align - INFO - Argument --min_length: 100 2020-03-12 07:42:26,836 - phyluce_align_seqcap_align - INFO - Argument --no_trim: True 2020-03-12 07:42:26,837 - phyluce_align_seqcap_align - INFO - Argument --notstrict: True 2020-03-12 07:42:26,837 - phyluce_align_seqcap_align - INFO - Argument --output: /home/oscarsaenz/uce-tutorial_subsample/taxon-sets/all/mafft-nexus-internal-trimmed 2020-03-12 07:42:26,837 - phyluce_align_seqcap_align - INFO - Argument --output_format: fasta 2020-03-12 07:42:26,837 - phyluce_align_seqcap_align - INFO - Argument --proportion: 0.65 2020-03-12 07:42:26,837 - phyluce_align_seqcap_align - INFO - Argument --taxa: 20 2020-03-12 07:42:26,837 - phyluce_align_seqcap_align - INFO - Argument --threshold: 0.65 2020-03-12 07:42:26,837 - phyluce_align_seqcap_align - INFO - Argument --verbosity: INFO 2020-03-12 07:42:26,837 - phyluce_align_seqcap_align - INFO - Argument --window: 20 2020-03-12 07:42:26,838 - phyluce_align_seqcap_align - INFO - Building the locus dictionary 2020-03-12 07:42:26,838 - phyluce_align_seqcap_align - INFO - Removing ALL sequences 
with ambiguous bases... 2020-03-12 07:42:30,009 - phyluce_align_seqcap_align - WARNING - DROPPED locus uce-502. Too few taxa (N < 3). 2020-03-12 07:42:30,009 - phyluce_align_seqcap_align - WARNING - DROPPED locus uce-509. Too few taxa (N < 3). 2020-03-12 07:42:30,009 - phyluce_align_seqcap_align - WARNING - DROPPED locus uce-4275. Too few taxa (N < 3). 2020-03-12 07:42:30,009 - phyluce_align_seqcap_align - WARNING - DROPPED locus uce-4073. Too few taxa (N < 3). 2020-03-12 07:42:30,009 - phyluce_align_seqcap_align - WARNING - DROPPED locus uce-4078. Too few taxa (N < 3). ---------------------------------------#450 times the same----------------- 2020-03-12 07:42:30,110 - phyluce_align_seqcap_align - WARNING - DROPPED locus uce-4473. Too few taxa (N < 3). 2020-03-12 07:42:30,149 - phyluce_align_seqcap_align - INFO - Aligning with MAFFT 2020-03-12 07:42:30,154 - phyluce_align_seqcap_align - INFO - Alignment begins. 'X' indicates dropped alignments (these are reported after alignment) ....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................Traceback (most recent call last): File "/home/oscarsaenz/anaconda3/envs/phyluce/bin/phyluce_align_seqcap_align", line 255, in main(args) File "/home/oscarsaenz/anaconda3/envs/phyluce/bin/phyluce_align_seqcap_align", line 232, in main alignments = pool.map(align, params) File 
"/home/oscarsaenz/anaconda3/envs/phyluce/lib/python2.7/multiprocessing/pool.py", line 253, in map return self.map_async(func, iterable, chunksize).get() File "/home/oscarsaenz/anaconda3/envs/phyluce/lib/python2.7/multiprocessing/pool.py", line 572, in get raise self._value ValueError: No records found in handle

I'm using Ubuntu 18.04

Many thanks for the help! Oscar
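The final line, `ValueError: No records found in handle`, is the message Biopython's `SeqIO.read()` and `AlignIO.read()` raise when the handle they are given contains no records. One plausible reading (not confirmed in this thread) is that the aligner left an empty output for some locus and the parsing step then failed. Because `pool.map` only re-raises the worker's exception, the traceback does not say which locus was involved. The stdlib-only sketch below reproduces that failure mode; `parse_fasta` and `read_one_alignment` are hypothetical helpers, not phyluce or Biopython code.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical helpers for illustration only; not phyluce or Biopython APIs.


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:  # ignore stray text before the first header
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_one_alignment(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Mimic a 'read exactly one alignment' call: fail loudly on empty input."""
    records = list(parse_fasta(lines))
    if not records:
        # Same message Biopython uses when a handle holds no records.
        raise ValueError("No records found in handle")
    return records


# An aligner that crashed, or was never found, typically leaves an empty output.
empty_output: List[str] = []
try:
    read_one_alignment(empty_output)
except ValueError as err:
    print("parse failed:", err)  # parse failed: No records found in handle

ok_output = [">taxonA\n", "ACGT-A\n", ">taxonB\n", "ACGTTA\n"]
print(len(read_one_alignment(ok_output)))  # 2
```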

brantfaircloth commented 4 years ago

For whatever reason, it looks like your alignments are getting dropped because there are too few taxa associated with each UCE locus.

Ofsm commented 4 years ago

Hi Brant, thanks for the super-fast answer. Well, I don't know what is happening; could it be a problem with my data? I got these results in the previous steps:

(phyluce) oscarsaenz@oscarpc:~/uce-tutorial_subsample/taxon-sets/all$ for i in exploded-fastas/*.fasta;
do phyluce_assembly_get_fasta_lengths --input $i --csv; done
Acantholachesilla-sp.unaligned.fasta,739,283198,383.217861976,8.67805478834,110,1527,317.0,21
Anomolachesilla-palaciosi.unaligned.fasta,762,317000,416.010498688,9.69739251905,104,2147,349.5,29
Anomopsocus-sp.unaligned.fasta,733,328520,448.185538881,11.6680216935,111,2956,351.0,45
Dagualachesilla-anchicayaensis.unaligned.fasta,679,214688,316.182621502,7.43209343764,111,1399,258.0,9
Dagualachesilloides-caliensis.unaligned.fasta,616,192999,313.310064935,7.02013067239,111,1114,260.5,2
Eolachesilla-chilensis.unaligned.fasta,736,674283,916.145380435,23.4403930215,103,4172,782.5,270
Graphocaeciliini-gen1nov.unaligned.fasta,807,398418,493.70260223,11.6277372847,109,3358,420.0,62
Graphocaeciliini-gen2nov.unaligned.fasta,802,333393,415.701995012,9.17674090898,111,1901,352.5,23
Graphocaeciliini-gennov.unaligned.fasta,785,353447,450.250955414,13.2290787008,110,5942,370.0,40
Graphocaecilius-interpretatus.unaligned.fasta,653,212742,325.791730475,7.66773182617,101,1484,279.0,10
Hemicaecilius-mockfordi.unaligned.fasta,509,6515719,12801.0196464,566.256501001,197,88352,8212.0,495
Lachesilla-pedicularia.unaligned.fasta,704,3196026,4539.80965909,142.497184901,104,28321,3503.0,650
Lachesilla-picticeps.unaligned.fasta,750,645837,861.116,22.1897403055,103,4583,713.5,249
Lachesilla-punctata.unaligned.fasta,794,602198,758.435768262,23.1618513656,111,8379,612.0,188
Lachesilla-rufa.unaligned.fasta,812,905505,1115.15394089,28.081551036,115,5821,961.5,385
Lachesilla-spghn.unaligned.fasta,575,3872085,6734.06086957,268.662818222,109,59295,4629.0,540
Lachesilla-spmly.unaligned.fasta,728,2244058,3082.49725275,87.1085225522,105,21363,2522.0,629
Lachesilla-spQ.unaligned.fasta,780,868299,1113.20384615,29.7958122209,110,7956,917.5,347
Prolachesilla-sp.unaligned.fasta,853,450210,527.796014068,11.4561084333,118,2011,444.0,81
Waoraniella-jarlinsoni.unaligned.fasta,721,237559,329.485436893,7.32721502444,111,1294,268.0,7
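Each of these per-taxon files should hold one contig per recovered UCE locus, so the second column should be the number of loci recovered for that taxon (509 to 853 here). The sketch below parses these rows and flags taxa whose mean contig length is far above the rest (for example, Hemicaecilius-mockfordi has a mean of 12,801 bp). The column names follow the header line shown in the phyluce tutorial and are an assumption. The 5x cut-off in the hypothetical helper `flag_long_contigs` is an arbitrary threshold for eyeballing outliers, not a phyluce rule.

```python
import csv
import io
import statistics
from typing import Dict, List

# Assumed column order for `phyluce_assembly_get_fasta_lengths --csv`, based on the
# header shown in the phyluce tutorial; verify against your phyluce version.
COLUMNS = ["sample", "contigs", "total_bp", "mean_length", "ci95_length",
           "min_length", "max_length", "median_length", "contigs_over_1kb"]


def load_lengths(text: str) -> List[Dict[str, object]]:
    """Parse headerless CSV rows into dicts with numeric fields."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue
        row = dict(zip(COLUMNS, fields))
        row["sample"] = row["sample"].replace(".unaligned.fasta", "")
        for key in COLUMNS[1:]:
            row[key] = float(row[key])
        rows.append(row)
    return rows


def flag_long_contigs(rows, factor: float = 5.0) -> List[str]:
    """Samples whose mean contig length exceeds `factor` x the median of all means.

    `factor` is a hypothetical, arbitrary cut-off for spotting outliers by eye.
    """
    median_of_means = statistics.median(r["mean_length"] for r in rows)
    return [r["sample"] for r in rows if r["mean_length"] > factor * median_of_means]


# Three rows copied from the output above.
example = """Acantholachesilla-sp.unaligned.fasta,739,283198,383.217861976,8.67805478834,110,1527,317.0,21
Hemicaecilius-mockfordi.unaligned.fasta,509,6515719,12801.0196464,566.256501001,197,88352,8212.0,495
Prolachesilla-sp.unaligned.fasta,853,450210,527.796014068,11.4561084333,118,2011,444.0,81"""

rows = load_lengths(example)
for r in sorted(rows, key=lambda r: r["contigs"]):
    print(f'{r["sample"]:<28} loci={int(r["contigs"]):>4} mean={r["mean_length"]:.0f} bp')
print(flag_long_contigs(rows))  # ['Hemicaecilius-mockfordi']
```

Sorting by locus count makes it easy to see which taxa contribute the fewest loci, which in turn bounds how many taxa any single locus can have.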

brantfaircloth commented 4 years ago

I'm not entirely sure what could be going on. If/when you explode the monolithic FASTA into individual loci (https://phyluce.readthedocs.io/en/latest/tutorial-one.html#exploding-the-monolithic-fasta-file), what do the resulting files look like? Something may be off with the formatting.
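The per-taxon files listed earlier in this thread answer a different question from the per-locus split asked about here. As an independent cross-check (not phyluce's own code), the sketch below splits a monolithic FASTA into one file per locus. It assumes headers of the form `>uce-4078_Lachesilla_rufa |uce-4078` (the taxon in that example is illustrative), taking the locus name from the text after the last `|`. That layout and the helper names `explode_by_locus` and `locus_of` are assumptions, so compare them against a few lines of `all-taxa-incomplete.fasta` first.

```python
import os
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def locus_of(header: str) -> str:
    """Assumed phyluce header layout '<locus>_<taxon> |<locus>': take text after '|'."""
    return header.rsplit("|", 1)[-1].strip()


def explode_by_locus(lines: Iterable[str], outdir: str) -> Dict[str, int]:
    """Write one FASTA per locus into `outdir`; return the record count per locus."""
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for header, seq in parse_fasta(lines):
        groups[locus_of(header)].append((header, seq))
    os.makedirs(outdir, exist_ok=True)
    for locus, records in groups.items():
        with open(os.path.join(outdir, f"{locus}.fasta"), "w") as out:
            for header, seq in records:
                out.write(f">{header}\n{seq}\n")
    return {locus: len(records) for locus, records in groups.items()}


# Usage (file and directory names are hypothetical):
# with open("all-taxa-incomplete.fasta") as handle:
#     counts = explode_by_locus(handle, "exploded-loci")
# print(sum(1 for n in counts.values() if n < 3), "loci have fewer than 3 sequences")
```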

Ofsm commented 4 years ago

The results of exploding the monolithic FASTA are the ones I posted above. I'm attaching one of the exploded-fastas files. It seems pretty good to me.

Acantholachesilla-sp.unaligned.fasta.txt

brantfaircloth commented 4 years ago

So you get a lot of loci dropped for too few taxa. First question: have you pulled data out of the monolithic file to see how many individuals have data for a given locus (for example, uce-4078)?
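To answer this for every locus at once, the sketch below maps each locus to the taxa that have a sequence for it and lists the loci below the `N < 3` cut-off reported in the log. It assumes phyluce-style headers such as `>uce-4078_Lachesilla_rufa |uce-4078` (an assumed layout; the taxon shown is illustrative), and skips any header without a `|`. `taxa_per_locus` and `too_few_taxa` are hypothetical helpers.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def taxa_per_locus(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each locus to the taxa that have a sequence for it.

    Assumes headers like '>uce-4078_Lachesilla_rufa |uce-4078'.
    """
    result: Dict[str, List[str]] = defaultdict(list)
    for raw in lines:
        if not raw.startswith(">") or "|" not in raw:
            continue  # sequence lines, or headers not in the assumed layout
        name, _, locus = raw[1:].strip().rpartition("|")
        locus, taxon = locus.strip(), name.strip()
        if taxon.startswith(locus + "_"):
            taxon = taxon[len(locus) + 1:]  # drop the '<locus>_' prefix
        result[locus].append(taxon)
    return dict(result)


def too_few_taxa(taxa: Dict[str, List[str]], minimum: int = 3) -> List[str]:
    """Loci with fewer than `minimum` taxa, mirroring the 'N < 3' warning."""
    return sorted(locus for locus, names in taxa.items() if len(names) < minimum)


# Usage (file name taken from the command above):
# with open("all-taxa-incomplete.fasta") as handle:
#     taxa = taxa_per_locus(handle)
# print(len(taxa.get("uce-4078", [])), taxa.get("uce-4078"))
# print(len(too_few_taxa(taxa)), "loci would be dropped for N < 3")
```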

Second, it looks like you're running with trimming turned off... so trimming can't be removing all the data. Are you sure that mafft is installed correctly? I'm not certain what is going on... Do the example data from the tutorial run correctly?
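One way to check whether MAFFT is installed correctly, outside phyluce, is to look it up on `PATH` and align a tiny two-sequence FASTA with it. This sketch assumes it is run from a shell where the `phyluce` conda environment is activated (so `PATH` matches what phyluce sees). It uses only the Python 3.6+ standard library. `aligner_works` is a hypothetical helper, and the `--auto` argument is MAFFT-specific, so MUSCLE would need different arguments.

```python
import os
import shutil
import subprocess
import tempfile


def aligner_works(executable: str = "mafft") -> bool:
    """Return True if `executable` is on PATH and aligns a tiny two-sequence FASTA."""
    path = shutil.which(executable)
    if path is None:
        print(f"{executable}: not found on PATH")
        return False
    with tempfile.TemporaryDirectory() as tmp:
        infile = os.path.join(tmp, "tiny.fasta")
        with open(infile, "w") as fh:
            fh.write(">a\nACGTACGTAC\n>b\nACGTTCGTAC\n")
        # MAFFT writes the alignment to stdout; no '>' in stdout means no alignment.
        proc = subprocess.run([path, "--auto", infile],
                              stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                              universal_newlines=True)
    if proc.returncode != 0 or ">" not in proc.stdout:
        print(f"{executable} failed (exit {proc.returncode}):\n{proc.stderr[-500:]}")
        return False
    return True


# Usage: print(aligner_works("mafft"))
```

If this prints `False`, the stderr tail usually points at the problem (missing binary, broken install) more directly than the phyluce traceback does.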

Ofsm commented 4 years ago

Not yet for a specific UCE; I will check.

I used both MAFFT and MUSCLE, and both failed. Can I align this data externally? (Right now I'm using MAFFT online with all-taxa-incomplete.fasta.) I tried to install SATé, but it seems the download link is no longer working.
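`phyluce_align_seqcap_align` aligns each locus separately (note the "Building the locus dictionary" step and the per-locus "DROPPED locus" warnings in the log). Running MAFFT once over the whole `all-taxa-incomplete.fasta` would therefore align sequences from different loci against each other and is not an equivalent substitute. If you do align externally, a per-locus loop is sketched below. It assumes one FASTA per locus in a directory, uses the hypothetical helper `align_each_locus`, and defaults to `mafft --auto`; that is an assumed choice, not phyluce's exact MAFFT settings.

```python
import glob
import os
import subprocess
from typing import List, Sequence


def count_records(path: str) -> int:
    """Number of FASTA headers in `path`."""
    with open(path) as fh:
        return sum(1 for line in fh if line.startswith(">"))


def align_each_locus(indir: str, outdir: str,
                     aligner: Sequence[str] = ("mafft", "--auto"),
                     min_taxa: int = 3) -> List[str]:
    """Align every per-locus FASTA in `indir` separately; return loci that failed.

    Each locus gets its own aligner run; output goes to <outdir>/<locus>.fasta.
    """
    os.makedirs(outdir, exist_ok=True)
    failed = []
    for path in sorted(glob.glob(os.path.join(indir, "*.fasta"))):
        locus = os.path.splitext(os.path.basename(path))[0]
        if count_records(path) < min_taxa:
            continue  # mirror the "Too few taxa (N < 3)" filter seen in the log
        proc = subprocess.run(list(aligner) + [path],
                              stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                              universal_newlines=True)
        if proc.returncode != 0 or not proc.stdout.strip():
            # An empty result is the case a FASTA reader would reject
            # with "No records found in handle".
            failed.append(locus)
            continue
        with open(os.path.join(outdir, locus + ".fasta"), "w") as out:
            out.write(proc.stdout)
    return failed
```

For example, `align_each_locus("exploded-loci", "mafft-external")` would write one alignment per locus and return the names of loci whose aligner run failed or produced no output; both directory names are hypothetical.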