faircloth-lab / phyluce

software for UCE (and general) phylogenomics
http://phyluce.readthedocs.org/

Trimming alignment drops all loci #157

Closed heather340 closed 5 years ago

heather340 commented 5 years ago

Hello, I'm having an issue with aligning and trimming my loci for 16 taxa. When just aligning (no-trim), there are no issues besides having too few taxa for some of the loci (N<3). However, when I turn trimming on, all loci are dropped. I've tried adjusting the window, threshold, and proportion with no luck. I'm not sure if it's due to the large amount of missing data on both ends of the alignment; does the sliding window remove everything towards the 3' (or 5') end after it encounters a window that does not meet the requirements? We would like to build a SNP dataset, so we need the edge trimming before proceeding to the next step.
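To make the sliding-window question concrete, here is a minimal, generic sketch of edge trimming that scans inward from each end and stops at the first window that passes. It is not phyluce's actual code: the function names are hypothetical, and the meaning given to `window`, `threshold`, and `proportion` (window width, minimum fraction of filled positions per row, minimum fraction of rows that must pass) is an assumption for illustration only.

```python
def first_good_window(rows, window, threshold, proportion, gaps="-?Nn"):
    """Return the start index of the first window that passes, or None.

    A row passes a window when at least `threshold` of its positions are
    not gap/missing characters; the window passes when at least
    `proportion` of the rows pass. (Assumed semantics, not phyluce's.)
    """
    length = len(rows[0])
    for start in range(0, length - window + 1):
        passing = 0
        for row in rows:
            chunk = row[start:start + window]
            filled = sum(1 for c in chunk if c not in gaps)
            if filled / window >= threshold:
                passing += 1
        if passing / len(rows) >= proportion:
            return start
    return None


def edge_trim(rows, window=20, threshold=0.65, proportion=0.65):
    """Trim both ends inward to the first passing window.

    Returns an empty list when no window passes, i.e. the whole locus
    would be dropped.
    """
    left = first_good_window(rows, window, threshold, proportion)
    if left is None:
        return []
    # Scan from the right by reversing every row.
    reversed_rows = [row[::-1] for row in rows]
    right = first_good_window(reversed_rows, window, threshold, proportion)
    end = len(rows[0]) - right
    return [row[left:end] for row in rows]
```

Under these assumptions, a failing window in the middle of the alignment does not discard everything after it; only the ragged ends are removed. A locus disappears entirely only when no window anywhere meets the requirements, which can happen when most rows are mostly missing data.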

I have attached the log file, two example FASTA files for aligned, no-trim loci, and a screenshot of the summary stats for the --no-trim alignment.

Thanks in advance for your advice!

Screen Shot 2019-04-30 at 12 28 10 PM

phyluce_align_seqcap_align.log

uce-2185.txt

uce-2054.txt

brantfaircloth commented 5 years ago

I'm not sure what the issue is... both of these trim correctly for me. I took your alignments, renamed them to end with .nexus, and trimmed them with:

phyluce_align_get_trimal_trimmed_alignments_from_untrimmed --alignments alignments --output trimmed --input-format nexus --output-format nexus

This produced the two outputs:

uce-2185.nexus.txt uce-2222.nexus.txt
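The two steps described above (renaming the attachments to end in .nexus, then running the trimming command) could be scripted roughly as follows. This is only a sketch: the helper names and the directory names in the usage comment are hypothetical, and the command-line flags are copied from the command quoted above rather than checked against any particular phyluce version.

```python
import shutil
from pathlib import Path


def stage_as_nexus(src_dir, dest_dir):
    """Copy every *.txt alignment in src_dir into dest_dir with a .nexus suffix."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    staged = []
    for path in sorted(Path(src_dir).glob("*.txt")):
        target = dest / (path.stem + ".nexus")  # uce-2185.txt -> uce-2185.nexus
        shutil.copyfile(path, target)
        staged.append(target)
    return staged


def trimal_command(alignments, output):
    """Build the argument list for the trimming command quoted in the thread."""
    return [
        "phyluce_align_get_trimal_trimmed_alignments_from_untrimmed",
        "--alignments", str(alignments),
        "--output", str(output),
        "--input-format", "nexus",
        "--output-format", "nexus",
    ]


# Usage (hypothetical directory names; requires phyluce on the PATH):
#   import subprocess
#   stage_as_nexus("downloads", "alignments")
#   subprocess.run(trimal_command("alignments", "trimmed"), check=True)
```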

heather340 commented 5 years ago

Thanks for the fast response! I just took your code (I didn't realize there was a phyluce_align_get_trimal_trimmed_alignments_from_untrimmed) and it looks like it worked beautifully. So, I'll just do it in 2 steps from now on if necessary.

Thanks again for the assistance.

brantfaircloth commented 5 years ago

Cool.

heather340 commented 5 years ago

Hi again, I've been trying to do the alignment step after phasing in the tutorial with this dataset, and am running into issues again. I noticed that the phasing step may not work with data that has been trimmed with trimal - would this be an issue?

My files from the multialign-phased step have N's in them; only 100 @ 50% are aligned with the no-trim option, as the loci are dropped. If I replace these N's with -, remove them entirely (but gaps are still present..), or run the alignment step so that it ignores ambiguous characters, then I get the error "No records found in handle". I'm happy to email you a file if needed.

Info running last run:

019-05-01 10:04:13,110 - phyluce_align_seqcap_align - INFO - ============== Starting phyluce_align_seqcap_align ==============
2019-05-01 10:04:13,110 - phyluce_align_seqcap_align - INFO - Version: git fatal: Not a git repository: '/users/PAS1390/osu10232/envs/phyluce/lib/python2.7/site-packages/.git'
2019-05-01 10:04:13,110 - phyluce_align_seqcap_align - INFO - Argument --aligner: mafft
2019-05-01 10:04:13,110 - phyluce_align_seqcap_align - INFO - Argument --ambiguous: False
2019-05-01 10:04:13,111 - phyluce_align_seqcap_align - INFO - Argument --cores: 12
2019-05-01 10:04:13,111 - phyluce_align_seqcap_align - INFO - Argument --fasta: /users/PAS1390/osu10232/UCE/taxon-sets/ingroup/phasing_step/multialign-bams-phased-reads-ingroup3/fastas/joined_allele_sequences_all_samples_removed.fasta
2019-05-01 10:04:13,111 - phyluce_align_seqcap_align - INFO - Argument --log_path: /users/PAS1390/osu10232/UCE/taxon-sets/ingroup/phasing_step/log
2019-05-01 10:04:13,111 - phyluce_align_seqcap_align - INFO - Argument --max_divergence: 0.2
2019-05-01 10:04:13,111 - phyluce_align_seqcap_align - INFO - Argument --min_length: 100
2019-05-01 10:04:13,111 - phyluce_align_seqcap_align - INFO - Argument --no_trim: True
2019-05-01 10:04:13,112 - phyluce_align_seqcap_align - INFO - Argument --notstrict: True
2019-05-01 10:04:13,112 - phyluce_align_seqcap_align - INFO - Argument --output: /users/PAS1390/osu10232/UCE/taxon-sets/ingroup/phasing_step/PHASED-DATA-mafft-nexus-aligned-removed-notrim-ingroup3
2019-05-01 10:04:13,112 - phyluce_align_seqcap_align - INFO - Argument --output_format: nexus
2019-05-01 10:04:13,112 - phyluce_align_seqcap_align - INFO - Argument --proportion: 0.65
2019-05-01 10:04:13,112 - phyluce_align_seqcap_align - INFO - Argument --taxa: 28
2019-05-01 10:04:13,112 - phyluce_align_seqcap_align - INFO - Argument --threshold: 0.65
2019-05-01 10:04:13,112 - phyluce_align_seqcap_align - INFO - Argument --verbosity: INFO
2019-05-01 10:04:13,113 - phyluce_align_seqcap_align - INFO - Argument --window: 20
2019-05-01 10:04:13,113 - phyluce_align_seqcap_align - INFO - Building the locus dictionary
2019-05-01 10:04:13,113 - phyluce_align_seqcap_align - INFO - Removing ALL sequences with ambiguous bases...
2019-05-01 10:04:17,533 - phyluce_align_seqcap_align - WARNING - DROPPED locus uce-111831. Too few taxa (N < 3).
2019-05-01 09:55:54,824 - phyluce_align_seqcap_align - INFO - Aligning with MAFFT
2019-05-01 09:55:54,827 - phyluce_align_seqcap_align - INFO - Alignment begins. 'X' indicates dropped alignments (these are reported after alignment)
...................................................................................................................................
...................................................................................................................................
...................................................................................................................................
...................................................................................................................................
...................................................................................................................................
.................................................................................Traceback (most recent call last):
  File "/users/PAS1390/osu10232/envs/phyluce/bin/phyluce_align_seqcap_align", line 255, in <module>
    main(args)
  File "/users/PAS1390/osu10232/envs/phyluce/bin/phyluce_align_seqcap_align", line 232, in main
    alignments = pool.map(align, params)
  File "/users/PAS1390/osu10232/envs/phyluce/lib/python2.7/multiprocessing/pool.py", line 253, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/users/PAS1390/osu10232/envs/phyluce/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
ValueError: No records found in handle
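One way to narrow down a "No records found in handle" error is to check, before aligning, which loci would be left with too few sequences once every sequence containing an N is removed, as the log line "Removing ALL sequences with ambiguous bases..." suggests happens here. The sketch below is a diagnostic aid, not phyluce code. It assumes headers of the form `>name |locus` (falling back to the first token when there is no `|`) and treats only N as ambiguous; both are assumptions that may not match your files.

```python
from collections import defaultdict


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def locus_of(header):
    """Assumed layout: 'name |locus'; otherwise use the first token."""
    if "|" in header:
        return header.rsplit("|", 1)[1].strip()
    return header.split()[0]


def survivors_per_locus(records, ambiguous="Nn"):
    """Map locus -> (total sequences, sequences free of ambiguous bases)."""
    counts = defaultdict(lambda: [0, 0])
    for header, seq in records:
        locus = locus_of(header)
        counts[locus][0] += 1
        if seq and not any(base in ambiguous for base in seq):
            counts[locus][1] += 1
    return {locus: tuple(pair) for locus, pair in counts.items()}


def at_risk(counts, min_taxa=3):
    """Loci with fewer than min_taxa clean sequences (N < 3 in the log)."""
    return sorted(l for l, (_total, clean) in counts.items() if clean < min_taxa)
```

To use it, open the FASTA passed to `--fasta`, feed the file object to `read_fasta`, and print `at_risk(survivors_per_locus(...))`. If nearly every locus shows up, the N's in the phased sequences are the likely reason so few loci survive.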

brantfaircloth commented 5 years ago

I haven't tried to phase against alignments trimmed with trimal. I suspect this won't work very well.

heather340 commented 5 years ago

Ah ok. I suppose that brings me back to my first issue of not being able to edge trim the alignment prior to moving on to the phasing step.

Here's one of the Fasta files:

ingroup-taxa-incomplete-uce.fasta.txt

brantfaircloth commented 5 years ago

I'm not sure why your alignments are being trimmed to the degree that they are. You can run the edge trimming after alignment using phyluce_align_get_trimmed_alignments_from_untrimmed, and perhaps that will help you diagnose the issue.

heather340 commented 5 years ago

That one dropped all the loci as well. Is there another setting you'd recommend I play with under the alignment settings? Or is there a way to get around that for the phasing step? I guess I don't understand why the regular edge trim would work fine but trimal would not.

I tried to remove all -, N, and gaps from the multialign phased fasta file just before the final alignment just in case it would work better with raw, unaligned alleles, but the file just comes out messy with uneven lines throughout. If there is a way to straighten that file out, do you think it could be aligned with no-trim and subsequently trimal?
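For the "uneven lines" problem, a small script can strip gap characters and rewrap every sequence to a fixed width so the file is tidy FASTA again. This is a minimal sketch under a few assumptions: the function names are hypothetical, only `-` and `?` are removed by default, and records left with no sequence at all are skipped, since an empty record would probably upset downstream readers.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def clean_fasta(lines, strip_chars="-?", width=60):
    """Return FASTA lines with strip_chars removed and sequences rewrapped."""
    table = str.maketrans("", "", strip_chars)
    out = []
    for header, seq in parse_fasta(lines):
        seq = seq.replace(" ", "").translate(table)
        if not seq:
            continue  # nothing left after stripping; skip the record
        out.append(">" + header)
        # Rewrap to fixed-width lines so every record has even line lengths.
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return out
```

Passing `strip_chars="-?Nn"` would also remove N's, but note that deleting internal N's changes the sequence itself, so whether that is appropriate before realigning is a judgment call.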

Thanks!

Ofsm commented 4 years ago

Hi, same problem with the internal trimming step: NO RECORDS FOUND IN HANDLE.... Any solution at this date?

Nadaline commented 3 years ago

Hello @Ofsm and @heather340, after a year I expect this has been resolved for you. But if anyone else runs into the same issue, I recommend the tips in Mr. Brown's tutorial: https://github.com/jasonleebrown/UCE_phyluce_pipeline. I followed the same procedures and they are working for me.