faircloth-lab / phyluce

software for UCE (and general) phylogenomics
http://phyluce.readthedocs.org/

Add missing data designators #238

Closed jkorstia closed 2 years ago

jkorstia commented 2 years ago

Hi Dr. Faircloth,

I am encountering the same error that several others have encountered (issues #237, #139, etc.). I am trying to analyze my phased UCE alignments with STACEY in BEAST as described in Andermann et al. 2019. To do so, I have tried to use phyluce_align_add_missing_data_designators to ensure that each of my nexus files has an entry for each species, which I think is required for BEAST. When doing so, I get the same error that others have gotten before:

    multiprocessing.pool.RemoteTraceback:
    """
    Traceback (most recent call last):
      File "/home/jkorstia/conda/envs/phyluce-1.7.1/lib/python3.6/multiprocessing/pool.py", line 119, in worker
        result = (True, func(*args, **kwds))
      File "/home/jkorstia/conda/envs/phyluce-1.7.1/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
        return list(map(*args))
      File "/home/jkorstia/conda/envs/phyluce-1.7.1/bin/phyluce_align_add_missing_data_designators", line 208, in add_designators
        missing_character,
      File "/home/jkorstia/conda/envs/phyluce-1.7.1/bin/phyluce_align_add_missing_data_designators", line 163, in add_gaps_to_align
        local_organisms.remove(new_seq_name)
    ValueError: list.remove(x): x not in list

Do you have a workaround for this script or another suggestion on how to process my many alignments? I can provide input files if it would be helpful.

Thanks, Jenny

brantfaircloth commented 2 years ago

Hi Jenny,

try using the --verbatim flag and see if that works for you.

-b

jkorstia commented 2 years ago

Hi Dr. Faircloth,

Unfortunately, I have tried it both with the --verbatim flag and without it. The error message seems to be the same either way.

Thanks, Jenny

brantfaircloth commented 2 years ago

hmm, that's odd. works for me in both software tests and on the desktop. can you send me a few alignments, as well as your taxon-set.incomplete.conf and taxon-set.incomplete - I'll try to take a look.

jkorstia commented 2 years ago

Hi Dr. Faircloth,

Here are three of my alignments, my .conf file and my .incomplete file. I added the .txt to the end so that github would allow the upload. Thanks for helping me!

uce-976.nexus.txt uce-971.nexus.txt uce-965.nexus.txt all-taxa-incomplete.incomplete.txt all-taxa-incomplete.conf.txt

brantfaircloth commented 2 years ago

I see what the issue is: it has to do with capitalization in your taxon names, which is something phyluce does not expect. The easiest way for you to fix this quickly is to edit the phyluce_align_add_missing_data_designators script so that lines 158-165 look like this:

            if not verbatim:
                new_seq_name = "_".join(seq.name.split("_")[1:])
                new_align.append(record_formatter(str(seq.seq), new_seq_name))
                local_organisms.remove(new_seq_name)
            else:
                new_seq_name = seq.name.lower()
                new_align.append(record_formatter(str(seq.seq), new_seq_name))
                local_organisms.remove(seq.name)

You can find the file you need to edit by running which phyluce_align_add_missing_data_designators. I'll also add this fix to the repository, but it will take a while for it to make it into a new version.
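
For readers hitting the same ValueError, here is a minimal sketch of how a capitalization difference makes list.remove() fail, plus a small helper for spotting such names before running the script. The taxon names and the helper find_name_mismatches are hypothetical, and which side carries the capitals (alignment or taxon list) is an assumption; this is not code from phyluce.

```python
def find_name_mismatches(alignment_names, expected_organisms):
    """Map each alignment name that only matches an expected name when case is ignored."""
    expected = set(expected_organisms)
    lowered = {name.lower(): name for name in expected_organisms}
    mismatches = {}
    for name in alignment_names:
        if name not in expected and name.lower() in lowered:
            mismatches[name] = lowered[name.lower()]
    return mismatches


# Hypothetical data: the alignment capitalizes a name that the taxon list does not.
organisms = ["myotis_lucifugus", "myotis_velifer"]
alignment_names = ["Myotis_lucifugus", "myotis_velifer"]

# list.remove() compares strings exactly, so a case difference raises ValueError.
remaining = list(organisms)
raised = False
try:
    for name in alignment_names:
        remaining.remove(name)
except ValueError:
    raised = True

mismatches = find_name_mismatches(alignment_names, organisms)
print(raised, mismatches)
```

Running a check like this over the taxon names in each alignment would show which names differ only by case before any script is edited.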

brantfaircloth commented 2 years ago

The updated version of your file should look like this file: https://github.com/faircloth-lab/phyluce/blob/main/bin/align/phyluce_align_add_missing_data_designators#L158-L165

jkorstia commented 2 years ago

Hello Dr. Faircloth,

I gave that a try, but I'm encountering a new error now. Those modifications allowed the script to make it past lines 158-165, but had an exception in the next loop. Any idea what's causing this new issue?

    multiprocessing.pool.RemoteTraceback:
    """
    Traceback (most recent call last):
      File "/home/jkorstia/conda/envs/phyluce-1.7.1/bin/phyluce_align_add_missing_data_designators", line 173, in add_gaps_to_align
        assert loc in missing[org], "Locus missing"
    AssertionError: Locus missing

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/home/jkorstia/conda/envs/phyluce-1.7.1/lib/python3.6/multiprocessing/pool.py", line 119, in worker
        result = (True, func(*args, **kwds))
      File "/home/jkorstia/conda/envs/phyluce-1.7.1/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
        return list(map(*args))
      File "/home/jkorstia/conda/envs/phyluce-1.7.1/bin/phyluce_align_add_missing_data_designators", line 210, in add_designators
        missing_character,
      File "/home/jkorstia/conda/envs/phyluce-1.7.1/bin/phyluce_align_add_missing_data_designators", line 175, in add_gaps_to_align
        assert loc in missing["{}*".format(org)], "Locus missing"
    AssertionError: Locus missing
    """

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/home/jkorstia/conda/envs/phyluce-1.7.1/bin/phyluce_align_add_missing_data_designators", line 290, in <module>
        main()
      File "/home/jkorstia/conda/envs/phyluce-1.7.1/bin/phyluce_align_add_missing_data_designators", line 273, in main
        results = pool.map(add_designators, work)
      File "/home/jkorstia/conda/envs/phyluce-1.7.1/lib/python3.6/multiprocessing/pool.py", line 266, in map
        return self._map_async(func, iterable, mapstar, chunksize).get()
      File "/home/jkorstia/conda/envs/phyluce-1.7.1/lib/python3.6/multiprocessing/pool.py", line 644, in get
        raise self._value
    AssertionError: Locus missing

Thanks, Jenny
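
The two assertions in the traceback suggest that, for every taxon absent from an alignment, the script expects the locus to be listed as missing under either the taxon name or the taxon name followed by an asterisk. The sketch below mimics that check without raising, so it can report which taxa would trigger "Locus missing". It is inferred only from the traceback lines above; the function names, the dictionary layout, and the data are assumptions, not the phyluce source.

```python
def locus_is_recorded_missing(locus, organism, missing):
    """Return True if `locus` is listed as missing under `organism` or `organism*`."""
    for key in (organism, "{}*".format(organism)):
        if locus in missing.get(key, ()):
            return True
    return False


def unrecorded_absences(locus, present, organisms, missing):
    """List taxa absent from the alignment whose absence is not recorded in `missing`."""
    return [
        org
        for org in organisms
        if org not in present and not locus_is_recorded_missing(locus, org, missing)
    ]


# Hypothetical data: taxon_c is absent from uce-965 but not recorded as missing it.
missing = {
    "taxon_a": ["uce-965"],
    "taxon_b*": ["uce-965"],
    "taxon_c": ["uce-971"],
}
organisms = ["taxon_a", "taxon_b", "taxon_c", "taxon_d"]
present = ["taxon_d"]

problems = unrecorded_absences("uce-965", present, organisms, missing)
print(problems)
```

A non-empty result for a locus would point to a disagreement between the alignments and the incomplete-matrix file rather than to a problem inside a single alignment.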

brantfaircloth commented 2 years ago

Hard to say. The big issue is that the test files you sent contain tons of alignments for lots of taxa, while you sent few alignments (I understand why - just trying to explain why testing is difficult). To test this scenario specifically, I would have to hand-edit the incomplete and conf files to include only the loci that you sent, which is super tedious.

If you want to zip up all the alignments you are inputting that go with the incomplete and conf files and email me a link to that zip file, I can test your actual scenario (and you won't need to post your data publicly). I'll discard the alignments after testing to make sure a fix works. Otherwise, I'm grasping at straws, because the above fixes the same theoretical problem in the test data that I use for phyluce.
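
The hand-editing described above could in principle be scripted. The following is a rough sketch that trims a sectioned text file down to a chosen set of loci. It assumes both files use bracketed section headers with one name per line ([Organisms] and [Loci] sections in the conf file, one [taxon] section per taxon in the incomplete file); that layout, the function subset_loci, and the sample data are assumptions for illustration and should be checked against the real files.

```python
def subset_loci(text, keep_loci, only_sections=None):
    """Drop locus lines not in `keep_loci`.

    If `only_sections` is given, lines in other sections are kept untouched
    (for example taxon names listed under an [Organisms] header).
    """
    kept = []
    section = None
    for line in text.splitlines():
        entry = line.strip()
        if entry.startswith("[") and entry.endswith("]"):
            section = entry[1:-1]
            kept.append(line)
        elif not entry:
            kept.append(line)
        elif only_sections is not None and section not in only_sections:
            kept.append(line)
        elif entry in keep_loci:
            kept.append(line)
    return "\n".join(kept) + "\n"


# Loci named after the three alignments attached earlier in the thread.
keep = {"uce-965", "uce-971", "uce-976"}

# Hypothetical file contents in the assumed layout.
conf_text = "[Organisms]\ntaxon_a\ntaxon_b\n[Loci]\nuce-965\nuce-1000\nuce-976\n"
incomplete_text = "[taxon_a]\nuce-971\nuce-2000\n[taxon_b]\nuce-965\n"

small_conf = subset_loci(conf_text, keep, only_sections={"Loci"})
small_incomplete = subset_loci(incomplete_text, keep)
print(small_conf)
print(small_incomplete)
```

Writing the two trimmed strings to new files would give a small test case that matches only the alignments on hand.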

jkorstia commented 2 years ago

Thanks for the context. I would really appreciate it if you are willing to try it out with my alignments because I am totally stumped. I have emailed you the alignments.

Thanks, Jenny

brantfaircloth commented 2 years ago

You are very welcome. I will not be able to get to this until Monday, but I hope to have a solution for you shortly after that (I'll also send you back the fixed alignments, so you can move forward ASAP).

brantfaircloth commented 2 years ago

Hi Jenny,

I can't seem to find the email you sent... will you send it to [my email - removed]? Thanks!

jkorstia commented 2 years ago

Hi Brant,

I just re-sent the Dropbox link to the ***@***.*** address. Here is the link just in case: [removed]

Thanks, Jenny

brantfaircloth commented 2 years ago

If you add the flag --no-check-missing to the other flags you are using, things should work ok. I just ran:

phyluce_align_add_missing_data_designators \
    --alignments trimmed_clean_75 \
    --output test-out \
    --match-count-output all-taxa-incomplete.conf \
    --incomplete-matrix all-taxa-incomplete.incomplete \
    --verbatim \
    --no-check-missing

jkorstia commented 2 years ago

This fix did the job. Thanks Dr. Faircloth!
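
For anyone who wants to script the invocation that resolved this issue, here is a small sketch that assembles the same argument list in Python. Only the flags and paths shown in the command above are used; the helper build_command and its parameter names are hypothetical, and the command is built but not executed here.

```python
import shlex


def build_command(alignments, output, match_count_output, incomplete_matrix,
                  verbatim=True, check_missing=False):
    """Assemble the argument list for phyluce_align_add_missing_data_designators."""
    cmd = [
        "phyluce_align_add_missing_data_designators",
        "--alignments", alignments,
        "--output", output,
        "--match-count-output", match_count_output,
        "--incomplete-matrix", incomplete_matrix,
    ]
    if verbatim:
        cmd.append("--verbatim")
    if not check_missing:
        cmd.append("--no-check-missing")
    return cmd


command = build_command(
    "trimmed_clean_75",
    "test-out",
    "all-taxa-incomplete.conf",
    "all-taxa-incomplete.incomplete",
)

# Shell-safe rendering for logging; pass `command` to subprocess.run to execute it.
printable = " ".join(shlex.quote(part) for part in command)
print(printable)
```

Passing the list to subprocess.run(command, check=True) inside the activated phyluce environment would run it; that step is left out so the sketch has no side effects.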

brantfaircloth commented 2 years ago

Right on 👍. You’re welcome.