TheBrownLab / PhyloFisher

PhyloFisher is a software package written in Python3 that can be used for the creation, analysis, and visualization of phylogenomic datasets that consist of eukaryotic protein sequences.
MIT License

Getting error when running apply_to_db.py #88

Closed thesamuanels closed 1 year ago

thesamuanels commented 1 year ago

Hi, I am using PhyloFisher v1.2.9. I have used it once before, but I didn't encounter the following error then.

I am trying to run the command: apply_to_db.py -t 8 -i parasorter_out_Aug.03.2023/ -fi fisher_out_Jul.21.2023/

parasorter_out_Aug.03.2023 contains all the TSV files parsed from the single-gene trees, and fisher_out_Jul.21.2023 is the output folder from the fisher.py step.

The message I am getting:

    Traceback (most recent call last):
      File "/opt/Anaconda3/envs/fisher_1.2.9/bin/apply_to_db.py", line 379, in <module>
        main()
      File "/opt/Anaconda3/envs/fisher_1.2.9/bin/apply_to_db.py", line 325, in main
        new_database(table)
      File "/opt/Anaconda3/envs/fisher_1.2.9/bin/apply_to_db.py", line 250, in new_database
        orthologs, paralogs = parse_table(table)
      File "/opt/Anaconda3/envs/fisher_1.2.9/bin/apply_to_db.py", line 163, in parse_table
        record = seq_dict[abbrev]
    KeyError: 'rhaphid2'

"rhaphid2" is the unique ID of a taxon I wanted to add to the database.
I have even tried to exclude rhaphid2 by using the "--to_exclude" flag, but that didn't work either. The unique id looks the same everywhere, and I don't understand why that one in particular should be problematic, especially since there is another one called 'rhaphid1'.

EDIT: I tried replacing the ID with a different one in a copied folder, and the specific ID name does not seem to be the issue.

Additionally, I've just remembered that I had issues running forest.py, so back then I used the command:

    xvfb-run forest.py --local_run -i sgt_out_<DATE>.tar.gz

Including xvfb-run before the command made it work. I don't know why it failed without it, because I have just tried again without xvfb-run and now it runs smoothly. I wonder whether this is related to the main error in some way.

Any idea what the issue might be? Thanks

robert-ervin-jones commented 1 year ago

Do you know which gene apply_to_db.py is on when the error occurs?

thesamuanels commented 1 year ago

> Do you know which gene apply_to_db.py is on when the error occurs?

I don't know. Is there a way I can see it?

robert-ervin-jones commented 1 year ago

I realized after saying that how difficult it would be. Probably not without editing the source code to print the gene.
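As a rough illustration of what "editing the source code to print the gene" could look like, here is a minimal sketch. It is not PhyloFisher's actual code: the helper name `lookup_with_context` and the `gene` argument are hypothetical, and only the failing expression `seq_dict[abbrev]` comes from the traceback above. The idea is simply to wrap the dictionary lookup so that a missing key is reported together with the gene being processed.

```python
def lookup_with_context(seq_dict, abbrev, gene):
    """Return seq_dict[abbrev]; if the key is missing, name the gene too.

    Hypothetical helper: `gene` stands for whatever identifier the calling
    code has for the gene (or file) it is currently working on.
    """
    try:
        return seq_dict[abbrev]
    except KeyError:
        # Re-raise with a message that also says where the lookup failed.
        raise KeyError(
            f"{abbrev!r} not found while processing gene {gene!r}"
        ) from None
```

With a wrapper like this, the error message would name the gene as well as the missing ID, which narrows the search to a single input file.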

robert-ervin-jones commented 1 year ago

Can you grep rhaphid2 in <fisher_dir>/*fas?

thesamuanels commented 1 year ago

> Can you grep rhaphid2 in <fisher_dir>/*fas?

Yes, the string 'rhaphid2' is present within the fasta files' headers multiple times, in multiple files.
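For anyone who wants a per-file summary rather than raw grep output, the same check can be sketched in Python. This is only an illustration: the function name `count_header_hits` is hypothetical, and the `*.fas` pattern mirrors the glob in the question above. It counts FASTA header lines (lines starting with `>`) that contain the given ID.

```python
from pathlib import Path


def count_header_hits(fasta_dir, taxon_id, pattern="*.fas"):
    """Map file name -> number of FASTA header lines containing taxon_id."""
    hits = {}
    for path in sorted(Path(fasta_dir).glob(pattern)):
        with open(path) as handle:
            # Only header lines are inspected; sequence lines are skipped.
            count = sum(
                1 for line in handle
                if line.startswith(">") and taxon_id in line
            )
        if count:
            hits[path.name] = count
    return hits
```

Note that this is a plain substring match, like grep without extra options, so an ID such as rhaphid2 would also match a longer ID that merely contains it.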

robert-ervin-jones commented 1 year ago

As I cannot reproduce this error, troubleshooting without your files will be extremely difficult. Do you feel comfortable sharing them with us? If so, my email is robert.ervin.jones@gmail.com.

robert-ervin-jones commented 1 year ago

The issue is caused by the underscores in your long name.
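A quick way to look for this before running apply_to_db.py is to scan the metadata table for long names that contain underscores. The sketch below is an assumption-laden illustration, not part of PhyloFisher: the function name is hypothetical, and the column headers "Unique ID" and "Long Name" are assumed and should be adjusted to match the actual tab-separated file.

```python
import csv


def long_names_with_underscores(tsv_path, id_col="Unique ID",
                                name_col="Long Name"):
    """Return (unique_id, long_name) pairs whose long name contains '_'.

    The default column names are assumptions about the metadata layout.
    """
    flagged = []
    with open(tsv_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            name = row.get(name_col) or ""
            if "_" in name:
                flagged.append((row.get(id_col), name))
    return flagged
```

Any rows it returns are candidates for renaming, for example by replacing the underscores with spaces.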

thesamuanels commented 1 year ago

I have fixed the issue according to your suggestion and it is finally working now. Thank you so much for your help.

Sincerely, Daniele

On Mon, 14 Aug 2023 at 07:53, Robert E. Jones < @.***> wrote:

Closed #88 https://github.com/TheBrownLab/PhyloFisher/issues/88 as completed.
