TheBrownLab / PhyloFisher

PhyloFisher is a software package written in Python3 that can be used for the creation, analysis, and visualization of phylogenomic datasets that consist of eukaryotic protein sequences.
MIT License

apply_to_db.py does not update metadata.tsv file #100

Closed Edouard94 closed 1 year ago

Edouard94 commented 1 year ago

Hello PhyloFisher team,

I have an issue where, after running the apply_to_db.py script, the metadata.tsv and tree_colors.tsv files are not updated, although the other folders and files seem to be.


Could you give me some insight into this issue? Should I update the files manually or use the select_taxa.py script, given that I want to keep all taxa?

Thank you for your help, Edouard

atice commented 1 year ago

Hi @Edouard94

Thank you for the message. Would you be willing to share your input_metadata.tsv, parsed.tsv files, config_ini, and the command you are running? These should be all we need to diagnose the issue thoroughly.

Alex

Edouard94 commented 1 year ago

Hello Alex,

Thank you for getting back to me so quickly.

I am attaching here the requested files and the commands I used after parsing: https://we.tl/t-Zi9c6dtXnn

apply_to_db.py -i forest_out_date/ -fi fisher_out_date/ -t 10
prep_final_dataset.py
matrix_constructor.py -i prep_final_dataset_date -t 10

Thank you for your help.

Best wishes, Edouard

atice commented 1 year ago

Hi @Edouard94

Looking in your forest_out_Oct.03.2023 directory, I do not see any _parsed.tsv files. Even if you do not change the ortholog/paralog decisions made automatically by fisher.py, the original .tsv files output by forest.py need to be saved as {gene}_parsed.tsv after being viewed, as is done automatically by ParaSorter when you click "Save to TSV." Would you go back and rename your files so they fit this format (with a script, by hand, or by opening and saving them in ParaSorter), then run apply_to_db.py again and let us know the results? Our guess is that none of the sequences from the taxon you added are in the ortholog and paralog files. Those directories and files were only touched because the script was running through sequences already present in the database.
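For the "with a script" option, a minimal Python sketch could look like the following. It assumes that forest.py writes one {gene}.tsv file per gene directly into the forest output directory; that layout is an assumption and should be checked against your own output. The function name save_as_parsed is hypothetical and is not part of PhyloFisher. The sketch copies rather than renames, so the original files stay in place, and it never overwrites a _parsed.tsv file that already exists.

```python
from pathlib import Path
import shutil


def save_as_parsed(forest_dir, suffix="_parsed.tsv"):
    """Copy each {gene}.tsv in forest_dir to {gene}_parsed.tsv.

    Hypothetical helper, not part of PhyloFisher. Assumes the TSV files
    sit directly in forest_dir and are named {gene}.tsv.
    Returns the names of the files that were created.
    """
    created = []
    # sorted() materializes the listing before any new files are written
    for tsv in sorted(Path(forest_dir).glob("*.tsv")):
        if tsv.name.endswith(suffix):
            continue  # already in the expected format
        target = tsv.with_name(tsv.stem + suffix)
        if target.exists():
            continue  # keep any file already saved from ParaSorter
        shutil.copy2(tsv, target)
        created.append(target.name)
    return created
```

Calling save_as_parsed("forest_out_Oct.03.2023") would then return the list of newly created file names, after which apply_to_db.py can be run again. Copying instead of renaming is only a cautious choice here; renaming in place, as suggested above, should work as well.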

As for tree_colors.tsv, is this the first time you are using either Microsporidia or IncertaeSedis as taxonomic designations? Could you provide us with more information about what behavior you are expecting?

Alex

atice commented 1 year ago

Hi @Edouard94 ,

Just wanted to follow up to see whether this solved your issue. Thanks for any update you can provide.

Alex

Edouard94 commented 1 year ago

Hi Alex,

Sorry for my late reply.

Ok yes, I see that I went too fast at the parsing step. Everything looked correct, so I did not do the manual parsing. I eventually did, then reran apply_to_db.py and constructed my matrix, and I got the expected changes: my new proteome has been added.

Regarding tree_colors.tsv, I don't really know how to use it with the produced matrix, but yes, this would be the first time I use this file with those new taxonomic ranks.

Thank you for your help!

Best, Edouard