Closed VadimDu closed 4 years ago
Dear Vadimd,
Thank you very much for reporting this. I have fixed the parameters in the code and will update the conda package in the coming weeks.
About the output interpretation, your output file should be something like the following:
#phylogenetic_threshold 0.05
#mutation_rate_threshold 0.05
#total_branch_length ###
#subtree min_dist mean_dist max_dist min_mut mean_mut max_mut distances mutation_rates
Where:
subtree: the subtree in Newick format with all leaves within the two thresholds
min_dist: the minimum phylogenetic distance for that subtree
mean_dist: the average phylogenetic distance for that subtree
max_dist: the maximum phylogenetic distance for that subtree
min_mut: the minimum mutation rate for that subtree
mean_mut: the average mutation rate for that subtree
max_mut: the maximum mutation rate for that subtree
distances: all phylogenetic distances in that subtree
mutation_rates: all mutation rates in that subtree
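For anyone who wants to load such a file programmatically, here is a minimal sketch based only on the layout described above. It assumes tab-separated data rows and comment lines starting with "#". The function name read_strain_finder_table and the sample values are hypothetical, and the distances and mutation_rates columns are kept as raw strings because their internal format is not spelled out in this thread.

```python
import io

# Column names as listed in the header line quoted in this thread.
DEFAULT_COLUMNS = [
    "subtree", "min_dist", "mean_dist", "max_dist",
    "min_mut", "mean_mut", "max_mut", "distances", "mutation_rates",
]


def read_strain_finder_table(handle, separator="\t"):
    """Return (metadata, rows) parsed from a strain-finder-style table.

    Hypothetical helper: lines starting with '#' are treated either as the
    column header (first field is 'subtree') or as 'key value' metadata.
    """
    metadata = {}
    columns = list(DEFAULT_COLUMNS)
    rows = []
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            fields = line[1:].split()  # tolerate spaces or tabs in headers
            if not fields:
                continue
            if fields[0] == "subtree":
                columns = fields
            else:
                metadata[fields[0]] = " ".join(fields[1:])
            continue
        # Data row: one subtree per line, values kept as strings.
        rows.append(dict(zip(columns, line.split(separator))))
    return metadata, rows


if __name__ == "__main__":
    # Made-up sample, only to show the shape of the result.
    sample = (
        "#phylogenetic_threshold\t0.05\n"
        "#subtree\tmin_dist\tmean_dist\tmax_dist\tmin_mut\tmean_mut\tmax_mut"
        "\tdistances\tmutation_rates\n"
        "(A:0.01,B:0.01);\t0.02\t0.02\t0.02\t0.001\t0.001\t0.001\t0.02\t0.001\n"
    )
    meta, table = read_strain_finder_table(io.StringIO(sample))
    print(meta)
    print(table[0]["subtree"], table[0]["mean_dist"])
```

If a data row does not split into nine fields with the separator you pass, that is a hint that the file was written with a different separator than expected.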
Please let me know if I can help you with anything else.
Many thanks, Francesco
Dear Francesco,
Thank you for the quick response and handling of the issue.
I appreciate your explanation of how to interpret the strain finder output table. I still might be missing something, however.
I assumed the output should be the phylogenetic distances and mutation rates between nodes (our genomes) in each subtree, to help decide whether each subtree represents a different strain.
My output file has only one very long row in Newick format (besides the file headers), separated by commas and pipes, without any apparent result columns like the ones you mentioned.
According to your code, the separator in the output should be '\t' by default:
p.add_argument('-s', '--separator', type=str, default='\t', choices=OUTPUT_EXTENSIONS.keys(), help='Specify the separator to use in the output')
However, running phylophlan_strain_finder shows this message regarding the separator:
-s {;,,, }, --separator {;,,, } Specify the separator to use in the output (default: )
Only the headers in the output file (#subtree"\t"min_dist"\t"mean_dist...) are tab separated. Might there be some inconsistency in the representation?
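One possible reading of the help text above, offered as a sketch and not as a diagnosis of PhyloPhlAn itself: argparse prints choices as raw strings, so a tab character shows up as blank space. In the example below, SEPARATORS is a hypothetical stand-in for OUTPUT_EXTENSIONS (its real keys are not shown in this thread), and the %(default)r specifier is just one way to make an invisible default readable.

```python
import argparse

# Hypothetical stand-in for OUTPUT_EXTENSIONS; the real mapping is not
# shown in this thread.
SEPARATORS = {";": ".csv", ",": ".csv", "\t": ".tsv"}

parser = argparse.ArgumentParser(prog="separator_demo")
parser.add_argument(
    "-s", "--separator", type=str, default="\t",
    choices=SEPARATORS.keys(),
    # %(default)r formats the default with repr(), so a tab prints as '\t'
    # instead of as invisible whitespace.
    help="Specify the separator to use in the output (default: %(default)r)",
)

help_text = parser.format_help()
args = parser.parse_args([])

if __name__ == "__main__":
    print(help_text)
    # repr() makes whitespace separators visible when debugging.
    print("parsed default separator:", repr(args.separator))
```

The parsed default is still a real tab either way; only its display in the help text changes.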
Thanks a lot again, Vadimd
Dear Vadimd,
This is strange. To better understand your issue, can I ask you to re-run phylophlan_metagenomic with the --verbose option, save the output to a log file, and attach it here?
Many thanks, Francesco
Hi Francesco, Sure no problem, I guess you meant the phylophlan_strain_finder (not phylophlan_metagenomic), here is the output from the command with --verbose:
phylophlan_strain_finder.py version 3.0.8 (8 May 2020)
Command line: /urigo/vadimd/conda_phylophlan3/bin/phylophlan_strain_finder --input phylophlan3_output/RAxML_bestTree.Ecoli_MAG_isolates_good_quality_all_datasets_n450_refined.tre --mutation_rates phylophlan3_output/mutation_rates.tsv --output phylophlan3_output/Ecoli_MAG_isolates_good_quality_all_dataset_n450_UniRef90_95core_strain_finder --verbose
Checking for parameters...
Arguments: {'input': 'phylophlan3_output/RAxML_bestTree.Ecoli_MAG_isolates_good_quality_all_datasets_n450_refined.tre', 'mutation_rates': 'phylophlan3_output/mutation_rates.tsv', 'p_threshold': 0.05, 'm_threshold': 0.05, 'tree_format': 'newick', 'output': 'phylophlan3_output/Ecoli_MAG_isolates_good_quality_all_dataset_n450_UniRef90_95core_strain_finder', 'overwrite': False, 'separator': '\t', 'verbose': True}
Reading mutation_rates table...
Root reached, return Clade as root of the subtree
Root reached, return GCF008082325.1_isolate_WGS as root of the subtree
Root reached, return M198_MAG_assembly as root of the subtree
Creating output...
I will send you the output file with the results by email, if that's OK.
Thanks a lot, Dani
Dear Dani, yes, I meant phylophlan_strain_finder and not phylophlan_metagenomic, sorry.
Yes, the output file by email is fine, thank you.
Dear Francesco,
First, thank you for the very useful and configurable tool! Great job. I ran into a "Namespace" error when I tried to use the phylophlan_strain_finder tool:
  File "/urigo/vadimd/conda_phylophlan3/lib/python3.8/site-packages/phylophlan/phylophlan_strain_finder.py", line 168, in phylophlan_strain_finder
    check_params(args, args.verbose)
  File "/urigo/vadimd/conda_phylophlan3/lib/python3.8/site-packages/phylophlan/phylophlan_strain_finder.py", line 114, in check_params
    if args.p_threshold < 0.0:
AttributeError: 'Namespace' object has no attribute 'p_threshold'
In the "Finding strains in trees" section of the documentation, you wrote that the thresholds can be tuned using --phylo_thr and --mutrate_thr. However, under "phylophlan_strain_finder.py" there are two different arguments instead: --p_threshold P_THRESHOLD and --m_threshold M_THRESHOLD.
I have checked phylophlan_strain_finder.py: you added --phylo_thr and --mutrate_thr as argparse arguments, but the check_params function checks args.p_threshold and args.m_threshold, which are not defined, hence the error. I replaced --phylo_thr and --mutrate_thr in argparse with these two arguments, and the script seems to work fine with the default thresholds of 0.05.
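To illustrate the mismatch described above: argparse stores each value under an attribute named after the long option, unless dest= says otherwise. The sketch below is not the actual PhyloPhlAn code or the fix that was applied; the parser and the simplified check_params are hypothetical and only reuse the option and attribute names mentioned in this thread.

```python
import argparse


def build_broken_parser():
    # The option name becomes the attribute name, so this yields
    # args.phylo_thr and never args.p_threshold.
    p = argparse.ArgumentParser(prog="broken_sketch")
    p.add_argument("--phylo_thr", type=float, default=0.05)
    return p


def build_fixed_parser():
    # dest= keeps the documented flag names while storing the values under
    # the attribute names that check_params reads.
    p = argparse.ArgumentParser(prog="fixed_sketch")
    p.add_argument("--phylo_thr", dest="p_threshold", type=float, default=0.05)
    p.add_argument("--mutrate_thr", dest="m_threshold", type=float, default=0.05)
    return p


def check_params(args):
    # Simplified stand-in for the check quoted in the traceback.
    if args.p_threshold < 0.0:
        raise ValueError("p_threshold must be non-negative")
    if args.m_threshold < 0.0:
        raise ValueError("m_threshold must be non-negative")


broken_args = build_broken_parser().parse_args([])
fixed_args = build_fixed_parser().parse_args(["--phylo_thr", "0.1"])

if __name__ == "__main__":
    print(hasattr(broken_args, "p_threshold"))  # attribute is missing here
    check_params(fixed_args)  # passes: both attributes exist
    print(fixed_args.p_threshold, fixed_args.m_threshold)
```

Renaming the flags themselves to --p_threshold and --m_threshold, as done in the workaround above, has the same effect; dest= is only an alternative that keeps the documented flag names.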
In addition, I wanted to ask how you recommend reading and interpreting the output table from this script. It is not easily readable in its current format.
Thank you Vadimd