aaronmussig / PhyloDM

Efficient calculation of phylogenetic distance matrices.
GNU General Public License v3.0

PanicException when loading a Newick tree file #15

Closed: CNwangbin closed this issue 1 year ago

CNwangbin commented 1 year ago

The error occurred when using PhyloDM.load_from_newick_path(tree_path) to load a Newick tree file. The error message is:

thread '<unnamed>' panicked at 'Taxon already exists in the tree: 'Taxon("0.999")'', src/pdm.rs:98:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

PanicException                            Traceback (most recent call last)
/home/wangbin/phy2vec/matrix_factorization.py in ()
     20 np.savetxt(save_mat_name, embedding, delimiter="\t", fmt="%.4f")
     21 np.savetxt(save_labels_name, labels, delimiter="\t", fmt="%s")
---> 23 embed_tree('data/tree.nwk', 200, 'data')

/home/wangbin/phy2vec/matrix_factorization.py in embed_tree(tree_path, n_dims, output_path)
      7 def embed_tree(tree_path, n_dims, output_path):
      8     # read tree file, format:newick
----> 9     pdm = PhyloDM.load_from_newick_path(tree_path)
     10     # obtain pairwise distance matrix and labels
     11     mat = pdm.dm(norm=False)

File ~/.local/lib/python3.9/site-packages/phylodm/__init__.py:27, in PhyloDM.load_from_newick_path(cls, path)
     21 """Load a tree from a Newick file.
     22
     23 Args:
     24     path: The path to the Newick file.
     25 """
     26 pdm = cls()
---> 27 pdm._rs.load_from_newick_path(path=path)
     28 return pdm

PanicException: Taxon already exists in the tree: 'Taxon("0.999")'

aaronmussig commented 1 year ago

Thanks for raising this issue. It seems that the light_phylogeny crate, which the Rust backend uses to parse the Newick file, doesn't support the extended Newick format. I've created a new release (3.0.0) that falls back to DendroPy if Rust is unable to load a Newick file.
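For anyone curious what a "try the fast loader, fall back to the slow one" pattern can look like, here is a minimal sketch. It is not the code shipped in 3.0.0; `load_with_fallback`, `primary` and `fallback` are hypothetical names used only for illustration. One detail worth knowing: PyO3's PanicException derives from BaseException, so a plain `except Exception` would not catch it.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def load_with_fallback(
    path: str,
    primary: Callable[[str], T],
    fallback: Callable[[str], T],
) -> T:
    """Try the fast loader first; use the slower one if it fails.

    Hypothetical helper for illustration only, not part of PhyloDM.
    """
    try:
        return primary(path)
    except (KeyboardInterrupt, SystemExit):
        # Never swallow a user interrupt or an interpreter exit.
        raise
    except BaseException:
        # A Rust panic surfaced through PyO3 is a BaseException subclass,
        # so catching Exception alone would miss it.
        return fallback(path)
```

In this thread's setting, `primary` would be something like `PhyloDM.load_from_newick_path` and `fallback` a function that parses the file with DendroPy first; both choices are assumptions about how one might wire it up by hand.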

My assumption is that this is the case for your tree, although it might not be. If you are able to provide your Newick file, I can try to debug a bit further.
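A quick way to narrow this down locally is to check whether any label occurs more than once in the file. The panic message names Taxon("0.999"), which looks like a support value used as an internal node label, but that is only a guess. The sketch below uses a simple regular expression rather than a real Newick parser; `duplicate_labels` is a hypothetical helper, and it does not handle quoted labels or bracketed comments.

```python
import re
from collections import Counter

# A label is taken to be the run of characters that directly follows
# '(' or ',' (leaf names) or ')' (internal node labels), up to the next
# structural character. Branch lengths follow ':' and are therefore skipped.
LABEL_RE = re.compile(r"[(,)]\s*([^(),:;\[\]\s]+)")


def duplicate_labels(newick: str) -> dict:
    """Return {label: count} for every label seen more than once."""
    counts = Counter(m.group(1) for m in LABEL_RE.finditer(newick))
    return {label: n for label, n in counts.items() if n > 1}


if __name__ == "__main__":
    example = "((A:0.1,B:0.2)0.999:0.3,(C:0.1,D:0.2)0.999:0.4);"
    print(duplicate_labels(example))
```

If the result is non-empty for your own file (read it with `open(path).read()`), repeated labels are a plausible trigger for the "Taxon already exists" panic.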

CNwangbin commented 1 year ago

Thanks for your reply. I will send the file to your email, aaronmussig@gmail.com.

aaronmussig commented 1 year ago

Thanks for sending the tree through. I confirmed it is fixed with v3.0.0.

It will be a little slower (about 40 seconds extra) because it needs to use DendroPy to load the tree. The distance matrix calculation will still have the same performance. I will work on fixing this in a future release, but at least a workaround exists.

CNwangbin commented 1 year ago

Thanks, I would like to know the approximate time of your upcoming release. Alternatively, could you provide me with an example code solution now?

aaronmussig commented 1 year ago

You can upgrade the package by running python -m pip install -U phylodm

No need to change your code, it should work as is.
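If you want to confirm that the upgrade took effect, a small check like the following may help. It is a sketch using only the standard library; `installed_version`, `release_tuple` and `meets_minimum` are hypothetical helper names, and the version comparison looks only at the first three numeric components.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string, or None if not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def release_tuple(v: str) -> tuple:
    """Reduce a version string to (major, minor, patch), padding with zeros."""
    nums = [int(n) for n in re.findall(r"\d+", v)][:3]
    nums += [0] * (3 - len(nums))
    return tuple(nums)


def meets_minimum(installed: str, minimum: str = "3.0.0") -> bool:
    """True if `installed` is at least `minimum`."""
    return release_tuple(installed) >= release_tuple(minimum)


if __name__ == "__main__":
    found = installed_version("phylodm")
    if found is None:
        print("phylodm is not installed")
    else:
        print(found, "ok" if meets_minimum(found) else "older than 3.0.0")
```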

CNwangbin commented 1 year ago

Thanks. I will try it.

CNwangbin commented 1 year ago

Thanks. This actually helped me fix this bug.