Closed Ebedthan closed 2 months ago
Hi @Ebedthan Thank you very much! It sounds great, let's do it.
Are you planning to use the library? If so, for what? This was a project I did for my MSc, so I have slides about it (some diagrams could be useful for docs)
In this project I implemented 3 different algorithms.
Canonical, as done by Quicktree. I think this would be the most useful thing to move to a library. It is pretty standard and, in fact, it does fairly well in performance. During my benchmark, it did always better than Quicktree (implemented in C). For big sizes, N=30.000 sequences tt was around one hour faster.
The RapidNJ algorithm with B-trees, in the same spirit as the RapidNJ program. This algorithm in faster than the canonical (the previous one), for n>11000, but it uses more memory. This rust version is far from being as optimized as the original RapidNJ one (and I think it will never be😂). The B-trees data strcutures are from the standard library, and I think making it faster would require custom implementation, additional tricks and using external memory. A version using Red black trees (change would be straightforward), should be slower but use less memory. I think this would be the second most useful to have in a crate.
A mix of the previous one, that sometimes was faster for intermediate sizes. I don't think this is worthy to put in a library.
That said, I agree it could be handy to have such library! I think that, I practice, you don't make trees for such big values of n. And, if you do, then you are running it separately in a cluster and saving the newick file.
What do you think? By the way, I've checked out your profile, very cool projects!
Yes, I plan to use it. And it would be really useful to have a library that allows us to create NJ trees in rust (we don't have such things right now).
Cool! Do you have any about of how the public API of such library should look like?
Yes, I think of a set of pub fn like:
create_naive_nj(distance: Vec<Vec<f64>>) -> Tree
create_rapid_nj(distance: Vec<Vec<f64>>) -> Tree
could be really useful to start with, Then after some iterations, some useful functions could be added like
to_qmatrix(distance: Vec<Vec<f64>>) -> QMatrix
But I think that such a project should first concentrate on providing the function to create a NJ from a matrix and then some optimization could be done which generally take more iterations.
Hi!
I've found some time today to start working on this. The code is a bit spaghetti. But I got something that kind of works.
As far as I understand, the crate should provide:
1- Read from Phylip matrix (as it is the most common input file to the Distance Matrix data structure) 2- Create your own distance matrix (which is straightforward, just a vec of vecs and a vec of labels)
let dist_mat = DistanceMatrix::new ( ... )
pub enum Algorithm {Canonical, RapidNJ, Hybrid}
let canonical = NeighborJoiningBuilder::new(Canonical);
let tree = canonical.solve(dist_mat);
let rapid = NeighborJoiningBuilder::new(RapidNJ).set_chunk_size(10);
let tree = rapid.solve(dist_mat);
So, as a summary: (1) input struct with read or construct methods, (2) algorithm struct (with builder pattern), (3) function to newick, (4) function to compare trees.
Unless you have a good reason, I would hide the internal data structure (as QMatrix).
I have created a PR. You are more than welcome to help if you want! Also, if you have any suggestion / improvement, please tell me :) I think making such API would be nice. Later, I would like to add the docs and use anyhow (better error handling)
Great work @currocam!
Yes, I agree that the QMatrix will stay as an internal data structure unless we have a good reason. The work you did is already great for a start and we can start with it. I will work on improving the error handling. I would like to go for something like thiserrror which is great for lib error management If you are okay with it.
Congrats on the work already done! I am eager to use it in my future work!
Great! If you are more familiar with this thiserror, then we should definitely go for it.
Hi!
I have created the library and published it. I am afraid I don't have more time to spend in this project right now, but, please, feel free to contribute with the error handling. That would be great. Or any other idea you have about best practices.
You can find the docs here (I made what I hope is a nice API) and the crate page here
I will leave this issue opened until I hear from you :)
Yes, thanks. I have seen the lib page. I will do a PR in the coming days on the error handling. (By the way, you can close this issue).
Best.
Hi @currocam ,
Amazing work done here! It is a nice resource for the community. However, do you think you'd like to create a lib? It would be great as it would be easily incorporated into other crates. I can contribute and write most of the code if you want.
Thanks.