currocam / speedytree

Canonical and RapidNJ implementations of Neighbor-joining in Rust
MIT License
4 stars 1 forks source link

Plan to create a library? #4

Closed Ebedthan closed 2 months ago

Ebedthan commented 8 months ago

Hi @currocam ,

Amazing work done here! It is a nice resource for the community. However, do you think you'd like to create a lib? It would be great as it would be easily incorporated into other crates. I can contribute and write most of the code if you want.

Thanks.

currocam commented 8 months ago

Hi @Ebedthan Thank you very much! It sounds great, let's do it.

Are you planning to use the library? If so, for what? This was a project I did for my MSc, so I have slides about it (some diagrams could be useful for docs)

In this project I implemented 3 different algorithms.

That said, I agree it could be handy to have such library! I think that, I practice, you don't make trees for such big values of n. And, if you do, then you are running it separately in a cluster and saving the newick file.

What do you think? By the way, I've checked out your profile, very cool projects!

Ebedthan commented 8 months ago

Yes, I plan to use it. And it would be really useful to have a library that allows us to create NJ trees in rust (we don't have such things right now).

currocam commented 8 months ago

Cool! Do you have any about of how the public API of such library should look like?

Ebedthan commented 8 months ago

Yes, I think of a set of pub fn like: create_naive_nj(distance: Vec<Vec<f64>>) -> Tree create_rapid_nj(distance: Vec<Vec<f64>>) -> Tree could be really useful to start with, Then after some iterations, some useful functions could be added like to_qmatrix(distance: Vec<Vec<f64>>) -> QMatrix

But I think that such a project should first concentrate on providing the function to create a NJ from a matrix and then some optimization could be done which generally take more iterations.

currocam commented 8 months ago

Hi!

I've found some time today to start working on this. The code is a bit spaghetti. But I got something that kind of works.

image

As far as I understand, the crate should provide:

1- Read from Phylip matrix (as it is the most common input file to the Distance Matrix data structure) 2- Create your own distance matrix (which is straightforward, just a vec of vecs and a vec of labels)

  1. The two (maybe three) algorithms to solve the actual problem. The interface of the functions is a bit weird. For example, the rapid_nj expects the chunk_size (for parallel processing, if threads are given). I don't know how parallelization works for traits. So, (a) make a simpler API with no parallelization or (b) I think it would be nice to have a NeighborJoiningBuilder so there is a nice interface to tune the algorithm. For example, doing stuff like
let dist_mat = DistanceMatrix::new ( ... )
pub enum Algorithm {Canonical, RapidNJ, Hybrid}
let canonical = NeighborJoiningBuilder::new(Canonical);
let tree = canonical.solve(dist_mat);
let rapid = NeighborJoiningBuilder::new(RapidNJ).set_chunk_size(10);
let tree = rapid.solve(dist_mat);
  1. My implementation of undirected graph to newick might not be the best but it might be handy (in addition to all the functions Petgraph already has)
  2. Also, maybe expose the functions I used for property testing and compute distances between trees.

So, as a summary: (1) input struct with read or construct methods, (2) algorithm struct (with builder pattern), (3) function to newick, (4) function to compare trees.

Unless you have a good reason, I would hide the internal data structure (as QMatrix).

I have created a PR. You are more than welcome to help if you want! Also, if you have any suggestion / improvement, please tell me :) I think making such API would be nice. Later, I would like to add the docs and use anyhow (better error handling)

Ebedthan commented 8 months ago

Great work @currocam!

Yes, I agree that the QMatrix will stay as an internal data structure unless we have a good reason. The work you did is already great for a start and we can start with it. I will work on improving the error handling. I would like to go for something like thiserrror which is great for lib error management If you are okay with it.

Congrats on the work already done! I am eager to use it in my future work!

currocam commented 8 months ago

Great! If you are more familiar with this thiserror, then we should definitely go for it.

currocam commented 8 months ago

Hi!

I have created the library and published it. I am afraid I don't have more time to spend in this project right now, but, please, feel free to contribute with the error handling. That would be great. Or any other idea you have about best practices.

You can find the docs here (I made what I hope is a nice API) and the crate page here

I will leave this issue opened until I hear from you :)

Ebedthan commented 7 months ago

Yes, thanks. I have seen the lib page. I will do a PR in the coming days on the error handling. (By the way, you can close this issue).

Best.