All lineage models should now work from a single sparse matrix holding each sample's k nearest neighbours (kNN). (N.B. I have not checked whether neighbours at the same distance are selected randomly, or whether the choice is affected by input name order.)
The number of neighbours (k) stored in this matrix is set by the max_search_depth option.
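As an illustration of the tie question in the N.B., here is a minimal stdlib-only sketch of building a sparse kNN matrix with k = max_search_depth. The function name knn_sparse and the COO-triplet representation are hypothetical, not PopPUNK's code. It assumes ties are broken by a stable sort, so with this approach equal distances are resolved by input order rather than randomly.

```python
from typing import List, Tuple

# Sparse kNN matrix as COO triplets: (row sample, neighbour sample, distance).
Triplets = List[Tuple[int, int, float]]


def knn_sparse(dist: List[List[float]], max_search_depth: int) -> Triplets:
    """Hypothetical helper: keep the max_search_depth nearest neighbours per sample.

    Assumption: ties are broken by a stable sort, so equal distances keep
    their original column (input) order.
    """
    n = len(dist)
    triplets: Triplets = []
    for i in range(n):
        # Candidate neighbours exclude the sample itself.
        others = [j for j in range(n) if j != i]
        # list.sort is stable: samples at equal distance stay in input order.
        others.sort(key=lambda j: dist[i][j])
        for j in others[:max_search_depth]:
            triplets.append((i, j, dist[i][j]))
    return triplets


if __name__ == "__main__":
    dist = [
        [0, 1, 1, 2],
        [1, 0, 3, 1],
        [1, 3, 0, 2],
        [2, 1, 2, 0],
    ]
    # Sample 3 has samples 0 and 2 tied at distance 2; sample 0 wins
    # only because it appears first in the input.
    for row in knn_sparse(dist, max_search_depth=2):
        print(row)
```

Reordering the input names would change which tied neighbour is kept under this scheme, which is exactly the behaviour the N.B. flags as unchecked.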
The matrix used for clustering is derived by reducing this matrix, either by counting neighbours, by counting unique distances, or by keeping only reciprocal matches (as in reciprocal BLAST). It is always regenerated from the main matrix, which is the only part updated when querying.
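The three reductions can be sketched as functions over the kNN triplets. This is a rough sketch under stated assumptions, not the PR's implementation: the function names and the parameter rank are hypothetical, "counting unique distances" is assumed to mean keeping every neighbour whose distance is among the rank smallest distinct distances (so ties are all kept), and "reciprocal" is assumed to mean keeping i→j only when j is in i's top rank and i is in j's top rank.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Sparse kNN matrix as COO triplets: (row sample, neighbour sample, distance).
Triplets = List[Tuple[int, int, float]]


def _by_row(triplets: Triplets) -> Dict[int, List[Tuple[int, float]]]:
    """Group triplets by row, each row sorted by distance (stable for ties)."""
    rows: Dict[int, List[Tuple[int, float]]] = defaultdict(list)
    for i, j, d in triplets:
        rows[i].append((j, d))
    for i in rows:
        rows[i].sort(key=lambda jd: jd[1])
    return rows


def reduce_count_neighbours(triplets: Triplets, rank: int) -> Triplets:
    """Hypothetical: keep the first `rank` neighbours of each sample."""
    out: Triplets = []
    for i, nbrs in _by_row(triplets).items():
        out.extend((i, j, d) for j, d in nbrs[:rank])
    return out


def reduce_unique_distances(triplets: Triplets, rank: int) -> Triplets:
    """Hypothetical: keep neighbours within the `rank` smallest distinct distances."""
    out: Triplets = []
    for i, nbrs in _by_row(triplets).items():
        # Rows are non-empty here, so there is at least one distinct distance.
        cutoff = sorted({d for _, d in nbrs})[:rank][-1]
        out.extend((i, j, d) for j, d in nbrs if d <= cutoff)
    return out


def reduce_reciprocal(triplets: Triplets, rank: int) -> Triplets:
    """Hypothetical: keep i->j only if the match holds in both directions."""
    top = reduce_count_neighbours(triplets, rank)
    kept = {(i, j) for i, j, _ in top}
    return [(i, j, d) for i, j, d in top if (j, i) in kept]


if __name__ == "__main__":
    knn = [(0, 1, 1.0), (0, 2, 1.0), (1, 0, 1.0), (1, 3, 1.0),
           (2, 0, 1.0), (2, 3, 2.0), (3, 1, 1.0), (3, 0, 2.0)]
    print(reduce_count_neighbours(knn, 1))
    print(reduce_unique_distances(knn, 1))
    print(reduce_reciprocal(knn, 1))
```

Because every reduction reads only from the stored triplets, rank cannot usefully exceed max_search_depth, and regenerating the reduced matrix after a query only requires the updated main matrix.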
Added a script that generates consistent lineage databases for all strains in a non-lineage database. It would be good to use this as an example workflow for beebop; views from @johnlees and @muppi1993 on how to store the script/information needed to relate these databases would be appreciated!
max_search_depth option
cugraph: see https://github.com/rapidsai/raft/issues/740 and https://github.com/rapidsai/rmm/pull/931. This is hopefully fixed in rapids=22.12 (based on https://github.com/rapidsai/raft/commit/2325d2b4cad2faf0ef1bce976cb377eb25b4d81d), but 22.10 is the latest version available on conda (https://anaconda.org/rapidsai/rapids, as of 16/10/22).

Validation on serotype 3 dataset: