tony9664 closed this 1 month ago
Hi, great question! Here are the counts for each species: taxonomy_counts.csv
Overall, because SecretoGen shares information across the phylogenetic tree during training, I would also take the coverage of closely related organisms into account when judging applicability.
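To illustrate the point about related organisms, here is a minimal sketch that looks up one species' training count and also sums the counts of all species in the same genus. It assumes, hypothetically, that taxonomy_counts.csv has columns named `species` and `count`; the real file may use different names. Using the genus as the unit of "closely related" is also an assumption made for simplicity. SecretoGen shares information across the whole phylogenetic tree, so a fuller check could aggregate at several taxonomic ranks. The numbers in the example are toy values, not real counts.

```python
import csv
import io

# Hypothetical CSV layout: one row per species, columns "species" and "count".
# Adjust the column names if taxonomy_counts.csv uses different headers.

def load_counts(fileobj):
    """Read a mapping species -> number of training sequences from a CSV file object."""
    counts = {}
    for row in csv.DictReader(fileobj):
        counts[row["species"]] = int(row["count"])
    return counts

def genus_of(species):
    """Crude relatedness proxy: the genus is the first word of a binomial name."""
    return species.split()[0]

def coverage_report(counts, species):
    """Return (count for this species, summed count over its whole genus)."""
    genus = genus_of(species)
    genus_total = sum(n for s, n in counts.items() if genus_of(s) == genus)
    return counts.get(species, 0), genus_total

# Toy data for illustration only (not real training counts).
example = io.StringIO(
    "species,count\n"
    "Bacillus subtilis,120\n"
    "Bacillus licheniformis,30\n"
    "Escherichia coli,500\n"
)
counts = load_counts(example)
print(coverage_report(counts, "Bacillus licheniformis"))  # (30, 150)
print(coverage_report(counts, "Bacillus velezensis"))     # (0, 150)
```

The second call shows why related organisms matter: a species that is absent from the training set can still sit in a well-covered genus.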
Thank you! That is very helpful. May I ask another question? Are you working on evaluating the newly generated sequences experimentally, and if so, are they working well?
No, we are not running experiments for now; we have only evaluated the library ranking approach.
Hi, is it possible to know the number of occurrences of each species in the training set? That would tell us which organisms are better represented during training and would therefore give more trustworthy predictions.