fteufel / SecretoGen

A conditional generative model for signal peptide design and efficiency prediction
https://openreview.net/forum?id=vXXEfmYsvS
BSD 3-Clause "New" or "Revised" License

Number of occurrences of species in the training set #2

Closed: tony9664 closed this issue 1 month ago

tony9664 commented 2 months ago

Hi, is it possible to know the number of occurrences of each species in the training set? That would show which organisms are better represented during training and would therefore give more trustworthy predictions.

fteufel commented 2 months ago

Hi, great question, here are the counts for each species: taxonomy_counts.csv

Overall, as SecretoGen shares information across the phylogenetic tree when training, I would also take coverage of closely related organisms into account when judging applicability.
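
For readers who want to act on the counts, here is a minimal sketch of ranking species by how often they occur. It is not part of SecretoGen. The column names `species` and `count` are assumptions about the layout of taxonomy_counts.csv, and the rows are made-up placeholders, not real training-set counts.

```python
import csv
import io


def load_counts(handle, name_col="species", count_col="count"):
    """Read per-species counts from an open CSV handle into a dict.

    The column names are assumptions; adjust them to the real file's header.
    """
    reader = csv.DictReader(handle)
    return {row[name_col]: int(row[count_col]) for row in reader}


def rank_species(counts):
    """Return (species, count) pairs sorted from most to least represented."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Placeholder rows for illustration only (not real training-set counts).
example_csv = io.StringIO(
    "species,count\n"
    "species_a,120\n"
    "species_b,15\n"
    "species_c,480\n"
)

counts = load_counts(example_csv)
ranking = rank_species(counts)

for name, n in ranking:
    print(f"{name}\t{n}")
```

With the real file, pass a handle from `open("taxonomy_counts.csv", newline="")` and set `name_col` and `count_col` to match its header. A raw count for one species does not capture coverage of closely related organisms, so a low count does not by itself mean the predictions are unreliable.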

tony9664 commented 2 months ago

Thank you! That is very helpful. May I ask another question: Are you working on evaluating the newly generated sequences experimentally? Are they working well?

fteufel commented 2 months ago

No, we are not working on experiments for now; so far we have only evaluated the library ranking approach.