Steps 1-3 above each have their own parameters that can be tuned, for example setting the significant-effect-size cutoff to 0.7 versus 0.5; the same goes for the machine learning probabilities and network edge weights. So, to answer your question: instead of attempting to optimize these parameters to fit any dataset, which is generally not possible, vRhyme bins over many iterations to find the best fit. Each iteration is a collection of different cutoff values for the 3 steps above. In effect, the parameters are optimized for your dataset by selecting the iteration that performs best. Since binning is not a supervised approach (we don't know the true answer), vRhyme uses a scoring method to rank iterations.
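For intuition only, here is a minimal sketch of the "binning iteration" idea described above: each iteration is one combination of cutoffs for the three steps, every iteration is scored without ground truth, and the best-scoring iteration is kept. The cutoff grids, the toy `run_binning` stand-in, and the scoring weights below are illustrative assumptions, not vRhyme's actual code, defaults, or published scoring formula.

```python
"""Illustrative sketch of scoring binning iterations over a grid of cutoffs.
All names, value grids, and weights are assumptions for illustration only."""
from itertools import product
import random

random.seed(0)

# Hypothetical cutoff grids for the three steps (example values, not vRhyme defaults)
effect_size_cutoffs = [0.5, 0.6, 0.7]      # step 1: significant effect size
ml_probability_cutoffs = [0.8, 0.9, 0.95]  # step 2: machine learning probability
edge_weight_cutoffs = [0.3, 0.5, 0.7]      # step 3: network edge weight

def run_binning(effect_size, ml_prob, edge_weight):
    """Toy stand-in for one binning pass with a fixed set of cutoffs.
    Returns (total_bins, scaffolds_binned, protein_redundancy)."""
    # Toy model: stricter cutoffs bin fewer scaffolds but reduce redundancy.
    strictness = (effect_size + ml_prob + edge_weight) / 3
    scaffolds_binned = int(1000 * (1.2 - strictness) * random.uniform(0.9, 1.1))
    total_bins = max(1, scaffolds_binned // random.randint(3, 6))
    protein_redundancy = (1.0 - strictness) * random.uniform(5, 15)
    return total_bins, scaffolds_binned, protein_redundancy

def score_iteration(total_bins, scaffolds_binned, protein_redundancy):
    """Toy score combining the three quantities named in the paper
    (protein redundancy, total bins, scaffolds binned); the weighting
    here is an assumption, not the published formula."""
    return scaffolds_binned - 20 * protein_redundancy - 0.5 * total_bins

best = None
for cutoffs in product(effect_size_cutoffs, ml_probability_cutoffs, edge_weight_cutoffs):
    result = run_binning(*cutoffs)   # one "binning iteration"
    s = score_iteration(*result)     # rank iterations without knowing the true answer
    if best is None or s > best[0]:
        best = (s, cutoffs, result)

print("Best-scoring iteration used cutoffs:", best[1])
```

The point of the sketch is only the structure: many iterations, each defined by its own cutoff values, ranked by a score so the best-performing parameter combination is selected for your dataset.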
Thank you very much for your explanation!
Hi,
I encountered one question when reading the vRhyme paper. In the Score processing section, it says 'Each binning iteration is given a score I according to protein redundancy, total bins, and the number of scaffolds binned.' I would like to know what a binning iteration is. Does it correspond to the previously mentioned grid-search method?
Thank you!