NetoPedro closed this issue 1 month ago.
Hello,
Thank you for your comment. Since the model has a radically reduced parameter count, it would likely not be possible to obtain Hover-Net-level results across tissue types, stains, and scanners, because of site-specific variability in all of those variables. In this case, KD allows the HoverFast model to perform at roughly Hover-Net-level accuracy for a user's particular stain/scanner combination, while doing so in a fraction of the time and at a fraction of the computational cost.
It essentially learns, via KD, to reproduce Hover-Net's output on the part of the dataset that is relevant to the user's specific site, rather than trying to generalize across all possible tissue types, stains, and scanners. Training on the original labels alone, without distillation, would likely yield a model that is less accurate for site-specific tasks, as it would not benefit from the distilled knowledge of a more complex model like Hover-Net. The KD process is key to transferring the generalization ability of the larger model while keeping the computational cost and parameter count low.
This is why we didn’t train on the entire dataset; the model’s reduced capacity is optimized for site-specific applications and would struggle to generalize across a broader range of samples.
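For readers unfamiliar with the mechanics, the distillation idea above can be sketched with a standard Hinton-style KD loss. This is a simplified, hypothetical illustration for a classification head (HoverFast actually distills Hover-Net's pixel-wise output maps, and none of these function names come from the HoverFast codebase): the student is trained against a blend of the teacher's temperature-softened predictions and the original hard labels.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, hard_labels,
                      T=2.0, alpha=0.5):
    """Blend of soft (teacher) and hard (label) targets.

    student_logits, teacher_logits: (N, C) arrays of raw scores.
    hard_labels: (N,) array of integer class indices.
    T: temperature softening the teacher's distribution.
    alpha: weight on the distilled (soft) term.
    """
    # Soft targets: cross-entropy against the teacher's softened output,
    # scaled by T^2 to keep gradient magnitudes comparable across T.
    p_teacher = softmax(teacher_logits / T)
    log_p_student = np.log(softmax(student_logits / T) + 1e-12)
    soft_loss = -(p_teacher * log_p_student).sum(axis=-1).mean() * (T * T)

    # Hard targets: ordinary cross-entropy on the original labels.
    p_student = softmax(student_logits)
    idx = np.arange(len(hard_labels))
    hard_loss = -np.log(p_student[idx, hard_labels] + 1e-12).mean()

    return alpha * soft_loss + (1 - alpha) * hard_loss
```

Setting `alpha=0` recovers plain supervised training on the labels; with `alpha>0` the student additionally mimics the teacher's output distribution, which is how a small model inherits behavior from a larger one.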
Thanks again for your insightful question, and I hope this clarifies the role of KD in our approach.
I got it now. It is task/dataset/tissue specific. Makes sense, good choice.
Hi,
I am reviewing the checklist https://github.com/openjournals/joss-reviews/issues/7022#issuecomment-2245007276.
Overall the paper looks good, and the improvements in computational cost are outstanding. I have a single question (or two, but in the same direction): what is the impact of KD here? How different would the results be if HoverFast were trained just on the original labels?
Thanks.