DearCaat / RRT-MIL

[CVPR 2024] Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology

question #13

Closed Yuan1z0825 closed 4 months ago

Yuan1z0825 commented 4 months ago

I want to ask a simple question: in the survival analysis code, cross-validation is carried out, and the final result is reported as the average of the highest metric reached on each validation set. I don't think that makes sense. Shouldn't a separate external test set be partitioned instead?

DearCaat commented 4 months ago

As far as I understand, the practice of not having a separate validation set is quite common in survival analysis.

Personally, I believe that, setting aside whether this is correct, survival analysis computes its metrics with the case as the unit, so the number of cases is very limited: each dataset typically contains only 300-500 cases. If we adopted a 2:8 split for the validation and training sets, the validation set would usually contain fewer than 100 cases. A validation set that small can easily mislead the training process and has limited statistical significance. On this point, the current work simply follows previous studies.
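
For a sense of scale, here is a quick synthetic sketch of how unstable the C-index is at this sample size (the data below is random, not from any real cohort; the metric is scikit-survival's `concordance_index_censored`):

```python
# Bootstrap the C-index on a ~60-case validation set (20% of a 300-case
# dataset) to see how wide its spread is. All data here is synthetic.
import numpy as np
from sksurv.metrics import concordance_index_censored

rng = np.random.default_rng(0)
n = 60
time = rng.exponential(24.0, n)          # synthetic survival times (months)
event = rng.random(n) < 0.6              # ~60% of cases have an observed event
risk = -time + rng.normal(0, 12.0, n)    # noisy risk score (higher = worse)

scores = []
for _ in range(1000):
    idx = rng.integers(0, n, n)          # resample cases with replacement
    scores.append(concordance_index_censored(event[idx], time[idx], risk[idx])[0])

print(f"C-index: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```

With only a few dozen cases, the bootstrap standard deviation of the C-index is easily a few points, which can be larger than the gaps between competing methods.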

However, I fully agree that a separate validation set is crucial for experimental fairness and rigor when working with larger datasets.

Yuan1z0825 commented 4 months ago

Yes, many papers, including those in sub-journals, do the same. But my point is that the highest validation-set C-index should not be used as the reported metric. You could instead fix the evaluation epoch in advance, which is a stricter test of generalization.
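
To make the difference between the two reporting protocols concrete, here is a minimal synthetic sketch (the fold-by-epoch C-index matrix is pure noise around a true value of 0.60, not real results):

```python
# Two ways to aggregate per-epoch validation C-indexes across folds.
import numpy as np

def report_best_epoch(val_cindex):
    """Mean of each fold's best epoch -- the practice questioned above."""
    return val_cindex.max(axis=1).mean()

def report_fixed_epoch(val_cindex, epoch=-1):
    """Mean over folds at one pre-declared epoch (e.g. the last) -- stricter."""
    return val_cindex[:, epoch].mean()

# Toy data: 5 folds x 50 epochs of validation C-indexes that are pure noise
# around a true value of 0.60, mimicking a small, noisy validation set.
rng = np.random.default_rng(0)
val_cindex = 0.60 + 0.03 * rng.standard_normal((5, 50))

print(report_best_epoch(val_cindex))   # ~0.67: the per-fold maximum inflates pure noise
print(report_fixed_epoch(val_cindex))  # ~0.60: close to the true value
```

Even with zero real signal across epochs, taking each fold's best epoch inflates the reported C-index by several points, which is exactly the optimistic bias at issue.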