MungoMeng / Survival-XSurv

[MICCAI2023] XSurv: Merging-Diverging Hybrid Transformer Networks for Survival Prediction
GNU General Public License v3.0

How to pre-process the test data #3

Open zhou07 opened 11 months ago

zhou07 commented 11 months ago

In the training data, we can determine the tumor's center using its label, and then crop the images based on this central point. However, the test data do not have labels. How can I pre-process the test data in this case?

MungoMeng commented 11 months ago

Hi, the test data in the HECKTOR challenge were not used in the XSurv paper. In fact, you cannot use these test data for your own research either, due to the lack of ground-truth labels.

The test data can only be used during the challenge. When we attended the HECKTOR 2022 challenge, our strategy was to run the trained segmentation model on the testing images with a sliding-window approach to get the approximate locations of the tumors. Then we followed the same processing procedure as for the training data: find the center and crop, as sketched below.
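Concretely, one way to implement this (a rough sketch, not the exact challenge code; seg_model, the channel layout, and the use of MONAI's sliding-window utility are my own assumptions here):

import numpy as np
import torch
from monai.inferers import sliding_window_inference

def crop_around_tumor(image, seg_model, crop_size=160, patch_size=112):
    # Segment the full volume with a sliding window of the training patch size.
    with torch.no_grad():
        logits = sliding_window_inference(
            image.unsqueeze(0),          # image: (C, D, H, W), e.g. PET+CT channels
            roi_size=(patch_size,) * 3,
            sw_batch_size=1,
            predictor=seg_model,
        )
    mask = (logits.argmax(dim=1) > 0).squeeze(0).cpu().numpy()

    # Approximate tumor location: center of mass of the predicted mask
    # (assumes a non-empty prediction; fall back to the volume center otherwise).
    coords = np.argwhere(mask)
    center = coords.mean(axis=0).round().astype(int)

    # Crop a fixed-size cube around that center, clamped to the volume bounds.
    half = crop_size // 2
    starts = np.clip(center - half, 0, np.array(mask.shape) - crop_size)
    d, h, w = starts
    return image[..., d:d + crop_size, h:h + crop_size, w:w + crop_size]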

zhou07 commented 10 months ago

Hi, I have a question regarding the different image sizes used in training and testing. During training, data augmentation was applied on the fly, including random affine transformations and random cropping to 112 × 112 × 112 voxels. During testing, however, the images are 160 × 160 × 160 voxels. Could this difference in image size between training and testing affect the model's effectiveness? Why not standardize the image size for both?

MungoMeng commented 10 months ago

XSurv comprises convolutions, Swin Transformer blocks, and global average pooling, all of which are size-invariant, so the model can be trained and run at inference with different image sizes using the same learned weights (see the toy example below). The size 160 × 160 × 160 was chosen to include most of the lesion region, while 112 × 112 × 112 was dictated by GPU memory constraints during training. You could also try random cropping to, say, 128 × 128 × 128 if your GPU allows; I believe the results would be similar. As data augmentation, randomly cropping images to a smaller size also reduces the risk of overfitting.
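To illustrate the size-invariance (a toy example, not the XSurv architecture itself):

import torch
import torch.nn as nn

# Convolutions and global average pooling work for any input size,
# so the same weights handle 112^3 training crops and 160^3 test volumes.
model = nn.Sequential(
    nn.Conv3d(2, 8, kernel_size=3, padding=1),  # e.g., PET+CT input channels
    nn.ReLU(),
    nn.AdaptiveAvgPool3d(1),                    # global average pooling
    nn.Flatten(),
    nn.Linear(8, 1),                            # survival risk score
)

for size in (112, 160):
    x = torch.randn(1, 2, size, size, size)
    print(size, model(x).shape)                 # output shape (1, 1) in both cases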

zhou07 commented 10 months ago

Thank you for sharing your code.

When I split the dataset as you described in the article, using two centers (CHUM and CHUV) as the testing set, the results were similar to yours. However, when I randomly split the dataset into a training set (386 samples) and a testing set (102 samples), or use different centers, such as CHUV and HGJ, to form a testing set of around 100 samples, the results decrease significantly. Does the way the dataset is split greatly influence the results?

Regarding the 5-fold cross-validation within the training set, I normally select the model with the best validation result and then evaluate it on the testing set. However, I've noticed an inconsistency between the validation and testing results: a higher C-index on the validation set can correspond to a lower C-index on the testing set. Have you encountered this situation? How do you address it?

MungoMeng commented 10 months ago

Hi, I am happy that you successfully ran our code!

I have also met the problem you mentioned. I think this is reasonable because the distribution/difficulty of each center is different, so different data splits naturally yield different results. Therefore, the absolute results reported in our paper are not that important; what matters is that the improvements over other methods are consistent across different data splits, which I think validates the effectiveness of our method.

This problem would be naturally addressed by scaling up to "big data". For the current limited data, the strategy we used in the HECKTOR challenge was: run 5-fold cross-validation on all training data, keep the five models that achieve the best result on their respective validation sets, and finally average the predictions of the five models on the testing set, as sketched below.
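In outline (a sketch; train_one_fold and predict_risk are hypothetical helpers standing in for the actual training and inference code):

import numpy as np
from sklearn.model_selection import KFold

kf = KFold(n_splits=5, shuffle=True, random_state=0)
test_risks = []
for train_idx, val_idx in kf.split(train_ids):
    # Keep the checkpoint with the best C-index on this fold's validation set.
    best_model = train_one_fold(train_ids[train_idx], train_ids[val_idx])
    test_risks.append(predict_risk(best_model, test_ids))

# Final prediction: average the five models' risk scores on the testing set.
ensemble_risk = np.mean(test_risks, axis=0)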

zhou07 commented 10 months ago

Thank you for sharing your experience with your experiments. I have some questions regarding Radiomics Enhancement. You mentioned that a total of 1689 radiomics features were extracted. I am using the Pyradiomics package and have applied wavelet decompositions ('Haar', 'Db2', 'Sym2', 'Coif1', 'Bior1.3', 'Rbior1.3', 'Meyer', 'Gabor') to the PET/CT images, but I only managed to extract 129 radiomics features from the PET or CT images. Furthermore, after applying LASSO selection, I ended up with 0 features. Could you please share your code for the extraction and selection of radiomics features?

Additionally, when I use the CoxPH model to integrate the selected radiomics features with clinical indicators (e.g., age, gender), I encounter a ConvergenceError. This error indicates that convergence halted due to matrix inversion problems. The code warns of high collinearity, likely because the clinical indicators have very low variance. Have you encountered similar problems, and if so, how did you resolve them?

MungoMeng commented 10 months ago

Hi, I think you are using Pyradiomics incorrectly. Please carefully read the documentation: https://pyradiomics.readthedocs.io/en/latest/customization.html. For image types, enable Original and Wavelet; for features, enable all available classes.

Before entering the CoxPH model, the radiomics and clinical features should first be selected: use LASSO for the radiomics features and univariate/multivariate Cox analyses for the clinical indicators (refer to Section 3.1). One possible implementation is sketched below.
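For example (a sketch under my own assumptions, not necessarily the paper's exact code; radiomics_df, clinical_df, time, and event are placeholder inputs, and the p < 0.05 threshold is an assumption):

import numpy as np
from lifelines import CoxPHFitter
from sksurv.linear_model import CoxnetSurvivalAnalysis
from sksurv.util import Surv

y = Surv.from_arrays(event=event.astype(bool), time=time)

# LASSO-penalized Cox model: keep radiomics features with non-zero
# coefficients (here read off at the smallest fitted alpha).
lasso = CoxnetSurvivalAnalysis(l1_ratio=1.0, alpha_min_ratio=0.01)
lasso.fit(radiomics_df.values, y)
selected = radiomics_df.columns[lasso.coef_[:, -1] != 0]

# Univariate Cox analysis: keep clinical indicators that reach significance.
kept_clinical = []
for col in clinical_df.columns:
    df = clinical_df[[col]].assign(time=time, event=event)
    cph = CoxPHFitter().fit(df, duration_col='time', event_col='event')
    if cph.summary.loc[col, 'p'] < 0.05:
        kept_clinical.append(col)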

zhou07 commented 10 months ago

Hi,

Thank you for your previous advice. I've made some modifications to my Pyradiomics code based on your suggestions, as shown below:

from radiomics import featureextractor

settings = {}
settings['imageType'] = {
        'Original': {},
        'Wavelet': {}
}
extractor = featureextractor.RadiomicsFeatureExtractor(**settings)
extractor.enableAllFeatures()

However, I'm still extracting only 129 features. Could you suggest any potential reasons for this discrepancy?

Regarding the clinical information, could you share how you handle missing values (NaN data) in your analyses? I still encounter a ConvergenceError during the univariate/multivariate Cox analyses of clinical indicators.

MungoMeng commented 10 months ago

Hi, these are my settings for 'imageType': [screenshot of the extractor settings, with Original and Wavelet enabled]
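For reference, the equivalent in code (a sketch based on the Pyradiomics documentation; image types must be enabled on the extractor rather than passed as plain settings, which is likely why only the Original features were extracted above):

from radiomics import featureextractor

extractor = featureextractor.RadiomicsFeatureExtractor()
extractor.enableImageTypes(Original={}, Wavelet={})  # Wavelet adds 8 decompositions in 3D
extractor.enableAllFeatures()                        # all available feature classes

features = extractor.execute('image.nii.gz', 'mask.nii.gz')  # paths are placeholders
print(len(features))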

The NaN weight is set to 75 kg, as suggested by the challenge organizers. For the other categorical indicators (e.g., HPV, alcohol), use 'pd.get_dummies' for encoding, so that NaN values are encoded as all-zero vectors, as in the sketch below.
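For instance (a minimal sketch with made-up values):

import pandas as pd

clinical = pd.DataFrame({
    'Weight': [80.0, None, 62.0],
    'HPV':    ['positive', None, 'negative'],
})

clinical['Weight'] = clinical['Weight'].fillna(75.0)  # organizers' suggested default
encoded = pd.get_dummies(clinical, columns=['HPV'])   # NaN row -> all-zero dummies
print(encoded)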