Preprocessing method - Githubissues

rongyua commented 8 months ago

Hello, thank you very much for your work. I would like to know how to obtain the 5 files in the cloud disk. Can you provide the preprocessing method and how to obtain those files? Thank you very much.

cuicathy commented 8 months ago

Hello, thanks for your interest in our work! To generate the unimodal embeddings for pathology, genomics, and demographics, we follow the supervised learning methods from Pathomic Fusion (https://github.com/mahmoodlab/PathomicFusion); the small adjustments we made are described in our paper. The unimodal networks are also provided in network.py in our repo. For the radiology images, in addition to the supervised embedding, we add radiomics features extracted with pyRadiomics. More details about the radiomics features can be found in our supplementary materials: https://static-content.springer.com/esm/chp%3A10.1007%2F978-3-031-16443-9_60/MediaObjects/539247_1_En_60_MOESM1_ESM.pdf

cuicathy commented 8 months ago

The key components for radiomics feature extraction (Radiomics.py and Radiomics_params.yaml) have been uploaded to the repo. The radiology images with masks (generated by an nnUNet pretrained on the BraTS training set) have also been uploaded here: https://drive.google.com/drive/folders/1yZGIXrcB4Fb1XdYgEoyueUXwnBWJzhLU?usp=sharing

rongyua commented 8 months ago

Thanks again for your prompt reply, but I get the following error when running the code.

rad_demo

0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "main.py", line 78, in <module>
    fold_range=fold_range)
  File "/media/ail/Expansion/Liu/MMD_SurvivalPrediction-main/code/evaluation.py", line 302, in evaluate_missingModa
    model in tqdm(model3)]
  File "/media/ail/Expansion/Liu/MMD_SurvivalPrediction-main/code/evaluation.py", line 302, in <listcomp>
    model in tqdm(model3)]
  File "/media/ail/Expansion/Liu/MMD_SurvivalPrediction-main/code/evaluation.py", line 13, in getPValAggSurv_GBMLGG_Binary_missingView
    single_view=single_view, fold_range=fold_range)
  File "/media/ail/Expansion/Liu/MMD_SurvivalPrediction-main/code/evaluation.py", line 119, in getDataAggSurv_GBMLGG_missingView
    data_cv = pickle.load(open(data_cv_path, 'rb'))
FileNotFoundError: [Errno 2] No such file or directory: '../data/gbmlgg15cv_patches_embedding3.pkl'

Do you know what is going on? And can you provide the directory structure of the entire project? I would be grateful if you could.

cuicathy commented 8 months ago

Hi, could you try adjusting the required path to the file "gbmlgg15cv_patches_embedding.pkl" you downloaded from Google Drive? You should be able to see the data structure by reading the pkl file.
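
As a rough illustration of "reading the pkl file", the sketch below summarizes the nested structure of a pickle. The helper names (describe, describe_pickle) are made up for this example and are not part of the repo. The real file's layout is not documented in this thread, so the output has to be inspected rather than assumed. Unpickling the real file may also require the libraries that created it (for example NumPy) to be installed.

```python
import pickle
from itertools import islice
from pathlib import Path


def describe(obj, depth=0, max_depth=3, max_keys=5):
    """Return indented lines summarizing the nested structure of obj."""
    pad = "  " * depth
    lines = []
    if isinstance(obj, dict):
        lines.append(f"{pad}dict with {len(obj)} keys")
        if depth < max_depth:
            # Only the first few keys are shown to keep the summary short.
            for key, value in islice(obj.items(), max_keys):
                lines.append(f"{pad}  key {key!r}:")
                lines.extend(describe(value, depth + 2, max_depth, max_keys))
    elif isinstance(obj, (list, tuple)):
        lines.append(f"{pad}{type(obj).__name__} of length {len(obj)}")
        if obj and depth < max_depth:
            # Describe the first element only, as a representative.
            lines.extend(describe(obj[0], depth + 1, max_depth, max_keys))
    else:
        shape = getattr(obj, "shape", None)  # arrays and tensors expose .shape
        suffix = f", shape {tuple(shape)}" if shape is not None else ""
        lines.append(f"{pad}{type(obj).__name__}{suffix}")
    return lines


def describe_pickle(path):
    """Load a pickle file and return a structural summary as a list of lines."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"{path.resolve()} not found; adjust the path in the code")
    with path.open("rb") as handle:
        return describe(pickle.load(handle))


# Example call, using the file name mentioned above:
# print("\n".join(describe_pickle("../data/gbmlgg15cv_patches_embedding.pkl")))
```

Printing the resolved absolute path in the error message makes it easier to see where a relative path such as '../data/...' actually points when the script is launched from a different working directory.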


rongyua commented 8 months ago

Okay, thank you again for your timely reply. One last small question: the best c-index reported in the Pathomic Fusion paper is 0.826, but in your paper it is only 0.769. The paper does not explain in detail how this result was obtained. Could you please explain? I would be very grateful.

cuicathy commented 8 months ago

Sure. In our work, the c-index used to compare Pathomic Fusion with our method is computed on the 170 patients for whom all 4 modalities (including the added radiology images) are available. This is a subset of the 769 patients used in Pathomic Fusion. For a fair comparison, we recalculated the c-index on these 170 patients and obtained 0.769. If you are also interested in the performance of Pathomic Fusion with 4 modalities, some of them missing, on the 962 patients, those results are shown in Table 3 of our paper.
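
To make the effect of re-evaluating on a subset concrete, here is a minimal sketch of Harrell's c-index in plain Python, computed once on a full toy cohort and once on a subset. This is not the evaluation code from either paper. The patient records are invented, and survival libraries may treat tied times differently.

```python
def concordance_index(times, events, risks):
    """Harrell's c-index: fraction of comparable pairs ranked correctly by risk.

    A pair (i, j) is comparable when patient i had an observed event
    (events[i] == 1) strictly before patient j's time. It is concordant
    when the patient who failed earlier has the higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier event of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def c_index_on_subset(records, keep_ids):
    """Recompute the c-index using only the patients whose id is in keep_ids."""
    kept = [r for r in records if r[0] in keep_ids]
    return concordance_index(
        [r[1] for r in kept], [r[2] for r in kept], [r[3] for r in kept]
    )


# Invented toy records: (patient id, survival time, event observed, predicted risk)
records = [("a", 1, 1, 0.9), ("b", 2, 1, 0.1), ("c", 3, 0, 0.5), ("d", 4, 1, 0.2)]

full = c_index_on_subset(records, {"a", "b", "c", "d"})
subset = c_index_on_subset(records, {"a", "b", "d"})
print(f"full cohort: {full:.3f}, subset: {subset:.3f}")
```

The same predictions give different values on the two cohorts, which is why a c-index reported on 769 patients and one recalculated on 170 of them are not directly comparable.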

rongyua commented 8 months ago

Could you please provide some of the baseline code in the paper? The official code does not run very well. I hope it will not cause you much trouble.

rongyua commented 7 months ago

Hello, can you tell me how to process the complete mode into an incomplete mode and how the mask_gbmlgg15cv.pkl file is generated?

cuicathy commented 7 months ago

Hi,

To test the data with missing modalities, please check the function test_missingModa() in train_test.py. In the released code, we provide two examples: pathology and gene both missing, and pathology missing only (check the keywords "rad_demo" and "path_missing" in the example code). You can write your own code following these examples.

The file mask_gbmlgg15cv.pkl indicates the availability of each modality for each subject in the raw data. The data structure is very simple and can easily be generated from the information in img_availability.csv. In our example, each mask entry for the training set is either 0 (missing) or 1 (available), following the availability of the raw data (we want to use the whole training set), whereas the mask for testing is all 1 for the ablation study. You can try different missing modes for testing based on the instructions above.
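
The description above could be sketched roughly as follows. Everything about the layout here is an assumption: the column names (patient_id, path, gene, rad, demo), the sample rows, and the train/test dictionary are hypothetical stand-ins, since the actual columns of img_availability.csv and the actual structure of mask_gbmlgg15cv.pkl are not given in this thread. Inspect the real files before adapting it.

```python
import csv
import io
import pickle

# Hypothetical column names, one per modality.
MODALITIES = ["path", "gene", "rad", "demo"]

# Invented stand-in for img_availability.csv (1 = available, 0 = missing).
SAMPLE_CSV = """patient_id,path,gene,rad,demo
P001,1,1,0,1
P002,0,1,1,1
P003,1,1,1,1
"""


def read_availability(handle):
    """Map each patient id to a list of 0/1 flags, one per modality."""
    return {
        row["patient_id"]: [int(row[m]) for m in MODALITIES]
        for row in csv.DictReader(handle)
    }


def build_masks(availability, train_ids, test_ids):
    """Training masks follow raw availability; test masks are all ones."""
    train = {pid: availability[pid] for pid in train_ids}
    test = {pid: [1] * len(MODALITIES) for pid in test_ids}
    return {"train": train, "test": test}


availability = read_availability(io.StringIO(SAMPLE_CSV))
masks = build_masks(availability, train_ids=["P001", "P002"], test_ids=["P003"])

# Serialize the same way a .pkl file would be written with pickle.dump.
payload = pickle.dumps(masks)
print(masks)
```

With a real file, io.StringIO(SAMPLE_CSV) would be replaced by an opened file handle, and the split into training and test ids would come from the cross-validation folds.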

I just noticed that I missed your previous message about the baselines, and I apologize for the delayed reply. For the baselines of the different fusion methods (tensor fusion, concatenation), I hope to find and upload them within 2 weeks if you still need them.


rongyua commented 7 months ago

No, no, I have solved the baseline problem. I am currently working with datasets for other cancers, using only pathological images and gene expression data. I want to use your method to build my own missing-modality dataset. Can you give me some advice? Thank you very much.

rongyua commented 7 months ago

Sorry to bother you again. Could you please provide me with the Deep Orthogonal Fusion baseline? I have not been able to reproduce its experiments myself. If it is too much trouble, please don't worry about it.

cuicathy commented 7 months ago

Hello,

You can consider using similar dataset structures. Use a "mask" structure to indicate the availability of each modality and a "data" structure to store the raw information for each modality. The released code supports four modalities directly. You can modify the code to handle two modalities on your own, or simply set the masks for pathology and gene to 1, and set the masks for non-existent modalities to 0. This ensures that the weights for the unused modalities will not be updated.
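
As a simplified illustration of why a zero mask keeps unused modalities from being updated, the sketch below places a two-modality patient into a four-slot layout and computes the gradient of a toy linear score by hand. It assumes that each modality's contribution is multiplied by its mask, which is a simplification for illustration and not a description of the released model. The slot names and the scoring function are hypothetical.

```python
SLOTS = ["path", "gene", "rad", "demo"]  # hypothetical slot order
EMBED_DIM = 4


def to_four_slots(path_emb, gene_emb):
    """Place a two-modality patient into the four-slot data/mask layout."""
    zeros = [0.0] * EMBED_DIM  # placeholder for modalities that do not exist
    data = {"path": list(path_emb), "gene": list(gene_emb),
            "rad": list(zeros), "demo": list(zeros)}
    mask = {"path": 1, "gene": 1, "rad": 0, "demo": 0}
    return data, mask


def masked_score(data, mask, weights):
    """Toy fusion: sum over slots of mask * (weights . embedding)."""
    return sum(
        mask[s] * sum(w * x for w, x in zip(weights[s], data[s])) for s in SLOTS
    )


def weight_gradients(data, mask):
    """d(score)/d(weights[s]) = mask[s] * data[s], so masked slots get zeros."""
    return {s: [mask[s] * x for x in data[s]] for s in SLOTS}


data, mask = to_four_slots([1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5])
weights = {s: [1.0] * EMBED_DIM for s in SLOTS}
print(masked_score(data, mask, weights))
print(weight_gradients(data, mask))
```

Because the gradient of each slot is scaled by its mask, a slot masked with 0 receives a zero gradient regardless of what the placeholder holds.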

For Deep Orthogonal Fusion, you may want to contact the authors directly to get their official code. Otherwise, code for the orthogonal loss mentioned in their paper has been released (https://github.com/jlezama/OrthogonalLowrankEmbedding). However, that code is not very up to date and may not fit well with your environment. So I recommend a more recent method (https://github.com/kahnchana/opl?tab=readme-ov-file), which outperformed the previous one in its authors' report. In their code, loss_op = op_loss(features, targets), where targets are the categories of the objects and features are the embeddings of the objects. You can stack the embeddings of the different modalities for each patient and assign the corresponding modality categories as the targets.
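
The stacking step suggested above might look like the sketch below. It only prepares the two inputs; op_loss itself comes from the linked repository and is not called or reimplemented here. The modality-to-label assignment and the toy embeddings are assumptions made for this example.

```python
# Hypothetical assignment of an integer category to each modality.
MODALITY_IDS = {"path": 0, "gene": 1}


def stack_modalities(batch):
    """Flatten per-patient modality embeddings into (features, targets).

    batch is a list with one dict per patient, mapping a modality name
    to that patient's embedding (a list of floats). Each embedding becomes
    one row of features, and its modality id becomes the matching target.
    """
    features, targets = [], []
    for patient in batch:
        for name, embedding in patient.items():
            features.append(list(embedding))
            targets.append(MODALITY_IDS[name])
    return features, targets


# Invented embeddings for two patients.
batch = [
    {"path": [0.1, 0.2, 0.3], "gene": [0.9, 0.8, 0.7]},
    {"path": [0.2, 0.1, 0.4], "gene": [0.7, 0.9, 0.6]},
]
features, targets = stack_modalities(batch)
print(len(features), targets)

# features and targets would then be converted to tensors and passed on,
# following the usage quoted above: loss_op = op_loss(features, targets)
```

With this labeling, embeddings from the same modality share a target and embeddings from different modalities get different targets, which matches the "modality categories as the targets" suggestion.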

cuicathy / MMD_SurvivalPrediction

Preprocessing method #2
