MMostavi / CNNCancerType

This is the repository for the paper "Convolutional neural network models for cancer type prediction based on gene expression".

Preprocessing #2

Open lydiahjchung opened 3 years ago

lydiahjchung commented 3 years ago

@MMostavi, would it be possible to share the preprocessing code that produces the input data files?

MMostavi commented 3 years ago

The preprocessing code requires the original dataset as it was on the date we downloaded it. Since TCGA is still a work in progress and the data volume is large, I am not able to provide the code for this part. However, we took only a few steps to produce the dataset that is available:

  1. Load the original data into a pandas DataFrame in Python.
  2. Identify the cancer and tissue samples based on the ID information in the TCGA barcode:
    https://docs.gdc.cancer.gov/Encyclopedia/pages/TCGA_Barcode/#:~:text=TCGA%20barcodes%20were%20used%20to,metadata%20values%20for%20a%20sample.
  3. Filter genes across all cancer types using the mean and standard deviation thresholds given in the paper.
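The three steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' preprocessing code. Because the thread does not describe the layout of the original download, the sketch starts from an already-loaded pandas DataFrame with one row per sample (indexed by TCGA barcode) and one column per gene. It reads "cancer and tissue samples" as tumor versus normal samples and uses the two sample-type digits of the barcode (01-09 tumor, 10-19 normal, per the GDC barcode page linked above). The helper names (`sample_type_code`, `split_tumor_normal`, `filter_genes`) are hypothetical, the thresholds `MIN_MEAN = 0.5` and `MIN_STD = 0.8` are the values quoted later in this thread rather than ones confirmed by the authors, and keeping only genes that pass both thresholds is likewise an assumption.

```python
import pandas as pd

# Assumed thresholds: the mean/std cut-offs quoted later in this thread
# (mean < 0.5, std < 0.8); check them against the paper before relying on them.
MIN_MEAN = 0.5
MIN_STD = 0.8


def sample_type_code(barcode: str) -> int:
    """Return the two-digit sample-type code from a TCGA barcode.

    For 'TCGA-02-0001-01C-01D-0182-01', the fourth dash-separated field
    ('01C') starts with the sample type: 01-09 tumor, 10-19 normal,
    20-29 control.
    """
    return int(barcode.split("-")[3][:2])


def split_tumor_normal(expr: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Split a samples x genes matrix (indexed by barcode) into tumor and normal rows."""
    codes = expr.index.map(sample_type_code)  # Index of ints, one per sample
    tumor = expr.loc[(codes >= 1) & (codes <= 9)]
    normal = expr.loc[(codes >= 10) & (codes <= 19)]
    return tumor, normal


def filter_genes(expr: pd.DataFrame, min_mean: float = MIN_MEAN,
                 min_std: float = MIN_STD) -> pd.DataFrame:
    """Keep only genes whose mean and std across all samples reach both thresholds."""
    means = expr.mean(axis=0)  # per-gene mean over samples
    stds = expr.std(axis=0)    # per-gene sample standard deviation (ddof=1)
    keep = (means >= min_mean) & (stds >= min_std)
    return expr.loc[:, keep]


if __name__ == "__main__":
    # Tiny synthetic example: two tumor samples (01A), one normal sample (11A).
    expr = pd.DataFrame(
        {"GENE_A": [5.0, 7.0, 6.0], "GENE_B": [0.1, 0.2, 0.1], "GENE_C": [3.0, 3.0, 3.0]},
        index=["TCGA-AA-0001-01A", "TCGA-AA-0002-01A", "TCGA-AA-0003-11A"],
    )
    tumor, normal = split_tumor_normal(expr)
    print(tumor.shape, normal.shape)
    print(list(filter_genes(expr).columns))
```

In the toy data, `GENE_B` is dropped for its low mean and `GENE_C` for having zero variance, so only `GENE_A` survives the filter.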

Hopefully this answer helps you perform the preprocessing. Good luck!

syan1 commented 3 years ago

@MMostavi Hi, I am attempting to understand the preprocessing statement in your paper.

To test the robustness of our models, we added Gaussian noises with zero mean and standard deviations of 0–500% (k) of ith gene's average expression level (μi), or N(0, kμ) to each gene. We set noisy gene expression level to 0 if noise added expression level is less than 0.

Can you walk me through this step? I assume it comes after filtering out genes with mean < 0.5 and std < 0.8 in the previous step?

Thank you!
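The noise step in the quoted passage can be sketched as follows, as an illustration only (the authors have not answered this question in the thread). For a noise level k from 0 to 5 (0-500%), each gene i receives zero-mean Gaussian noise whose standard deviation is k·μ_i, where μ_i is that gene's average expression, and any value that falls below 0 is set to 0. The sketch assumes, as the question suggests, that noise is applied to the already-filtered samples × genes matrix, and it computes μ_i over whatever rows are passed in; which samples the authors used for μ_i, and which samples received noise, are not stated here. The function name `add_gaussian_noise` is hypothetical, and expression values are assumed non-negative so that every k·μ_i is a valid standard deviation.

```python
import numpy as np


def add_gaussian_noise(expr: np.ndarray, k: float, seed=None) -> np.ndarray:
    """Add N(0, (k * mu_i)^2) noise to each gene i, then clip negatives to 0.

    expr: samples x genes matrix (assumed already gene-filtered, non-negative).
    k: noise level as a fraction of each gene's mean (0.0-5.0 for 0-500%).
    """
    rng = np.random.default_rng(seed)
    mu = expr.mean(axis=0)  # per-gene average expression, shape (genes,)
    # scale has shape (genes,) and broadcasts over rows, so column i
    # gets standard deviation k * mu[i]
    noise = rng.normal(loc=0.0, scale=k * mu, size=expr.shape)
    noisy = expr + noise
    return np.clip(noisy, 0.0, None)  # set noisy values below 0 to 0


if __name__ == "__main__":
    expr = np.array([[1.0, 4.0], [3.0, 8.0]])
    for k in (0.0, 1.0, 5.0):
        noisy = add_gaussian_noise(expr, k, seed=0)
        print(k, noisy.round(2).tolist())
```

With `k = 0.0` the data comes back unchanged. Sweeping `k` over a grid and re-evaluating a trained model on each noisy copy is one way to trace a robustness curve; the grid itself is an assumption.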

linameziane commented 2 months ago


Hello, if you have figured this out, could you help me? Thank you!