BioTechCo / main_project

Cancer prediction using DNA methylation
MIT License

Add basic_dbeta_no_champ #49

Closed Vghxv closed 3 months ago

Vghxv commented 3 months ago

Development

Description

Part of #39. Should be merged after directories are ignored (#48).

Implementation

Implementation details are documented at the top of the notebook.

Required Hidden Data

Files marked with ✅ are the required data.

./breast
├── champ_result
│   ├── GDC_breast_tissue
│   │   ├── DMP_result_0.csv
│   │   ├── DMP_result_1.csv
│   │   ├── all_beta_normalized_0.csv
│   │   └── all_beta_normalized_1.csv
│   ├── GSE148663
│   │   ├── DMP_result.csv
│   │   └── all_beta_normalized.csv
│   ├── GSE237036
│   │   ├── DMP_result.csv
│   │   └── all_beta_normalized.csv
│   ├── GSE243529
│   │   ├── DMP_result_0.csv
│   │   ├── DMP_result_1.csv
│   │   ├── all_beta_normalized_0.csv
│   │   └── all_beta_normalized_1.csv
│   └── GSE89093_nc
│       ├── all_beta_normalized.csv✅
│       └── phenotype.csv✅
├── data
│   ├── breast_0
│   │   └── sample_sheet.csv
│   └── breast_1
│       └── sample_sheet.csv
└── result
    └── GSE89093_nc
        └── train100

Checklist

xzh0623 commented 3 months ago

train_normal_avg = all_beta_normalized_normal.mean(skipna=True, axis=0)

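In the dbeta calculation, axis should be 1 here; only then is the shape (453627,).

[Suggestion] Print the shape after every operation that takes an axis argument. That makes it much easier to follow what each step of the pipeline is doing.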
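To illustrate the axis point, here is a minimal sketch on a toy table. It assumes rows are CpG sites and columns are samples, which is the orientation under which axis=1 yields one value per CpG. The names `beta_normal`, `per_sample_mean`, and `per_cpg_mean` are made up for this example and are not taken from the notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-in (assumption): 5 CpG sites as rows, 3 normal samples as columns.
rng = np.random.default_rng(0)
beta_normal = pd.DataFrame(
    rng.uniform(0, 1, size=(5, 3)),
    index=[f"cg{i:02d}" for i in range(5)],
    columns=["s1", "s2", "s3"],
)

# axis=0 collapses the rows: one mean per sample (column).
per_sample_mean = beta_normal.mean(skipna=True, axis=0)
# axis=1 collapses the columns: one mean per CpG site (row).
per_cpg_mean = beta_normal.mean(skipna=True, axis=1)

# Printing shapes right after each axis operation makes the orientation explicit.
print(per_sample_mean.shape)  # (3,)
print(per_cpg_mean.shape)     # (5,)
```

With the real data in this orientation, the per-CpG result would have length 453627, matching the shape quoted above.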

Vghxv commented 3 months ago

We shouldn't still be debugging by the time this is actually being run.

Vghxv commented 3 months ago

In the dbeta calculation, axis should be 1 here; only then is the shape (453627,).

I made the mistake further up.

Vghxv commented 3 months ago

Should I push the computed results?

xzh0623 commented 3 months ago

Should I push the computed results?

Yes.

xzh0623 commented 3 months ago
all_beta_normalized_normal = all_beta_normalized_t.iloc[
    :, np.nonzero(mask.T)[1]
].T.reset_index(drop=True)
all_beta_normalized_tumor = all_beta_normalized_t.iloc[
    :, np.nonzero(~mask.T)[1]
].T.reset_index(drop=True)

At this point the shape is (46, 453627), so the later remove_outlier step ends up computing the IQR per sample.
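The effect of orientation on an IQR filter can be sketched as follows. The implementation of remove_outlier is not shown in this thread, so `iqr_outlier_mask` below is a hypothetical stand-in using the common 1.5 × IQR rule, and the toy values are invented for illustration.

```python
import pandas as pd

def iqr_outlier_mask(df: pd.DataFrame, axis: int, k: float = 1.5) -> pd.DataFrame:
    """Hypothetical helper: flag values outside [Q1 - k*IQR, Q3 + k*IQR].

    Quantiles are taken along `axis` (0 = down each column, 1 = across each row).
    """
    q1 = df.quantile(0.25, axis=axis)
    q3 = df.quantile(0.75, axis=axis)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Bounds computed along axis=0 are indexed by column, so they must be
    # aligned on the columns (axis=1) when comparing, and vice versa.
    align = 1 - axis
    return df.lt(lower, axis=align) | df.gt(upper, axis=align)

# Toy data (assumption): rows are samples, columns are CpG sites,
# i.e. the same orientation as the (46, 453627) frame.
beta = pd.DataFrame(
    {
        "cgA": [0.10, 0.12, 0.11, 0.13, 0.12, 0.90],
        "cgB": [0.80, 0.82, 0.81, 0.83, 0.82, 0.84],
    },
    index=[f"s{i}" for i in range(1, 7)],
)

per_cpg = iqr_outlier_mask(beta, axis=0)     # IQR of each CpG across samples
per_sample = iqr_outlier_mask(beta, axis=1)  # IQR of each sample across CpGs

print(per_cpg.sum().sum(), per_sample.sum().sum())
```

In this toy case only the per-CpG version flags the 0.90 reading at cgA for s6; the per-sample version compares that value against the other CpG of the same sample and flags nothing.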

Vghxv commented 3 months ago

Yes, the original calculation was wrong.

Vghxv commented 3 months ago


When the outlier filter is applied, each row is a sample.

On branch basic