Open renyuan001 opened 6 months ago
Yes, we first ran AltAnalyze on the GTEx and TCGA matched control samples to get the junction count matrices, then converted them into h5ad files. I saved the scripts I used for the conversion (https://github.com/frankligy/SNAF/tree/main/images/db_build).
To generate your own control dataset, you can follow this other issue (https://github.com/frankligy/SNAF/issues/34).
Thank you, Frank
I want to use both in the snaf.initialize step, but an error occurred:
db_dir = '/home/ry-03/data/SNAF/data'
netMHCpan_path = '/home/ry-03/data/SNAF/netMHCpan-4.1/netMHCpan'
tcga_ctrl_db = ad.read_h5ad(os.path.join(db_dir,'controls','tcga_matched_control_junction_count.h5ad'))
gtex_ctrl_db = ad.read_h5ad(os.path.join(db_dir,'controls','GTEx_junction_counts.h5ad'))
add_control = {'tcga_control':tcga_ctrl_db,'gtex_ctrl':gtex_ctrl_db}
snaf.initialize(df=df,db_dir=db_dir,binding_method='netMHCpan',software_path=netMHCpan_path,add_control=add_control)

2024-05-12 19:15:40 starting initialization
Current loaded gtex cohort with shape (56692, 2629)
Adding cohort tcga_control with shape (54813, 705) to the database
now the shape of control db is (56999, 3334)
Traceback (most recent call last):
  File "", line 1, in
  File "/home/ry-03/miniconda3/envs/SNAF/lib/python3.7/site-packages/snaf/init.py", line 52, in initialize
    adata = gtex_configuration(df,gtex_db,t_min,n_max,normal_cutoff, tumor_cutoff, normal_prevalance_cutoff, tumor_prevalance_cutoff, add_control)
  File "/home/ry-03/miniconda3/envs/SNAF/lib/python3.7/site-packages/snaf/gtex.py", line 65, in gtex_configuration
    assert len(set(control.var_names).intersection(tissue_dict.keys())) == 0
AssertionError
Maybe I need to filter against them one by one rather than together?
But it worked when I passed only the TCGA control:
add_control = {'tcga_control':tcga_ctrl_db}
snaf.initialize(df=df,db_dir=db_dir,binding_method='netMHCpan',software_path=netMHCpan_path,add_control=add_control)

2024-05-12 20:09:18 starting initialization
Current loaded gtex cohort with shape (56692, 2629)
Adding cohort tcga_control with shape (54813, 705) to the database
now the shape of control db is (56999, 3334)
2024-05-12 20:10:10 finishing initialization
Yes, the reason is the same as in this issue (https://github.com/frankligy/SNAF/issues/40): the GTEx control is built in, so you don't need to add it again. Only the TCGA control needs to be passed as add_control.
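The assertion in the traceback requires that the sample names of an added control (`control.var_names`) share nothing with the keys of `tissue_dict`. Reading those keys as the sample names already in the control database is an interpretation, not something confirmed in this thread. Under that assumption, a check like the one below can be run before calling snaf.initialize. It is a plain-Python sketch: `split_controls` and all sample names are hypothetical and not part of SNAF.

```python
from typing import Dict, Iterable, List, Tuple


def split_controls(
    builtin_samples: Iterable[str],
    cohorts: Dict[str, Iterable[str]],
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Separate cohorts that are safe to add from those that clash.

    A cohort clashes when any of its sample names is already present
    among the built-in samples (the condition the assertion rejects).
    """
    builtin = set(builtin_samples)
    safe: List[str] = []
    clashing: Dict[str, List[str]] = {}
    for name, samples in cohorts.items():
        shared = sorted(builtin.intersection(samples))
        if shared:
            clashing[name] = shared
        else:
            safe.append(name)
    return safe, clashing


# Made-up sample names, for illustration only.
builtin_gtex = ["GTEX-0001", "GTEX-0002", "GTEX-0003"]
candidates = {
    "tcga_control": ["TCGA-AA-0001", "TCGA-AA-0002"],
    "gtex_ctrl": ["GTEX-0002", "GTEX-0003"],
}
safe, clashing = split_controls(builtin_gtex, candidates)
print("safe to add:", safe)
print("already present:", clashing)
```

With real data, each cohort's sample names would come from its AnnData `var_names`, and only the cohorts reported as safe would go into add_control.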
Thank you so much for your patient and thoughtful answers.
I think this control is the best option. How do I generate gtex_ctrl_db and tcga_matched_control_junction_count.h5ad? Maybe by first analyzing a number of *.fastq files from TCGA and GTEx with SNAF?