frankligy / SNAF

Splicing Neo Antigen Finder (SNAF) is an easy-to-use Python package for identifying splicing-derived tumor neoantigens from RNA sequencing data. It further leverages both deep learning and hierarchical Bayesian models to prioritize candidates for experimental validation.
MIT License

How to generate gtex_ctrl_db and tcga_matched_control_junction_count.h5ad? It is a good control. #41

Open · renyuan001 opened 4 months ago

renyuan001 commented 4 months ago

I think this control is the best one available. How were gtex_ctrl_db and tcga_matched_control_junction_count.h5ad generated? Were a number of *.fastq files from TCGA and GTEx analysed by SNAF first?

frankligy commented 4 months ago

Yes, we first ran AltAnalyze on the GTEx and TCGA matched-control samples to get the junction count matrices, then converted them into h5ad files. The scripts I used for the conversion are saved here (https://github.com/frankligy/SNAF/tree/main/images/db_build).

To generate your own control dataset, you can follow the post in another issue (https://github.com/frankligy/SNAF/issues/34).

Thank you, Frank

renyuan001 commented 4 months ago

I want to use both of them in the snaf.initialize step, but an error occurred:

    import os
    import anndata as ad
    import snaf

    # df: the junction count DataFrame loaded earlier (not shown in the original post)
    db_dir = '/home/ry-03/data/SNAF/data'
    netMHCpan_path = '/home/ry-03/data/SNAF/netMHCpan-4.1/netMHCpan'
    tcga_ctrl_db = ad.read_h5ad(os.path.join(db_dir, 'controls', 'tcga_matched_control_junction_count.h5ad'))
    gtex_ctrl_db = ad.read_h5ad(os.path.join(db_dir, 'controls', 'GTEx_junction_counts.h5ad'))
    add_control = {'tcga_control': tcga_ctrl_db, 'gtex_ctrl': gtex_ctrl_db}
    snaf.initialize(df=df, db_dir=db_dir, binding_method='netMHCpan', software_path=netMHCpan_path, add_control=add_control)

    2024-05-12 19:15:40 starting initialization
    Current loaded gtex cohort with shape (56692, 2629)
    Adding cohort tcga_control with shape (54813, 705) to the database
    now the shape of control db is (56999, 3334)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/ry-03/miniconda3/envs/SNAF/lib/python3.7/site-packages/snaf/__init__.py", line 52, in initialize
        adata = gtex_configuration(df,gtex_db,t_min,n_max,normal_cutoff, tumor_cutoff, normal_prevalance_cutoff, tumor_prevalance_cutoff, add_control)
      File "/home/ry-03/miniconda3/envs/SNAF/lib/python3.7/site-packages/snaf/gtex.py", line 65, in gtex_configuration
        assert len(set(control.var_names).intersection(tissue_dict.keys())) == 0
    AssertionError

Maybe I need to filter against them one at a time, not together?

renyuan001 commented 4 months ago

But it worked like this:

    add_control = {'tcga_control': tcga_ctrl_db}
    snaf.initialize(df=df, db_dir=db_dir, binding_method='netMHCpan', software_path=netMHCpan_path, add_control=add_control)

    2024-05-12 20:09:18 starting initialization
    Current loaded gtex cohort with shape (56692, 2629)
    Adding cohort tcga_control with shape (54813, 705) to the database
    now the shape of control db is (56999, 3334)
    2024-05-12 20:10:10 finishing initialization
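The shapes in this log can be checked by hand. Assuming (the log itself does not say so) that SNAF appends the added cohort's samples as extra columns and takes the union of junction IDs on the rows, the sample count is simply additive, which only holds when no sample name is shared between cohorts, and the junction count follows inclusion-exclusion:

```latex
\begin{aligned}
n_{\text{samples}} &= 2629 + 705 = 3334 \\
|J_{\text{GTEx}} \cup J_{\text{TCGA}}| &= |J_{\text{GTEx}}| + |J_{\text{TCGA}}| - |J_{\text{GTEx}} \cap J_{\text{TCGA}}| \\
56999 &= 56692 + 54813 - |J_{\text{GTEx}} \cap J_{\text{TCGA}}| \\
\Rightarrow\; |J_{\text{GTEx}} \cap J_{\text{TCGA}}| &= 111505 - 56999 = 54506
\end{aligned}
```

Under that assumption, about 54,506 junctions are shared between the two controls, and only 307 (56999 − 56692) are contributed by the TCGA control alone.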

frankligy commented 4 months ago

Yes, the reason is the same as in issue #40 (https://github.com/frankligy/SNAF/issues/40): the GTEx control is built in, so you don't need to add it again. Only the TCGA control needs to be passed via add_control.
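To catch this kind of clash before calling snaf.initialize, the sample names can be checked up front. The helper below is a hypothetical, stdlib-only sketch (`overlapping_samples` is not part of SNAF) that mirrors the condition in the traceback: an added cohort's sample names (`var_names`) must not intersect the samples already in the control database. As a design choice of this sketch, not necessarily SNAF's bookkeeping, each checked cohort joins the pool, so duplicates between two added cohorts are flagged as well.

```python
from typing import Dict, Iterable, List


def overlapping_samples(builtin_samples: Iterable[str],
                        add_control: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Return {cohort name: sorted sample names that already exist}.

    Cohorts are checked in dict insertion order; an empty result means
    no added cohort reuses a sample name.
    """
    seen = set(builtin_samples)
    clashes: Dict[str, List[str]] = {}
    for name, samples in add_control.items():
        samples = list(samples)
        overlap = sorted(seen.intersection(samples))
        if overlap:
            clashes[name] = overlap
        # later cohorts are also checked against this one
        seen.update(samples)
    return clashes
```

With the objects from the failing call, something like `overlapping_samples(gtex_ctrl_db.var_names, {k: v.var_names for k, v in add_control.items()})` would be expected to flag `'gtex_ctrl'`, assuming GTEx_junction_counts.h5ad holds the same samples as the built-in GTEx control.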

renyuan001 commented 4 months ago

Thank you so much for your patient and thoughtful answers.