frankligy / SNAF

Splicing Neo Antigen Finder (SNAF) is an easy-to-use Python package that identifies splicing-derived tumor neoantigens from RNA sequencing data. It further leverages both deep learning and hierarchical Bayesian models to prioritize candidates for experimental validation.
MIT License
40 stars, 8 forks

annotate the T antigen #1

Closed frankligy closed 2 years ago

frankligy commented 2 years ago

Although I use 3-way in-silico translation, most of the time a protein uses only one ORF. We therefore want to annotate each T antigen we obtain, so that we can prioritize certain neoantigens for experimental validation.
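
For context, three-frame translation of a nucleotide sequence can be sketched as below. This is a minimal illustration and not SNAF's actual implementation: the function name `three_frame_translate` and the toy sequence are made up for the example, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def three_frame_translate(seq):
    """Translate seq in forward frames 0, 1 and 2 (hypothetical helper)."""
    seq = seq.upper()
    peptides = {}
    for frame in range(3):
        # Split into complete codons starting at this frame offset
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        # Unknown codons (e.g. containing N) are reported as X
        peptides[frame] = "".join(CODON_TABLE.get(c, "X") for c in codons)
    return peptides

# Toy sequence, not taken from any real transcript
frames = three_frame_translate("ATGGCCTAA")
print(frames)
```

Only one of the three peptides normally corresponds to the frame the transcript actually uses, which is why an annotation step is useful.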

frankligy commented 2 years ago
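import pandas as pd

# Load the Ensembl 91 GTF table and keep only the start_codon records
gtf = pd.read_csv('/Users/ligk2e/Desktop/gtfEnsembl91.txt', sep='\t')
gtf_sc = gtf.loc[gtf['feature'] == 'start_codon', :].copy()  # copy so the column assignment below is safe
# The first attribute field looks like: gene_id "ENSG..."; extract the bare gene ID
gtf_sc['gene'] = [item[0].split(' ')[1].strip('"') for item in gtf_sc['attribute'].str.split(';')]

# Collect the unique start-codon positions of each gene
dic = {}
for gene, sub in gtf_sc.groupby(by='gene'):
    dic[gene] = list(sub['start'].unique())
sc = pd.Series(data=dic, name='start_codon').to_frame()

# Keep one representative start codon per value of position % 3
col = []
for lis in sc['start_codon']:
    if len(lis) == 1:
        col.append(lis)
    else:
        remainder = [item % 3 for item in lis]
        col.append(pd.Series(index=lis, data=remainder).drop_duplicates().index.tolist())
sc['non_redundant'] = col
sc.to_csv('/Users/ligk2e/Desktop/df_start_codon.txt', sep='\t')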
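
The de-duplication step above can be tried on toy data without the GTF file. The sketch below is a plain-Python equivalent of the `drop_duplicates` call: it keeps the first start position seen for each value of `position % 3`. The helper name `non_redundant_starts` and the gene names and coordinates are hypothetical. Note that it treats the genomic coordinate modulo 3 as a stand-in for the reading frame, exactly as the snippet above does, and does not account for introns.

```python
def non_redundant_starts(starts):
    """Keep the first start position for each value of position % 3."""
    seen = set()
    kept = []
    for pos in starts:
        frame = pos % 3
        if frame not in seen:
            seen.add(frame)
            kept.append(pos)
    return kept

# Hypothetical start-codon coordinates for two made-up genes
toy = {
    'geneA': [100, 103, 107],  # remainders 1, 1, 2
    'geneB': [300],            # single start codon, kept as-is
}
result = {gene: non_redundant_starts(starts) for gene, starts in toy.items()}
print(result)
```

Here `geneA` keeps positions 100 and 107, because 103 has the same remainder as 100.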