frankligy opened this issue 10 months ago
I have released a new version that implements this feature. Please install it with pip as shown below:

```bash
pip install git+https://github.com/frankligy/SNAF.git@4f7d76321c32625c1909ad059b81d646a0cd9ef5
```
Right after your T antigen workflow, and assuming `outdir` was set to `result`, you can use the `find_full_length` mode to generate all possible full-length isoforms associated with each NeoJunction:
```python
import os
import snaf
from snaf import surface

# df, db_dir and add_control are assumed to already be defined by the T antigen workflow (outdir='result')

# initiate the B pipeline
surface.initialize(db_dir=db_dir)
# get "fake" membrane tuples: not restricted to membrane proteins in this case, but covering all NeoJunctions
membrane_tuples = snaf.JunctionCountMatrixQuery.get_fake_membrane_tuples(df, add_control=add_control, outdir='result/surface_fake')
# run the B pipeline in find_full_length mode
surface.run(uids=membrane_tuples, outdir='result/surface_fake', prediction_mode='find_full_length',
            gtf=None, tmhmm=False, software_path=None)
# generate the results in find_full_length mode
surface.generate_full_results(outdir='result/surface_fake', mode='find_full_length',
                              freq_path='result/frequency_stage0_verbosity1_uid_gene_symbol_coord_mean_mle.txt',
                              validation_gtf=os.path.join(db_dir, '2021UHRRIsoSeq_SQANTI3_filtered.gtf'))
```
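If you are not sure where the report files ended up, a minimal sketch for listing them is below. It assumes (not confirmed in this thread) that the reports are written directly into the `outdir` passed to `generate_full_results`, and the glob pattern is only a guess based on the two file names mentioned here; `find_reports` is a hypothetical helper, not part of SNAF.

```python
from pathlib import Path


def find_reports(outdir: str, pattern: str = "sr_ffl_*_report_*.txt") -> list:
    """Return the files in outdir whose names match pattern, sorted by name.

    ASSUMPTION: the reports live directly in outdir, and the pattern is
    inferred from the file names sr_ffl_str3_report_None_False.txt and
    sr_ffl_str5_report_None_False.txt; adjust it if your output differs.
    """
    return sorted(Path(outdir).glob(pattern))


# Example usage (path assumed):
# for path in find_reports("result/surface_fake"):
#     print(path.name)
```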
Look for a file named `sr_ffl_str3_report_None_False.txt`; it looks like the following:
The sequences in the `mRNA_sequence` column can be readily validated with the BLAT tool on the UCSC Genome Browser, using the first one as an example:
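As a sketch of one way to get these sequences into BLAT, the snippet below reads the report with Python's standard `csv` module and formats the `mRNA_sequence` column as FASTA, a format the BLAT web form accepts as input. It assumes the report is tab-delimited with a header row; `read_report` and `to_fasta` are hypothetical helpers, not part of SNAF, and the `seq1`, `seq2`, ... record names are arbitrary (they follow the row order in the file).

```python
import csv
from typing import Dict, Iterable, List


def read_report(path: str) -> List[Dict[str, str]]:
    """Read a tab-delimited report into a list of row dicts keyed by column name.

    ASSUMPTION: the file is tab-separated and its header includes mRNA_sequence.
    """
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def to_fasta(rows: Iterable[Dict[str, str]], seq_col: str = "mRNA_sequence") -> str:
    """Format each row's sequence as a FASTA record named after its row number."""
    records = []
    for i, row in enumerate(rows, start=1):
        seq = row[seq_col].strip()
        if seq:  # skip rows with an empty sequence, keeping the original row number
            records.append(f">seq{i}\n{seq}")
    return ("\n".join(records) + "\n") if records else ""


# Example usage (file location assumed):
# rows = read_report("result/surface_fake/sr_ffl_str3_report_None_False.txt")
# print(to_fasta(rows[:1]))  # paste the output into the BLAT search box
```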
In addition, the `sr_ffl_str5_report_None_False.txt` file contains the isoforms that have long-read validation, based on long-read data from 10 cancer cell lines.
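If you want to flag which isoforms in the str3 report also carry long-read support, a minimal sketch is to match rows of the two files on their `mRNA_sequence` values. It assumes both reports are tab-delimited and share an `mRNA_sequence` column; `long_read_supported` is a hypothetical helper rather than part of SNAF, and matching on the exact sequence string is an illustrative choice (a junction identifier column, if the reports have one, may be a better key).

```python
import csv
from typing import Dict, List


def long_read_supported(str3_path: str, str5_path: str,
                        seq_col: str = "mRNA_sequence") -> List[Dict[str, str]]:
    """Return the rows of the str3 report whose sequence also appears in the str5 report.

    ASSUMPTION: both files are tab-delimited with a header containing seq_col.
    """
    # collect every sequence listed in the long-read-validated (str5) report
    with open(str5_path, newline="") as handle:
        validated = {row[seq_col] for row in csv.DictReader(handle, delimiter="\t")}
    # keep only the str3 rows whose sequence is in that set
    with open(str3_path, newline="") as handle:
        return [row for row in csv.DictReader(handle, delimiter="\t")
                if row[seq_col] in validated]
```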
Please share your feedback on this new function. Once it has been tested by users, I'll add it to the official tutorial.
Thank you, Frank
Right now, the SNAF-B pipeline only looks for membrane proteins, but people may also be interested in knowing the potential full-length isoforms for all NeoJunctions. Please implement this feature.