Magdoll / cDNA_Cupcake

Miscellaneous collection of Python and R scripts for processing Iso-Seq data
BSD 3-Clause Clear License

How to configure the correct/optimal parameters and validate true positives in the PB fusion candidate list #226

Open wzhang42 opened 2 years ago

wzhang42 commented 2 years ago

Hi Magdoll,

I downloaded the fusion fasta file and ran the same commands and parameters given at https://github.com/Magdoll/cDNA_Cupcake/wiki/Best-practice-for-fusion-transcript-finding: 1) minimap2 against hg38, then sort; 2) fusion_finder.py; 3) SQANTI (gencode.v33); 4) fusion_collate_info.py. However, in the final annotation files (output.fusion.annotated.txt and output.fusion.annotated_ignored.txt, or output.fusion.gff), I could not find the PBfusion.142 mentioned there, with a breakpoint near chr1:6825211. Please correct me if I used the wrong analysis procedure.

I also applied a similar procedure to my own data over several rounds, which produced several PBfusion candidates. However, we do not know whether they are just false positives (many of them map to intronic regions).

I am also unsure how to choose the optimal parameters for fusion_finder.py (-c 0.05 for per-locus coverage, -t 0.98 for total coverage, -d 10K for distance).

In short, I am not clear on how to configure the correct/optimal parameters, and I do not know the procedure for validating true positives in the PB fusion candidate list. Sorry for so many questions. Many thanks in advance.

Wenchao
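To double-check whether any record in output.fusion.gff lies near the expected breakpoint, one option is to scan the file for features whose start or end falls within a window around chr1:6825211. The following is only a minimal sketch: it assumes the file is a standard 9-column, tab-separated GFF/GTF; the helper name `features_near`, the 1000 bp window, and the example records (with placeholder ID `PBfusion.X`) are hypothetical and not taken from cDNA_Cupcake or from real output.

```python
import io


def features_near(gff_lines, chrom, pos, window=1000):
    """Yield (type, start, end, strand, attributes) for features on `chrom`
    whose start or end lies within `window` bp of `pos`.

    Assumes standard 9-column, tab-separated GFF/GTF lines (hypothetical helper).
    """
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip malformed lines
        seqid, _source, ftype, start, end, _score, strand, _phase, attrs = fields[:9]
        if seqid != chrom:
            continue
        start, end = int(start), int(end)
        if min(abs(start - pos), abs(end - pos)) <= window:
            yield ftype, start, end, strand, attrs


# Illustrative records only (made up for this sketch, not real pipeline output).
example = io.StringIO(
    'chr1\tPacBio\texon\t6820000\t6825211\t.\t+\t.\tgene_id "PBfusion.X"; transcript_id "PBfusion.X.1";\n'
    'chr1\tPacBio\texon\t100\t200\t.\t+\t.\tgene_id "PBfusion.X"; transcript_id "PBfusion.X.1";\n'
    'chr2\tPacBio\texon\t6825000\t6825300\t.\t-\t.\tgene_id "PBfusion.X"; transcript_id "PBfusion.X.2";\n'
)

hits = list(features_near(example, "chr1", 6825211))
for hit in hits:
    print(hit)

# On a real file, the same function could be used as:
#   with open("output.fusion.gff") as fh:
#       for hit in features_near(fh, "chr1", 6825211):
#           print(hit)
```

If nothing is reported for the real file, widening `window` would show whether a candidate exists nearby with a slightly different breakpoint, or whether it is absent from the output altogether.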