Oshlack / JAFFA

JAFFA is a multi-step pipeline that takes either raw RNA-Seq reads, or pre-assembled transcripts, then searches for gene fusions
https://github.com/Oshlack/JAFFA/wiki

BCL2 fusion not caught? #95

Closed asaki1986 closed 4 months ago

asaki1986 commented 1 year ago

Hi all,

I am using JAFFA-2.4 to discover BCL2 fusions in FFPE targeted RNA sequencing data.

This sample was verified as positive by FISH, and the partner gene, IGHJ, was confirmed by DNA sequencing.

However, we did not find the BCL2-IGHJ fusion in jaffa_results.csv.

I carefully examined the raw files in the analysis folder and found plenty of evidence in the SAMPLE.txt file, for example:

E150003903L1C001R0023300833/1/1 98 105 BCL2:IGHJ6 150 hg19_wgEncodeGencodeBasicV19_ENST00000398117.1__range=chr18:60790579-60987361__5'pad=0__3'pad=0__strand=-__repeatMasking=none 4513 hg19_wgEncodeGencodeBasicV19_ENST00000390560.2__range=chr14:106329408-106329468__5'pad=0__3'pad=0__strand=-__repeatMasking=none 16 99 104 plus

There are 920 lines supporting this fusion, but these reads do not seem to be considered in the final results. Could you please explain what each column in this file means?

I am also wondering whether these lines are preliminary results, and which parameters or cutoffs are used to filter them.
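For readers who want to tally the supporting records per gene pair in such an intermediate file, a minimal Python sketch follows. It assumes whitespace-separated fields with the gene pair (for example `BCL2:IGHJ6`) in the fourth field, which matches the example record above but is not a documented format; the read names and transcript labels in the sample lines are made up for illustration.

```python
from collections import Counter


def count_fusion_lines(lines, fusion_field=3):
    """Count records per gene pair.

    Assumption: fields are whitespace-separated and the gene pair sits in
    the fourth field (index 3), as in the example record in this thread.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) > fusion_field:
            counts[fields[fusion_field]] += 1
    return counts


# Made-up records that only mimic the layout of the example line.
example = [
    "readA/1 98 105 BCL2:IGHJ6 150 tx1 4513 tx2 16 99 104 plus",
    "readB/1 75 82 BCL2:IGHJ6 150 tx1 4444 tx2 61 76 81 minus",
    "readC/1 10 17 GENE1:GENE2 150 tx3 100 tx4 5 11 16 plus",
]
counts = count_fusion_lines(example)
print(counts.most_common())  # [('BCL2:IGHJ6', 2), ('GENE1:GENE2', 1)]
```

On a real file, the same function could be called with an open file handle, since iterating over a file yields its lines.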

Thanks,

Junfeng

nadiadavidson commented 1 year ago

Hi Junfeng,

It certainly looks like the fusion is initially picked up and then gets filtered out later on for some reason. Here is a thread which describes the columns in the file you posted (https://github.com/Oshlack/JAFFA/issues/71).

Fusions involving IGH can be quite challenging to identify, and I wonder whether the IGHJ6 part of the fusion aligns to the reference genome in an unexpected way. The file you posted is an intermediate file: if you see a fusion there that you expect, like this one, you know it's there, but you cannot use the file to "discover" new fusions because the false positive rate is too high. JAFFA has a number of steps after this, like aligning the reads to the reference genome and checking their position, so letting these events pass through is unlikely to be as simple as changing some parameters or cutoffs. I am happy to investigate more if you are willing to send me 3-4 reads from the fusion gene.

I also note in your results that your reads are labelled like "/1/1" or "/2/2". This looks like it may be a bug in JAFFA (you should only have "/1" or "/2"). However, this shouldn't change whether the fusion is identified, only its rank (from high confidence down to medium confidence or trans-splicing).

Cheers, Nadia.

asaki1986 commented 1 year ago

Hi Nadia,

Thanks for your reply. I extracted the records and sequences for several reads from the results; they are listed below.

E150003903L1C001R0023300833/1/1 98 105 BCL2:IGHJ6 150 hg19_wgEncodeGencodeBasicV19_ENST00000398117.1__range=chr18:60790579-60987361__5'pad=0__3'pad=0__strand=-__repeatMasking=none 4513 hg19_wgEncodeGencodeBasicV19_ENST00000390560.2__range=chr14:106329408-106329468__5'pad=0__3'pad=0__strand=-__repeatMasking=none 16 99 104 plus
E150003903L1C001R0051737469/1/1 75 82 BCL2:IGHJ6 150 hg19_wgEncodeGencodeBasicV19_ENST00000398117.1__range=chr18:60790579-60987361__5'pad=0__3'pad=0__strand=-__repeatMasking=none 4444 hg19_wgEncodeGencodeBasicV19_ENST00000390560.2__range=chr14:106329408-106329468__5'pad=0__3'pad=0__strand=-__repeatMasking=none 61 76 81 minus
E150003903L1C001R0072448096/1/1 27 34 BCL2:IGHJ6 150 hg19_wgEncodeGencodeBasicV19_ENST00000398117.1__range=chr18:60790579-60987361__5'pad=0__3'pad=0__strand=-__repeatMasking=none 4396 hg19_wgEncodeGencodeBasicV19_ENST00000390560.2__range=chr14:106329408-106329468__5'pad=0__3'pad=0__strand=-__repeatMasking=none 43 28 33 minus
E150003903L1C001R0093511907/2/2 39 46 BCL2:IGHJ6 150 hg19_wgEncodeGencodeBasicV19_ENST00000398117.1__range=chr18:60790579-60987361__5'pad=0__3'pad=0__strand=-__repeatMasking=none 4513 hg19_wgEncodeGencodeBasicV19_ENST00000390560.2__range=chr14:106329408-106329468__5'pad=0__3'pad=0__strand=-__repeatMasking=none 16 40 45 plus
E150003903L1C001R0100377918/1/1 90 97 BCL2:IGHJ6 150 hg19_wgEncodeGencodeBasicV19_ENST00000398117.1__range=chr18:60790579-60987361__5'pad=0__3'pad=0__strand=-__repeatMasking=none 4459 hg19_wgEncodeGencodeBasicV19_ENST00000390560.2__range=chr14:106329408-106329468__5'pad=0__3'pad=0__strand=-__repeatMasking=none 61 91 96 minus
E150003903L1C001R0100432518/1/1 79 86 BCL2:IGHJ6 150 hg19_wgEncodeGencodeBasicV19_ENST00000398117.1__range=chr18:60790579-60987361__5'pad=0__3'pad=0__strand=-__repeatMasking=none 4448 hg19_wgEncodeGencodeBasicV19_ENST00000390560.2__range=chr14:106329408-106329468__5'pad=0__3'pad=0__strand=-__repeatMasking=none 61 80 85 minus
E150003903L1C001R0104297198/2/2 46 53 BCL2:IGHJ6 150 hg19_wgEncodeGencodeBasicV19_ENST00000398117.1__range=chr18:60790579-60987361__5'pad=0__3'pad=0__strand=-__repeatMasking=none 4513 hg19_wgEncodeGencodeBasicV19_ENST00000390560.2__range=chr14:106329408-106329468__5'pad=0__3'pad=0__strand=-__repeatMasking=none 16 47 52 plus
E150003903L1C001R0111569028/1/1 92 99 BCL2:IGHJ6 150 hg19_wgEncodeGencodeBasicV19_ENST00000398117.1__range=chr18:60790579-60987361__5'pad=0__3'pad=0__strand=-__repeatMasking=none 4466 hg19_wgEncodeGencodeBasicV19_ENST00000390560.2__range=chr14:106329408-106329468__5'pad=0__3'pad=0__strand=-__repeatMasking=none 61 93 98 minus
E150003903L1C001R0111771499/1/1 83 90 BCL2:IGHJ6 150 hg19_wgEncodeGencodeBasicV19_ENST00000398117.1__range=chr18:60790579-60987361__5'pad=0__3'pad=0__strand=-__repeatMasking=none 4513 hg19_wgEncodeGencodeBasicV19_ENST00000390560.2__range=chr14:106329408-106329468__5'pad=0__3'pad=0__strand=-__repeatMasking=none 16 84 89 plus
E150003903L1C001R0112736145/1/1 49 56 BCL2:IGHJ6 150 hg19_wgEncodeGencodeBasicV19_ENST00000398117.1__range=chr18:60790579-60987361__5'pad=0__3'pad=0__strand=-__repeatMasking=none 4513 hg19_wgEncodeGencodeBasicV19_ENST00000390560.2__range=chr14:106329408-106329468__5'pad=0__3'pad=0__strand=-__repeatMasking=none 16 50 55 plus
E150003903L1C001R0112736145/2/2 46 53 BCL2:IGHJ6 150 hg19_wgEncodeGencodeBasicV19_ENST00000398117.1__range=chr18:60790579-60987361__5'pad=0__3'pad=0__strand=-__repeatMasking=none 4464 hg19_wgEncodeGencodeBasicV19_ENST00000390560.2__range=chr14:106329408-106329468__5'pad=0__3'pad=0__strand=-__repeatMasking=none 61 47 52 minus
E150003903L1C001R0113400740/1/1 38 45 BCL2:IGHJ6 150 hg19_wgEncodeGencodeBasicV19_ENST00000398117.1__range=chr18:60790579-60987361__5'pad=0__3'pad=0__strand=-__repeatMasking=none 4407 hg19_wgEncodeGencodeBasicV19_ENST00000390560.2__range=chr14:106329408-106329468__5'pad=0__3'pad=0__strand=-__repeatMasking=none 54 39 44 minus
E150003903L1C001R0113875283/1/1 79 86 BCL2:IGHJ6 150 hg19_wgEncodeGencodeBasicV19_ENST00000398117.1__range=chr18:60790579-60987361__5'pad=0__3'pad=0__strand=-__repeatMasking=none 4513 hg19_wgEncodeGencodeBasicV19_ENST00000390560.2__range=chr14:106329408-106329468__5'pad=0__3'pad=0__strand=-__repeatMasking=none 16 80 85 plus

E150003903L1C001R0023300833/1/1 TTTGACCTTTAGAGAGTTGCTTTACGTGGCCTGTTTCAACACAGACCCACCCAGAGCCCTCCTGCCCTCCTTCCGCGGGGGCTTTCTCATGGCTGCCCTGAGGTACATGGACGTCTGGGGCAAAGGGACCACGGTCACCGTCTCCTCAGC
E150003903L1C001R0051737469/1/1 GGGGGAAGACCGATGGGCCCTTGGTGGAGGCTGAGGAGACGGTGACCGTGGTCCCTTTGCCCCAGACGTCCATGTACCTCAGGGCAGCCATGAGAAAGCCCCCGCGGAAGGAGGGCAGGAGGGCTCTGGGTGGGTCTGTGTTGAAACAGG
E150003903L1C001R0072448096/1/1 TGGTCCCTTTGCCCCAGACGTCCATGTACCTCAGGGCAGCCATGAGAAAGCCCCCGCGGAAGGAGGGCAGGAGGGCTCTGGGTGGGTCTGTGTTGAAACAGGCCACGTAAAGCAACTCTCTAAAGGTCAAACCACCATAGATTTGAATCT
E150003903L1C001R0093511907/2/2 TCCTGCCCTCCTTCCGCGGGGGCTTTCTCATGGCTGCCCTGAGGTACATGGACGTCTGGGGCAAAGGGACCACGGTCACCGTCTCCTCAGGTAAGAATGGCCACTCTAGGGCCTTTGTTTTCTGCTACTGCCTGTGGGGTTTCCTGAGCA
E150003903L1C001R0100377918/1/1 TGGAGGAGGGTGCCAGGGGGAAGACCGATGGGCCCTTGGTGGAGGCTGAGGAGACGGTGACCGTGGTCCCTTTGCCCCAGACGTCCATGTACCTCAGGGCAGCCATGAGAAAGCCCCCGCGGAAGGAGGGCAGGAGGGCTCTGGGTGGGT
E150003903L1C001R0100432518/1/1 GCCAGGGGGAAGACCGATGGGCCCTTGGTGGAGGCTGAGGAGACGGTGACCGTGGTCCCTTTGCCCCAGACGTCCATGTACCTCAGGGCAGCCATGAGAAAGCCCCCGCGGAAGGAGGGCAGGAGGGCTCTGGGTGGGTCTGTGTTGAAA
E150003903L1C001R0104297198/2/2 AGAGCCCTCCTGCCCTCCTTCCGCGGGGGCTTTCTCATGGCTGCCCTGAGGTACATGGACGTCTGGGGCAAAGGGACCACGGTCACCGTCTCCTCAGCCTCCACCAAGGGCCCATCGGTCTTCCCCCTGGCACCCTCCTCCAAGAGCACC
E150003903L1C001R0111569028/1/1 CTTGGAGGAGGGTGCCAGGGGGAAGACCGATGGGCCCTTGGTGGAGGCTGAGGAGACGGTGACCGTGGTCCCTTTGCCCCAGACGTCCATGTACCTCAGGGCAGCCATGAGAAAGCCCCCGCGGAAGGAGGGCAGGAGGGCTCTGAGATC
E150003903L1C001R0111771499/1/1 GTTGCTTTACGTGGCCTGTTTCAACACAGACCCACCCAGAGCCCTCCTGCCCTCCTTCCGCGGGGGCTTTCTCATGGCTGCCCTGAGGTACATGGACGTCTGGGGCAAAGGGACCACGGTCACCGTCTCCTCAGCCTCCACCAAGGGCCC
E150003903L1C001R0112736145/1/1 CCCAGAGCCCTCCTGCCCTCCTTCCGCGGGGGCTTTCTCATGGCTGCCCTGAGGTACATGGACGTCTGGGGCAAAGGGACCACGGTCACCGTCTCCTCAGGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACAAGTAGAGATCTCGT
E150003903L1C001R0112736145/2/2 CCTGAGGAGACGGTGACCGTGGTCCCTTTGCCCCAGACGTCCATGTACCTCAGGGCAGCCATGAGAAAGCCCCCGCGGAAGGAGGGCAGGAGGGCTCTGGGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTCAGTCGATGTGTAGAT
E150003903L1C001R0113400740/1/1 GACGGTGACCGTGGTCCCTTTGCCCCAGACGTCCATGTACCTCAGGGCAGCCATGAGAAAGCCCCCGCGGAAGGAGGGCAGGAGGGCTCTGGGTGGGTCTGTGTTGAAACAGGCCACGTAAAGCAACTCTCTAAAGGTCAAACCACCATA
E150003903L1C001R0113875283/1/1 CTTTACGTGGCCTGTTTCAACACAGACCCACCCAGAGCCCTCCTGCCCTCCTTCCGCGGGGGCTTTCTCATGGCTGCCCTGAGGTACATGGACGTCTGGGGCAAAGGGACCACGGTCACCGTCTCCTCAGCCTCCACCAAGGGCCCATCG

It would be a great help if you could take a look. I will also review these sequences carefully to check whether they are mapped incorrectly.

Best, Junfeng

nadiadavidson commented 1 year ago

Hi Junfeng,

Thanks for sending these sequences. I was able to reproduce your result, and this has been very useful for working out why the fusion was missed. It looks like there is around 4-9 bp of sequence at the breakpoint that is not in the reference for either BCL2 or IGHJ. JAFFA assumes there is no non-templated or random sequence at the breakpoint, as this tends to be fairly rare and allowing it would increase false discoveries. Is there evidence of novel sequence at the breakpoint in the DNA sequencing as well?
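To make the effect of such an assumption concrete, here is a small Python sketch of a "no inserted sequence" check. It is an illustration only and not JAFFA's code: the function names, the `max_gap` parameter, and the coordinates are all hypothetical. Each aligned block is given as 1-based inclusive (start, end) positions on the read.

```python
def unaligned_gap(block_a, block_b):
    """Number of read bases lying between two aligned blocks.

    Blocks are (start, end) read positions, 1-based and inclusive.
    Zero means the blocks abut exactly; a negative value means they overlap.
    """
    first, second = sorted([block_a, block_b])
    return second[0] - first[1] - 1


def passes_strict_breakpoint(block_a, block_b, max_gap=0):
    """Hypothetical filter: keep a read only if the gap is at most max_gap."""
    return unaligned_gap(block_a, block_b) <= max_gap


# Made-up 150 bp read: bases 1-70 match one gene, bases 78-150 the other,
# so bases 71-77 are explained by neither reference.
gap = unaligned_gap((1, 70), (78, 150))
print(gap)                                                      # 7
print(passes_strict_breakpoint((1, 70), (78, 150)))             # False
print(passes_strict_breakpoint((1, 70), (78, 150), max_gap=9))  # True
```

With a strict threshold of zero, a read carrying a few inserted bases at the junction is rejected even though both flanks match well; relaxing the threshold recovers it at the cost of admitting more spurious junctions.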

Cheers, Nadia.

asaki1986 commented 1 year ago

Hi Nadia,

DNA sequencing of this sample showed a break in the BCL2 exon 3 (UTR3) region, fused to a partner break in IGHJ6 exon 1, as follows:

18 60793522 BCL2:EXON_3(UTR3) + 14 106329453 IGHJ6:EXON_1 - 5to3 166 84 0.664 BND

TGCAATGCTCAGGAAACCCCACAGGCAGTAGCAGAAAACAAAGGCCCTAGAGTGGCCATTCTTACCTGAGGAGACGGTGACCGTGGTCCCTTTGCCCCAGACGTCCATGTACCTCAGGGCAGCCATGAGAAAGCCCCCGCGGAAGGAGGGCAGGAGGGCTCTGGGTGGGTCTGTGTTGAAACAGGCCACGTAAAGCAACTCTCTAAAGGTCAAACCACCATAGATTTGAATCTGCTGGTCATTTGCCATCTGGATTTTTAACT

I also ran BLAST with the above DNA sequence against the genome and found a 9 bp non-reference sequence there as well.

It would be great if you could take a look.

Junfeng

nadiadavidson commented 1 year ago

Hi,

This looks consistent with what I see in the RNA. Seems like an interesting fusion!
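One simple way to check this kind of RNA/DNA consistency is to test whether an RNA read, or its reverse complement, occurs inside the DNA contig. The Python sketch below uses exact string matching on short made-up sequences; real reads contain sequencing errors and adapter sequence, so an aligner would normally be used instead.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_in_contig(read, contig):
    """Return (strand, 0-based offset) of the first exact match, or None."""
    pos = contig.find(read)
    if pos != -1:
        return ("plus", pos)
    pos = contig.find(reverse_complement(read))
    if pos != -1:
        return ("minus", pos)
    return None


contig = "AACCGGTTACGTAGC"  # made-up contig, for illustration only
print(find_in_contig("GGTTAC", contig))  # ('plus', 4)
print(find_in_contig("GCTACG", contig))  # ('minus', 9)
print(find_in_contig("TTTTTT", contig))  # None
```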

Cheers, Nadia.

asaki1986 commented 1 year ago

Hi Nadia,

As noted for other tools, a random sequence is sometimes inserted at the fusion junction, and in IGH fusions this happens more often than one would expect. An IGH fusion therefore often looks like IGH-random_sequence-BCL2, and sometimes the random sequence is longer than 25 bp.

The reason might be the high variability between people across the IG locus.

I also tried STAR-Fusion and FusionCatcher. Only FusionCatcher reported this fusion; in STAR-Fusion, BCL2 had too many partners and was filtered out of the final summary.

Best, Junfeng

nadiadavidson commented 1 year ago

Hi Junfeng,

Yes, all fusion-finding tools make different assumptions, which can limit their detection power. There have been quite a few comparisons of different tools over the years, but if you are interested in a specific fusion (like the one above) it's best to try different tools, select the best performing one, and perhaps integrate the pre-filtered files, such as those that STAR-Fusion and JAFFA generate. This is part of the reason that most clinical pipelines use more than one fusion finder. I'm also happy to recommend Arriba, which is quite good at detecting unusual fusions. We also have a computational method called MINTIE (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02507-8), which is a "catch-all" method and would, I think, have a good chance of detecting your fusion.
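As a rough illustration of combining callers, the Python sketch below merges gene-pair calls from several tools and records which tools support each pair. The tool names, the input shape (a dict of lists of gene-pair strings), and the `:` and `--` separators are assumptions for the example; each real tool has its own output format that would need its own parser.

```python
from collections import defaultdict


def normalise_pair(call):
    """Map 'A:B', 'B:A', 'A--B' and 'B--A' to the same key.

    Gene order is deliberately ignored here, which loses the 5'/3'
    orientation of the fusion.
    """
    genes = call.replace("--", ":").split(":")
    return tuple(sorted(g.strip().upper() for g in genes))


def merge_calls(calls_by_tool):
    """Return {gene_pair: sorted list of tools reporting that pair}."""
    support = defaultdict(set)
    for tool, calls in calls_by_tool.items():
        for call in calls:
            support[normalise_pair(call)].add(tool)
    return {pair: sorted(tools) for pair, tools in support.items()}


# Hypothetical calls from two hypothetical tools.
calls = {
    "toolA": ["BCL2:IGHJ6"],
    "toolB": ["IGHJ6--BCL2", "GENE1--GENE2"],
}
merged = merge_calls(calls)
for pair, tools in sorted(merged.items()):
    print(pair, tools)
```

Pairs supported by more than one tool can then be prioritised, while single-tool calls for a fusion of specific interest can still be reviewed by hand.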

Best regards, Nadia.

asaki1986 commented 1 year ago

Thanks, Nadia

I will take your advice and combine multiple tools to get the results.

I hope JAFFA can take this kind of fusion into account in a later release.

Best, Junfeng