alexdobin / STAR

RNA-seq aligner
MIT License

chimJunctionOverhangMin vs chimSegmentMin #744

Open helloeoh001 opened 4 years ago

helloeoh001 commented 4 years ago

Hello!

This is a question rather than an issue. STAR is a very good aligner, both for its performance and for the configurable options that let users control mapping in detail. I use STAR and STAR-Fusion to detect gene fusions in clinical cancer samples, and I am tuning the parameters to get the best fusion-detection results. Two options seem to overlap: 'chimJunctionOverhangMin' and 'chimSegmentMin'. The overhanging segment across the chimeric junction in a fusion read seems to be the same thing as the chimeric segment of that read.
How can 'chimJunctionOverhangMin' affect which fusion candidate reads are written to the Chimeric.out.junction file independently of 'chimSegmentMin'? Thank you in advance.

alexdobin commented 4 years ago

Hi @helloeoh001

If a chimeric segment is normally spliced, it will contain more than one alignment block. --chimSegmentMin controls the minimum total length of the chimeric segment, which is calculated as the sum of the alignment blocks of the segment. --chimJunctionOverhangMin, on the other hand, controls the minimum length of the block adjacent to the chimeric junction. This prevents a chimeric junction from being placed very close to a normal splice junction.
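
The difference between the two checks can be sketched in a few lines of Python. This is a toy model and not STAR's code: the function name `passes_chim_filters`, the list-of-block-lengths representation, and the convention that the block next to the chimeric junction comes last in the list are all assumptions made for illustration.

```python
def passes_chim_filters(block_lengths, chim_segment_min, chim_junction_overhang_min):
    """Toy check for one chimeric segment (not STAR's implementation).

    block_lengths: lengths of the segment's alignment blocks, ordered so that
    the LAST block is the one adjacent to the chimeric junction (an assumed
    convention for this sketch).
    """
    total_length = sum(block_lengths)    # what --chimSegmentMin constrains
    junction_block = block_lengths[-1]   # what --chimJunctionOverhangMin constrains
    return (total_length >= chim_segment_min
            and junction_block >= chim_junction_overhang_min)


# Spliced segment: 40 bases in exon1, then 8 bases in exon2 next to the chimeric junction.
print(passes_chim_filters([40, 8], 20, 20))  # total 48 is long enough, but the 8-base block is too short
# Unspliced segment of the same total length: the single 48-base block is also the junction block.
print(passes_chim_filters([48], 20, 20))
```

In this sketch both segments satisfy the total-length check, and only the second one satisfies the overhang check, which is why the two options are not redundant.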

Cheers Alex

helloeoh001 commented 4 years ago

Hello Alex,

Thank you for the explanation! I understand the difference between the two options now. But it is not clear to me how chimJunctionOverhangMin prevents a chimeric junction from being placed very close to a normal junction. In the figure below, the overhang (ATCCG) could stay with the main segment, or could be split off and mapped to gene B, depending on the option value. My understanding is that if short overhangs are allowed to be split off and mapped to another gene, then the reported chimeric junctions are likely to be normal junctions, which would be false positives. Am I right? Or could you explain how the option prevents a chimeric junction from being called at a normal junction? Thank you!

image

Ensel

alexdobin commented 4 years ago

Hi @helloeoh001

--chimJunctionOverhangMin filters against two junctions that are too close together, one normal junction and one chimeric junction, i.e. it requires that exon2 in the configuration below is not too short:

exon1-------exon2-------------------exonOfAnotherGene
      normal           chimeric

Cheers Alex

alextuck commented 3 years ago

Hi @alexdobin , Thanks for the great program :) From this post, it is still unclear to me exactly what situation --chimJunctionOverhangMin is trying to avoid? In the example immediately above, do you mean that an exon1=>exonOfAnotherGene chimera could be misannotated as an [exon1-exon2]=>exonOfAnotherGene chimera (this does not make sense to me, but is how I interpret your comment). Could you perhaps show an example of this with a sequence? I think then it would be clearer? Thanks! Alex

alexdobin commented 3 years ago

Hi Alex,

--chimJunctionOverhangMin N will filter out cases where |exon2|<N. This is to avoid very short exons right next to the chimeric junction.

Cheers Alex

alextuck commented 3 years ago

Hi Alex,

Thank you very much for your reply.

However, it is not clear to me why a short exon 2 would be a problem? Is this usually an artefact? (I presume there are not many exons that are so short, so do not understand why a filter is needed to remove these events?).

Best wishes, and thanks again,

Alex

alexdobin commented 3 years ago

Hi Alex,

sorry for the belated reply. Short exons (micro-exons) are very rare in the human annotations, so the probability of a true chimeric junction occurring next to a micro-exon is low. On the other hand, small mapping blocks often arise as mapping artifacts, because a short sequence can map to many positions in the genome. The --chimJunctionOverhangMin filter removes such false chimeras.
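
A rough back-of-the-envelope calculation shows why short blocks are prone to spurious placement. The sketch below assumes a uniform random-sequence model and a genome size of about 3.1e9 bases for human; both are simplifying assumptions, and `expected_random_hits` is a hypothetical helper, not part of STAR. Real genomes are repetitive, so this only conveys the order of magnitude.

```python
def expected_random_hits(block_length, genome_size=3.1e9):
    """Expected number of exact chance matches of a random sequence of
    block_length bases in a random genome, counting both strands.

    Assumes independent, equally likely bases (a simplification).
    """
    positions = 2 * genome_size             # candidate start positions on both strands
    return positions / (4 ** block_length)  # each position matches with probability 4**-k


for k in (10, 15, 20, 25):
    print(k, expected_random_hits(k))
```

Under these assumptions a 10-base block is expected to match by chance at thousands of positions, while a 20-base block is expected to match by chance at far fewer than one.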

Cheers Alex

alextuck commented 3 years ago

Hi Alex,

Thank you very much for clarifying that - this is along the lines of what I was assuming must be the case. I presume this is only an issue when you do not require the segments of a chimeric read to respect the annotated exon boundaries? (otherwise the mapping artefacts you describe should not occur)...

Best wishes,

Alex

alexdobin commented 3 years ago

Hi Alex,

right, for annotated microexons, this requirement would not be needed. However, presently, the chimeric junctions are not required to match the annotated exon boundaries. This matching has to be done after mapping - e.g. STAR-Fusion does it.

If you are worried that you are missing the chimeric junctions with microexons, you can set this parameter to 0, and then filter out the unannotated microexons to keep only annotated ones.
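
A post-mapping filter along these lines could look like the sketch below. It is only an illustration of the idea: the tuple layout `(chrom, start, end)`, the exact-coordinate match against the annotation, the 20-base threshold, and the function name `keep_candidate` are all assumptions, and this is neither what STAR nor what STAR-Fusion actually does.

```python
def keep_candidate(junction_block, annotated_exons, short_block_max=20):
    """Decide whether to keep a chimeric candidate (hypothetical post-filter).

    junction_block: (chrom, start, end) of the alignment block adjacent to the
    chimeric junction, 1-based inclusive coordinates (assumed layout).
    annotated_exons: set of (chrom, start, end) tuples taken from an annotation.
    """
    chrom, start, end = junction_block
    length = end - start + 1
    if length >= short_block_max:
        return True                                # long enough, no annotation check needed
    return (chrom, start, end) in annotated_exons  # short block: keep only annotated micro-exons


# Made-up coordinates for illustration only.
annotated = {("chr1", 1000, 1011)}
print(keep_candidate(("chr1", 1000, 1011), annotated))  # 12-base block matching an annotated micro-exon
print(keep_candidate(("chr2", 500, 511), annotated))    # 12-base block with no annotation support
print(keep_candidate(("chr3", 100, 180), annotated))    # 81-base block
```

Requiring an exact coordinate match is the strictest choice; a real filter might compare only the splice-site boundary of the block.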

Cheers Alex

helloeoh001 commented 3 years ago

What happens if 'read_A' has an overhang of 15 bp when --chimJunctionOverhangMin is set to 20? Is the read excluded from 'Chimeric.out.junction'? Or is the short overhang ignored when determining the junction position, so that read_A is still included in 'Chimeric.out.junction'?

Best wishes and thank you,

Ensel

alexdobin commented 3 years ago

Hi @helloeoh001

with --chimJunctionOverhangMin 20, reads with overhangs shorter than 20 bases on either side of the junction will not be included in the Chimeric outputs, so in your example read_A will not be included.

Cheers Alex