alexdobin / STAR

RNA-seq aligner
MIT License

How does STAR handle V(D)J sequences (TCR/BCR)? #2178

Open ulyssebaruchel opened 1 month ago

ulyssebaruchel commented 1 month ago

Hi, could you please tell me how STAR usually handles highly variable gene sequences, such as the V(D)J sequences that code for parts of the TCR / BCR (T-cell and B-cell receptors)?

I guess it depends on the reference. I am using this human genome reference:

This is important because it determines whether I should use post-STARsolo BAM files or pre-STARsolo FASTQ files as input to reconstruct TCR / BCR sequences.

Thank you very much!

michael-swift commented 1 month ago

my experience is that mapping in that region can be quite difficult and that is one of the reasons IgBlast exists.

One of the major issues is that the parameters you'd need to use for mapping to that region are different from the preferred ones for mapping to the rest of the genome which contains much less genetic variation.

In any case, I would advise consulting how it is done in the many published workflows for assembling BCR and TCR sequences from sequencing reads; see BALDR, BASIC, BRACER, etc.

ulyssebaruchel commented 1 month ago

OK, thank you. I wanted to check whether it would be useful to look at the coverage from the STARsolo BAM output in IGV, to better understand where the poor output quality (incomplete reconstruction of certain chains) of some of these TCR/BCR reconstruction methods on a sample stems from.

michael-swift commented 1 month ago

That makes sense as a line of investigation. If you're looking at why some assemblies fail, I wonder if it could also be informative to look at how many reads map to the constant regions of the chains, using the BAMs from STAR. I've found that to be a more reliable quantification than the variable region.
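As a rough illustration of that kind of count, here is a minimal sketch in plain Python that tallies primary alignments overlapping named regions from SAM-format text lines. The function names, the toy records, and the region coordinates are all made up for the example; real constant-region coordinates would have to come from the annotation that matches your reference.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_blocks(pos, cigar):
    """Return the reference blocks (1-based, inclusive) covered by aligned bases.

    Deletions (D) and spliced gaps (N) advance the reference position
    but are not reported as covered blocks.
    """
    blocks = []
    ref = pos
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":
            blocks.append((ref, ref + length - 1))
            ref += length
        elif op in "DN":
            ref += length
    return blocks


def count_reads_in_regions(sam_lines, regions):
    """Count mapped primary alignments overlapping each named region.

    `regions` maps a label to (chrom, start, end), 1-based inclusive.
    """
    counts = {name: 0 for name in regions}
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag, chrom, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
        if flag & 0x904 or cigar == "*":
            continue
        blocks = aligned_blocks(pos, cigar)
        for name, (r_chrom, r_start, r_end) in regions.items():
            if chrom != r_chrom:
                continue
            if any(start <= r_end and end >= r_start for start, end in blocks):
                counts[name] += 1
    return counts


# Toy records and HYPOTHETICAL coordinates, for illustration only.
example_sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr14\t100\t255\t50M\t*\t0\t0\t*\t*",
    "r2\t0\tchr14\t80\t255\t10M200N40M\t*\t0\t0\t*\t*",
    "r3\t256\tchr14\t100\t0\t50M\t*\t0\t0\t*\t*",   # secondary, skipped
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",             # unmapped, skipped
]
toy_regions = {
    "constant_toy": ("chr14", 90, 160),
    "variable_toy": ("chr14", 280, 340),
}
toy_counts = count_reads_in_regions(example_sam, toy_regions)
print(toy_counts)
```

On a real sample, the same function could read the text output of `samtools view` for the STAR BAM instead of the toy list. Counting by aligned block rather than by the full alignment span avoids crediting a region that only sits inside a spliced gap of a read.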

ulyssebaruchel commented 1 month ago

Dear @michael-swift, thank you for your advice. I am using a 5' biased approach which makes it so that the constant region is less covered. I will try to see if IgBlast gets me a really different BAM profile than STARsolo at the TCR / BCR loci. Thank you