Teichlab / tracer

TraCeR - reconstruction of T cell receptor sequences from single-cell RNAseq data

Include leader sequences in assembly? #88

Open jfass opened 5 years ago

jfass commented 5 years ago

In my application I need the "full" TCR sequence downstream, from the leader all the way through to the constant region. Is it conceivable to alter the references (padding them further out into the leader) so that those reads are captured and more of each TCR sequence is assembled?

mstubb commented 5 years ago

Hi,

I think you could probably run build with reference sequences in which you have prepended the leader to the V segment, although I'm not sure how nicely IgBLAST will play with those when annotating the files.

Do you need to know the actual leader that was present in your cells, or do you just need a leader sequence for later expression etc.? If the latter, the other option would be to prepend an appropriate leader sequence to the assembled sequence after TraCeR has reported it.
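As a rough illustration of that second option, here is a minimal Python sketch that prepends a fixed leader to every sequence in a FASTA file. The function names and the leader string are hypothetical placeholders rather than anything TraCeR provides. It also assumes that one leader suits every record and that simple concatenation keeps the reading frame, which you would need to check against your own V genes.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def prepend_leader(records, leader):
    """Return new records with `leader` placed in front of each sequence.

    Assumption: the same leader is appropriate for every record. In practice
    you would probably choose a leader per V gene.
    """
    leader = leader.upper()
    out = []
    for header, seq in records:
        seq = seq.upper()
        # Leave sequences alone if they already start with the leader.
        if not seq.startswith(leader):
            seq = leader + seq
        out.append((header, seq))
    return out


def write_fasta(records, width=60):
    """Format records as FASTA text, wrapping sequences at `width` bases."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical input; in real use, read the text from a TraCeR FASTA file.
    example = ">hypothetical_TCR_A\nGGACAAGGTG\n"
    placeholder_leader = "ATGAAA"  # placeholder, not a real leader sequence
    print(write_fasta(prepend_leader(read_fasta(example), placeholder_leader)), end="")
```

To use it on real output, read one of the filtered_TCR_seqs FASTA files into a string, pass it through `read_fasta` and `prepend_leader`, and write the result of `write_fasta` to a new file.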

When TraCeR reports the assembled sequences in filtered_TCR_seqs (e.g. https://github.com/Teichlab/tracer/blob/master/test_data/results/cell2/filtered_TCR_seqs/cell2_TCRseqs.fa), these have a constant region appended to them, although that region is taken from the reference rather than being precisely what was assembled.

Again, if you don't really need to know the actual sequence then this might suffice for your needs.

The actual leaders and constant regions are typically assembled in the Trinity outputs but are then removed by IgBLAST during final annotation because it doesn't consider them. You may be able to write some additional code that will extract these sequences as appropriate from the original Trinity contigs.
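One possible shape for that additional code, as a sketch only: take the portion of the sequence that was actually assembled (not the reference-derived constant region), find it in the Trinity contig on either strand, and return whatever lies upstream and downstream of it. The function names here are hypothetical, and the exact string match is a simplifying assumption. With real contigs you would likely need an alignment step instead, and you would need to load the contigs and annotated sequences from your own output files.

```python
# Translation table for complementing DNA, including N for unknown bases.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_flanks(contig, core):
    """Locate `core` in `contig` and return (upstream, downstream) flanks.

    The contig is tried as given and then reverse-complemented, so the flanks
    are always reported in the orientation of `core`. Returns None when `core`
    does not occur as an exact substring on either strand.
    """
    contig = contig.upper()
    core = core.upper()
    for oriented in (contig, reverse_complement(contig)):
        start = oriented.find(core)
        if start != -1:
            end = start + len(core)
            # Upstream may hold leader/5' UTR sequence, downstream the
            # assembled constant region, if the contig extends that far.
            return oriented[:start], oriented[end:]
    return None


if __name__ == "__main__":
    # Toy sequences for illustration only; these are not real TCR data.
    toy_contig = "AAAC" + "GGGTTT" + "CCCA"
    print(extract_flanks(toy_contig, "GGGTTT"))
```

The upstream flank would then be the candidate for the leader actually present in the cell, and the downstream flank the candidate for the assembled constant region, subject to checking each against known references.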

Hope that all makes sense and is of some help.

All the best,

Mike