Teichlab / tracer

TraCeR - reconstruction of T cell receptor sequences from single-cell RNAseq data

ambiguity in clonotype assignment #82

Closed lincoln-harris closed 5 years ago

lincoln-harris commented 5 years ago

Hi, @sdarmanis and I have been digging into the sequence-level alignment of cells that are assigned to the same clonogroup and have been noticing some strange things. It seems that the CDR3 sequence doesn't have to be a perfect match for two cells to be assigned to the same clonogroup. For example, in this figure all of these cells have been assigned to the same clonogroup, yet the bottom 3 have very different CDR3 sequences from the rest (apologies for the poor resolution).

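Screenshot (2018-11-30, 1:33:42 pm): https://user-images.githubusercontent.com/33501625/49316061-94057680-f4a4-11e8-800b-a0e41eb09ee3.png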

Why is TraCeR assigning these to the same clonogroup? Thanks, Lincoln

mstubb commented 5 years ago

Hi Lincoln,

Please could you send me the tracer output directories for these cells?

Thanks,

Mike


lincoln-harris commented 5 years ago

Yep, the output folder is here: https://github.com/czbiohub/sclung_adeno/tree/master/TCR_analysis/filtered_TCRAB_summary. It's a private repo, but I just gave you read access, I think.

mstubb commented 5 years ago

Yep, got it. Thanks.

What are the names of the cells in the alignment you sent originally? If you send me those, I can dig into the summary output and see if anything obvious stands out. If not, I will probably need the per-cell TraCeR output directories so that I can look at the intermediate outputs from the alignment, assembly and parsing stages. I'll let you know if that's the case.

Cheers,

Mike


lincoln-harris commented 5 years ago

Yep, it's clonogroup 1, so the cell names are:

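Screenshot (2018-11-30, 2:57:44 pm): https://user-images.githubusercontent.com/33501625/49319152-527ac880-f4b0-11e8-9e05-18fd5360ab10.png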

The odd ones out are K20_B003659, E22_B003659 and G21_B000883. I'm noticing that even the flanking V and J regions are different within these 'odd' cells. Thanks a lot!

mstubb commented 5 years ago

Hi,

I've had a look at this and it helps to look at the clonotype_network_with_identifiers.pdf network graph (https://github.com/czbiohub/sclung_adeno/blob/master/TCR_analysis/filtered_TCRAB_summary/clonotype_network_with_identifiers.pdf) to work out what's going on. Here you can see how the cells connect to each other within the clonogroup 1 subgraph.

Of the 'odd' cells, E22_B003659 and K20_B003659 look to be genuinely clonally related to each other and share both an alpha and a beta sequence. However, E22_B003659 doesn't share sequences with any other cell in that subgraph. It gets connected to the rest because K20_B003659 shares another beta sequence with K22_B003659, which itself shares a different beta sequence with a lot of the rest of the graph (it's really easier to look at this in the PDF than to try to explain it!). It is, of course, hard to say whether this is genuine sharing or low-level experimental contamination (or some other artefact), so interpret it with caution.

G21_B000883 has the TRBV27 beta sequence that's common throughout this subgraph but also has another TRBV sequence that isn't seen in any of the other cells. It also has two alpha sequences that aren't seen anywhere else. This could be explained biologically: beta recombines first during T cell development, followed by rounds of proliferation before alpha recombines in the progeny. This means that it is not unexpected to see cells with the same beta but different alphas.

More generally, this is an example of TraCeR's permissive rules for grouping cells into clonotypes, where any shared sequence will pull a cell into a subgraph. When we wrote this, experiments were of a scale where it was tractable to inspect the network graphs to check for things like this, although it's now apparent that this is not so easy with larger experiments.

I'd be happy to accept any pull requests that add options improving these representations. If you're interested in doing that, have a look at the Summariser class (https://github.com/Teichlab/tracer/blob/master/tracerlib/tasks.py#L643), which calls tracer_func.draw_network_from_cells (https://github.com/Teichlab/tracer/blob/master/tracerlib/tracer_func.py#L813). This constructs the graphs using NetworkX (v1, https://networkx.github.io/documentation/networkx-1.1/) with each node being an object of class Cell (https://github.com/Teichlab/tracer/blob/master/tracerlib/core.py#L10). To make clonotype definitions more stringent, you'd probably want to change the rules about how edges are added to the graphs.
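
To illustrate how the permissive rule described above can chain unrelated cells together, here is a minimal sketch. It is not TraCeR's code: the cell names, sequence identifiers and the `permissive_groups` function are all made up for illustration, and it uses plain Python rather than NetworkX so that it runs without extra dependencies. Each cell is reduced to a set of sequence identifiers, an edge is assumed wherever two cells share at least one identifier, and a group is a connected component of the resulting graph.

```python
from itertools import combinations

# Hypothetical toy data (not from the issue): cell name -> sequence identifiers.
cells = {
    "cell_A": {"beta_1", "alpha_1"},
    "cell_B": {"beta_1", "beta_2"},
    "cell_C": {"beta_2", "beta_3", "alpha_2"},
    "cell_D": {"beta_3", "alpha_2"},
    "cell_E": {"beta_9", "alpha_9"},
}


def permissive_groups(cells):
    """Group cells so that ANY shared sequence links two cells.

    Groups are connected components, so membership is transitive:
    two cells can land in the same group without sharing a sequence.
    """
    # Build an adjacency map: an edge exists if the identifier sets overlap.
    neighbours = {name: set() for name in cells}
    for a, b in combinations(cells, 2):
        if cells[a] & cells[b]:
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Collect connected components with a simple depth-first search.
    groups, seen = [], set()
    for start in cells:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        groups.append(component)
    return groups


groups = permissive_groups(cells)
for group in groups:
    print(sorted(group))
```

In this toy data, cell_A and cell_D share no sequence at all, yet they end up in the same group through the chain cell_A, cell_B, cell_C, cell_D. That is the same kind of indirect connection as the E22_B003659 case described above.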

Hope that's helpful. Let me know if you want to discuss anything else.

All the best,

Mike


lincoln-harris commented 5 years ago

Thanks a lot. So maybe a worthwhile modification is to define a clonogroup as only those cells that share an A and a B? Or otherwise accept that when dealing with clonogroups of this size, you're going to see some messiness.

mstubb commented 5 years ago

No problem.

I think that the best way to do it would be to have an option that sets the level of stringency required for assigning cells to clonotypes as they are reported in the summary.

Ideally this would be coupled to better reporting and visualisation to make it easier to assess the structure of the clonotypes and how tenuously cells are connected to them.
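
As a rough sketch of what such a stringency option and reporting might look like (purely hypothetical: TraCeR has no `stringency` parameter that I am describing here, and the `edges` and `shared` functions, the rule names `"any"` and `"alpha_and_beta"`, and the toy cells are all invented for illustration), the edge rule could be made selectable and each edge could carry the sequences that support it:

```python
from itertools import combinations

# Hypothetical toy cells: locus ("A" = alpha, "B" = beta) -> sequence identifiers.
cells = {
    "cell_1": {"A": {"a1"}, "B": {"b1"}},
    "cell_2": {"A": {"a1"}, "B": {"b1", "b2"}},
    "cell_3": {"A": {"a3"}, "B": {"b2"}},
}


def shared(cell_x, cell_y):
    """Return the sequences two cells have in common, per locus."""
    return {locus: cell_x[locus] & cell_y[locus] for locus in ("A", "B")}


def edges(cells, stringency="any"):
    """Return {(cell, cell): shared sequences} for pairs that pass the rule.

    "any":            at least one shared sequence at either locus.
    "alpha_and_beta": at least one shared alpha AND one shared beta.
    """
    if stringency not in ("any", "alpha_and_beta"):
        raise ValueError(f"unknown stringency: {stringency!r}")
    result = {}
    for name_x, name_y in combinations(sorted(cells), 2):
        common = shared(cells[name_x], cells[name_y])
        if stringency == "any":
            keep = bool(common["A"] or common["B"])
        else:
            keep = bool(common["A"] and common["B"])
        if keep:
            result[(name_x, name_y)] = common
    return result


permissive = edges(cells, "any")
strict = edges(cells, "alpha_and_beta")

# Simple report: how many sequences support each permissive edge.
for pair, common in permissive.items():
    support = len(common["A"]) + len(common["B"])
    print(pair, "shared sequences:", support)
```

Here cell_2 and cell_3 are linked only by a single beta sequence, so the edge survives under `"any"` but is dropped under `"alpha_and_beta"`. Reporting the per-edge support alongside the clonotype assignment is one way to show how tenuously a cell is attached.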