Open mikessh opened 6 years ago
These sequences consistently lack the starting C and final F. Can I simply add these, or should I check each one against the nucleotide sequence via the provided accession number? Also, the sequence data from all donors are pooled, and the indicated frequencies are computed over the pooled data. This means there are a lot of reads with a frequency lower than 0.1%. Given that they claim to use high-quality read data (and used UMIs), should I also include these low-frequency reads, and if not, what is an appropriate cut-off in this case?
Looks like I've missed that. The C/F issue is not a problem: as this is human, we just add them (mice have C/W in some J, if I remember correctly). Yes, let's put everything in the first submission. We should carefully mark paired/single-cell records.
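A minimal sketch of the "just add them" fix for human records, assuming the trimmed CDR3s are plain amino acid strings that never contain the conserved residues. The function name `add_conserved_residues` is hypothetical and not part of the database tooling; mouse records, where some J segments end in W, would need separate handling.

```python
# Standard 20-letter amino acid alphabet, used only for a sanity check.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def add_conserved_residues(trimmed_cdr3):
    """Return the CDR3 with the conserved C prepended and F appended.

    Assumption: the input was trimmed on both sides, as described in
    the thread, so both residues are always missing.
    """
    seq = trimmed_cdr3.strip().upper()
    if not seq or not set(seq) <= AMINO_ACIDS:
        raise ValueError("not an amino acid sequence: %r" % trimmed_cdr3)
    return "C" + seq + "F"


if __name__ == "__main__":
    print(add_conserved_residues("ASSLGQAYEQY"))
```

The residues are added unconditionally because the thread states that they are consistently missing; a record that already carries them would end up with a duplicated C or F.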
Ok, I am almost done with this. I had a hard time figuring out which methods were applied where, so here are some debugging questions:
direct. I will add to the specification (README.md) that direct also means direct TCR affinity measurement.
I've just discovered extremely strange artefacts: cysteines inside CDR3 and strange J alignments. I've downloaded the raw data and am re-analyzing it. I should replace the chunk soon (except for paired sequences).
My apologies, I did not notice this before
No problems, I've also missed this, quite hard to check all 10k sequences :)
After checking, it looks like those Cys codons are mostly in the N-region of TRA CDR3s, so perhaps they are real. They have quite a low frequency (<0.1%), but are supported by a large number of reads (say 100-300 raw reads). I've also checked two donors that gave an extremely diverse repertoire (one had 3k unique clonotypes): none of the top clonotypes appear to be some sort of "public" TCRs, so they are likely truly antigen-specific.
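The internal-cysteine check described above can be sketched as follows. This assumes the CDR3 amino acid string starts with the conserved Cys at position 0; the function name `internal_cys_positions` is hypothetical.

```python
def internal_cys_positions(cdr3):
    """Return 0-based positions of Cys residues other than the
    conserved one assumed to sit at position 0."""
    return [i for i, aa in enumerate(cdr3.upper()) if aa == "C" and i > 0]


if __name__ == "__main__":
    # Made-up example sequences, not taken from the dataset.
    for seq in ["CASSLGQAYEQYF", "CAVCDSNYQLIW"]:
        print(seq, internal_cys_positions(seq))
```

Records with a non-empty result are only flagged for a closer look; as noted above, such residues may be real.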
So perhaps we should not do anything here, or just remove all records with low (<=10^-4) frequency
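If the cut-off option is chosen, a minimal sketch of it could look like this. It assumes each record is a dict with a `freq` key holding the pooled frequency as a fraction; the key names and the function name `drop_low_frequency` are illustrative only.

```python
MIN_FREQ = 1e-4  # threshold proposed in the thread


def drop_low_frequency(records, min_freq=MIN_FREQ):
    """Keep records whose frequency is strictly above min_freq,
    i.e. remove everything with frequency <= min_freq."""
    return [r for r in records if r["freq"] > min_freq]


if __name__ == "__main__":
    # Made-up records for illustration.
    records = [
        {"cdr3": "CASSLGQAYEQYF", "freq": 2e-3},
        {"cdr3": "CAVRDSNYQLIW", "freq": 1e-4},
        {"cdr3": "CASSPGTGYEQYF", "freq": 5e-5},
    ]
    print(drop_low_frequency(records))
```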
PS. The authors' CDR3s (mapped using MIGEC) and the CDR3s I mapped from raw data using MIXCR appear to be more or less the same.
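One way to put a number on "more or less the same" is the Jaccard overlap of the two CDR3 sets. This is a sketch under the assumption that the CDR3 strings have already been extracted from each tool's output; it does not parse MIGEC or MIXCR files, and it is not necessarily how the comparison above was done.

```python
def jaccard(cdr3s_a, cdr3s_b):
    """Jaccard index of two CDR3 collections: |A & B| / |A | B|."""
    a, b = set(cdr3s_a), set(cdr3s_b)
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Made-up sequences for illustration.
    author = ["CASSLGQAYEQYF", "CAVRDSNYQLIW", "CASSPGTGYEQYF"]
    remapped = ["CASSLGQAYEQYF", "CAVRDSNYQLIW", "CASSQDRGYEQYF"]
    print(jaccard(author, remapped))
```

A value close to 1 means the two mappings agree on almost all sequences. The measure ignores clonotype frequencies.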
Chen G, Yang X, Ko A, Sun X, Gao M, Zhang Y, Shi A, Mariuzza RA, Weng NP. Sequence and Structural Analyses Reveal Distinct and Highly Diverse Human CD8+ TCR Repertoires to Immunodominant Viral Antigens. Cell Rep. 2017 Apr 18;19(3):569-583. doi: 10.1016/j.celrep.2017.03.072.