cfe-lab / MiCall

Pipeline for processing FASTQ data from an Illumina MiSeq to genotype human RNA viruses like HIV and hepatitis C
https://cfe-lab.github.io/MiCall
GNU Affero General Public License v3.0

Report common T-cell receptor sequences #461

Open: donkirkby opened this issue 5 years ago

donkirkby commented 5 years ago

After merging the reads and checking for the V3LOOP amplicon, check for T-cell receptor alpha and beta (TCA and TCB) matches. Take all of those matches and report the most common nucleotide sequences, counted by exact match. For the first version, report every sequence whose prevalence is at least 5% of all T-cell receptor matches.
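The counting step described above could look roughly like the sketch below. It is not MiCall code: the function name `common_tcr_sequences`, the input shape (one `(reference_name, merged_sequence)` pair per merged read), and the use of the literal labels "TCA" and "TCB" as reference names are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical reference labels for T-cell receptor alpha and beta matches;
# the names MiCall actually uses may differ.
TCR_REFERENCES = {"TCA", "TCB"}


def common_tcr_sequences(
    matches: Iterable[Tuple[str, str]],
    min_fraction: float = 0.05,
) -> List[Tuple[str, int, float]]:
    """Report TCR nucleotide sequences whose exact-match prevalence is at
    least min_fraction of all TCR matches.

    matches: (reference_name, merged_nucleotide_sequence) pairs, one per
    merged read. Reads matching other references (e.g. V3LOOP) are ignored.
    Returns (sequence, count, fraction) tuples, most common first.
    """
    # Count identical sequences among reads that matched a TCR reference.
    counts = Counter(seq for ref, seq in matches if ref in TCR_REFERENCES)
    total = sum(counts.values())
    if total == 0:
        return []

    report = []
    for seq, count in counts.most_common():
        fraction = count / total
        if fraction < min_fraction:
            # most_common() is sorted by count, so all later entries are rarer.
            break
        report.append((seq, count, fraction))
    return report
```

Counting whole merged sequences with a `Counter` matches the "by exact match" requirement: two reads that differ by a single base are reported separately, which keeps the first version simple at the cost of splitting sequencing-error variants.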

jeff-k commented 5 years ago

This looks like a way to generate V-QUEST-style results offline: https://github.com/williamdlees/TRIgS/blob/master/docs/IgBLASTPlus.md

jeff-k commented 5 years ago

I followed a strategy similar to the one in that script, parsing the text output of NCBI's igblastn tool (https://ncbi.github.io/igblast/), which has been configured to match IMGT's V-QUEST tool using these reference releases: http://www.imgt.org/IMGT_vquest/share/textes/datareleases.html
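For reference, an igblastn run on T-cell receptor reads might be launched from Python along the lines of the sketch below. The function `build_igblastn_command`, the germline database directory, the `human_TR_V`/`human_TR_D`/`human_TR_J` database names, and the auxiliary file path are hypothetical placeholders; the sketch assumes the IMGT germline FASTA files have already been converted into BLAST databases. The flags are standard igblastn options, but check them against the installed version. Leaving out `-outfmt` keeps igblastn's default human-readable report, the kind of text output parsed above.

```python
import os
from typing import List


def build_igblastn_command(
    query_fasta: str,
    germline_dir: str,
    aux_file: str,
    organism: str = "human",
) -> List[str]:
    """Build an igblastn argument list for T-cell receptor queries.

    Hypothetical layout: germline_dir holds BLAST databases named
    <organism>_TR_V, <organism>_TR_D and <organism>_TR_J built from
    IMGT reference releases.
    """

    def db(segment: str) -> str:
        # Path to the germline database for one gene segment (V, D or J).
        return os.path.join(germline_dir, f"{organism}_TR_{segment}")

    return [
        "igblastn",
        "-query", query_fasta,
        "-germline_db_V", db("V"),
        "-germline_db_D", db("D"),
        "-germline_db_J", db("J"),
        "-organism", organism,
        "-ig_seqtype", "TCR",      # T-cell receptor rather than immunoglobulin
        "-domain_system", "imgt",  # IMGT framework/CDR boundaries, as in V-QUEST
        "-auxiliary_data", aux_file,
    ]
```

The list could then be passed to `subprocess.run(cmd, capture_output=True, text=True, check=True)`, with the text report read from `stdout`. Building the argument list separately from running it keeps the configuration easy to inspect and test.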

Note that igblast may be a suitable tool for an HLA pipeline.