Teichlab / tracer

TraCeR - reconstruction of T cell receptor sequences from single-cell RNAseq data

very low overall alignment rate with bowtie2 #114

Closed noorisotoudeh closed 3 years ago

noorisotoudeh commented 3 years ago

Thanks for your nice package for single-cell TCR analysis. I have a question about alignment. Surprisingly, all of my FASTQ files show a very low mapping rate. I checked the FastQC reports and they look good, so I don't know why the rate is so low. I can run the test data and get a 55.51% overall alignment rate for TCR_A and 44.19% for TCR_B. I use the following command, and this is the alignment output:

$ tracer assemble -p 8 -c tracer.conf -s Hsap fastq1.gz fastq2.gz name output_dir
##Finding recombinant-derived reads##
Attempting new assembly for ['TCR_A', 'TCR_B']

TCR_A

801683 reads; of these:
  801683 (100.00%) were paired; of these:
    801376 (99.96%) aligned concordantly 0 times
    307 (0.04%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times

801376 pairs aligned concordantly 0 times; of these:
  6 (0.00%) aligned discordantly 1 time
----
801370 pairs aligned 0 times concordantly or discordantly; of these:
  1602740 mates make up the pairs; of these:
    1602555 (99.99%) aligned 0 times
    185 (0.01%) aligned exactly 1 time
    0 (0.00%) aligned >1 times

0.05% overall alignment rate

TCR_B

801683 reads; of these:
  801683 (100.00%) were paired; of these:
    801548 (99.98%) aligned concordantly 0 times
    135 (0.02%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times

801548 pairs aligned concordantly 0 times; of these:
  1 (0.00%) aligned discordantly 1 time
----
801547 pairs aligned 0 times concordantly or discordantly; of these:
  1603094 mates make up the pairs; of these:
    1602962 (99.99%) aligned 0 times
    130 (0.01%) aligned exactly 1 time
    2 (0.00%) aligned >1 times

0.03% overall alignment rate

I have also edited the headers of the FASTQ files by removing the sequence index etc., but it didn't change the result. Here are the original records from my FASTQ file:

@NB501311:706:HNFTMBGXH:1:11101:6390:1116 1:N:0:CGGAGCCT+ATAGAGAG
AGCATATGCTTGTCTCAAAGATTAAGCCATGCATGTCT
+
AAAAAEEEEEEEEEEEEEEE/EEEEEEEEAEEAEAEEE
@NB501311:706:HNFTMBGXH:1:11101:3328:1118 1:N:0:CGGAGCCT+ATAGAGAG
GTGGAGATACCTCCTGTGTCTCCAGGATGGGTGGAGAT
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
@NB501311:706:HNFTMBGXH:1:11101:14759:1145 1:N:0:CGGAGCCT+ATAGAGAG
CCCTAGGCTTCCCCCGTGTGCCTGGACGAGTGCTGGTG

However, tracer does produce a name_TCRseqs.fa file here, although for many other files it is empty. Do you think there is something wrong with my data or with how I am running it?

Thanks,

Noori

mstubb commented 3 years ago

Hi Noori,

Please could you give me a bit more information about how you're generating your data? What's the experiment?

Thanks

Mike


noorisotoudeh commented 3 years ago

Hi Mike, thanks for your quick response. These are full-length single-cell RNA-seq libraries prepared using the SMART-seq2 protocol. cDNA was fragmented using Illumina and amplified with indexed Nextera PCR primers.

mstubb commented 3 years ago

OK great.

And, just checking, are these human cells?

M


noorisotoudeh commented 3 years ago

Yes, they are.

mstubb commented 3 years ago

Great, thanks.

Looking at this in a bit more detail, I think your mapping rates are roughly what one might expect in data from a genuine cell. The test data rates (~50%) are so high because those test input files were selected to be enriched for TCR sequences.
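As a sanity check on the numbers themselves, the overall alignment rate that bowtie2 prints can be recomputed from the counts in the summary. The following is only a minimal sketch, assuming the paired-end summary layout pasted earlier in this thread; the function name `overall_alignment_rate` and the constant `TCR_A_SUMMARY` are made up for illustration and are not part of TraCeR or bowtie2. Each aligned pair (concordant or discordant) contributes two mates, each singly aligned mate contributes one, and the total is divided by twice the number of read pairs.

```python
import re

# Counts copied from the TCR_A summary posted above.
TCR_A_SUMMARY = """\
801683 reads; of these:
  801683 (100.00%) were paired; of these:
    801376 (99.96%) aligned concordantly 0 times
    307 (0.04%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times
801376 pairs aligned concordantly 0 times; of these:
  6 (0.00%) aligned discordantly 1 time
801370 pairs aligned 0 times concordantly or discordantly; of these:
  1602740 mates make up the pairs; of these:
    1602555 (99.99%) aligned 0 times
    185 (0.01%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
0.05% overall alignment rate
"""


def overall_alignment_rate(summary: str) -> float:
    """Recompute the overall alignment rate (percent) from a paired-end summary."""

    def count(label: str) -> int:
        # Lines look like "307 (0.04%) aligned concordantly exactly 1 time".
        match = re.search(r"(\d+) \([\d.]+%\) " + re.escape(label), summary)
        return int(match.group(1)) if match else 0

    total_pairs = int(re.search(r"(\d+) reads; of these:", summary).group(1))
    aligned_pairs = (
        count("aligned concordantly exactly 1 time")
        + count("aligned concordantly >1 times")
        + count("aligned discordantly 1 time")
    )
    aligned_single_mates = count("aligned exactly 1 time") + count("aligned >1 times")

    # Every pair holds two mates, so the denominator is 2 * total_pairs.
    aligned_mates = 2 * aligned_pairs + aligned_single_mates
    return 100.0 * aligned_mates / (2 * total_pairs)


print(f"{overall_alignment_rate(TCR_A_SUMMARY):.2f}% overall alignment rate")
```

For the TCR_A counts this gives 811 aligned mates out of 1603366, which rounds to the 0.05% that bowtie2 reported.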

For what proportion of your cells do you get reconstructed TCR sequences? If you run tracer summarise for all of your cells, what does TCR_summary.txt say?

noorisotoudeh commented 3 years ago

TCR_A reconstruction: 35 / 109 (32.1%)
TCR_B reconstruction: 13 / 109 (11.9%)

AB productive reconstruction: 7 / 109 (6.4%)

+--------+----------------+---------------+----------------+
|        | 0 recombinants | 1 recombinant | 2 recombinants |
+--------+----------------+---------------+----------------+
| all A  | 71             | 34 (89%)      | 4 (11%)        |
| all B  | 96             | 13 (100%)     | 0 (0%)         |
| prod A | 74             | 33 (94%)      | 2 (6%)         |
| prod B | 96             | 13 (100%)     | 0 (0%)         |
+--------+----------------+---------------+----------------+

Clonotype groups

This is a text representation of the groups shown in clonotype_network_with_identifiers.svg. It does not exclude cells that only share beta and not alpha.
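When comparing many batches, it can help to pull the reconstruction rates out of the summary text programmatically. This is a rough sketch that assumes the "reconstruction:" lines look exactly like the ones pasted above; the helper `reconstruction_rates` is a hypothetical name, not a TraCeR function, and the regular expression only covers the three labels seen in this thread.

```python
import re

# The reconstruction lines as posted in this thread.
SUMMARY_TEXT = """\
TCR_A reconstruction: 35 / 109 (32.1%)
TCR_B reconstruction: 13 / 109 (11.9%)
AB productive reconstruction: 7 / 109 (6.4%)
"""


def reconstruction_rates(text: str) -> dict:
    """Map each label (e.g. 'TCR_A') to the fraction of cells reconstructed."""
    pattern = r"(TCR_\w+|AB productive) reconstruction:\s*(\d+)\s*/\s*(\d+)"
    return {
        label: int(hits) / int(total)
        for label, hits, total in re.findall(pattern, text)
    }


for label, rate in reconstruction_rates(SUMMARY_TEXT).items():
    print(f"{label}: {rate:.1%}")
```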

mstubb commented 3 years ago

Thanks!

So, it looks like reconstruction is working but at a fairly low rate.

Without knowing more about your experiment I'd guess that this is due to some property of the cells or the sequencing.

Sorry I can't be more help.

M

noorisotoudeh commented 3 years ago

Thank you so much for your help. Yes, I think most of them are naive, so that's probably why more reads cannot be mapped. Thanks, Noori

mstubb commented 3 years ago

No problem. Good luck with the experiments!