Teichlab / tracer

TraCeR - reconstruction of T cell receptor sequences from single-cell RNAseq data

Running TraCeR on single-cell PacBio data #117

Closed ndrubins closed 2 years ago

ndrubins commented 2 years ago

Hi,

My data are ~full-length mouse TCRs sequenced on PacBio. I imagine that the assembly steps of TraCeR, apart from the IgBLAST part, are probably not necessary, because my reads are in essence the contigs that the assembly step produces. However, I am interested in running the subsequent steps of TraCeR: quantification and determination of the clonotype of each cell.

Do you think it is reasonable to run TraCeR on my data?

I did try it on one cell (for which 114 reads were sequenced), with this command:

/home/rn/git_repos/tracer/tracer assemble -c /home/rn/git_repos/tracer/tracer.conf --resource_dir /home/rn/git_repos/tracer/resources -s mouse --loci A B G D -m assembly --single_end --fragment_length 1071 --fragment_sd 189 /data/mouse/TCR/sample_1/cell_1.fastq cell_1 /data/mouse/TCR/sample_1

At the Trinity Phase 1: Clustering of RNA-Seq Reads step I'm getting these error messages:

Thursday, January 6, 2022: 21:59:15     CMD: cat /data/mouse/TCR/sample_1/cell_1/aligned_reads/cell_1_TCR_G.fastq | seqtk-trinity seq -A -R 1 - >> single.fa
Error, no records were correctly parsed from -Error, cmd: cat /data/mouse/TCR/sample_1/cell_1/aligned_reads/cell_1_TCR_G.fastq | seqtk-trinity seq -A -R 1 - >> single.fa died with ret 1280 at /home/rn/software/trinityrnaseq-v2.11.0/Trinity line 2826.
        main::process_cmd("cat /data/mouse/TCR/sample_"...) called at /home/rn/software/trinityrnaseq-v2.11.0/Trinity line 2713
        eval {...} called at /home/rn/software/trinityrnaseq-v2.11.0/Trinity line 2670
        main::prep_seqs(ARRAY(0x5597660f8008), "fq", "single", undef) called at /home/rn/software/trinityrnaseq-v2.11.0/Trinity line 1602
        main::run_Trinity() called at /home/rn/software/trinityrnaseq-v2.11.0/Trinity line 1404
        eval {...} called at /home/rn/software/trinityrnaseq-v2.11.0/Trinity line 1403

Trinity run failed. Must investigate error above.

Thursday, January 6, 2022: 21:59:15     CMD: cat /data/mouse/TCR/sample_1/cell_1/aligned_reads/cell_1_TCR_D.fastq | seqtk-trinity seq -A -R 1 - >> single.fa
Error, no records were correctly parsed from -Error, cmd: cat /data/mouse/TCR/sample_1/cell_1/aligned_reads/cell_1_TCR_D.fastq | seqtk-trinity seq -A -R 1 - >> single.fa died with ret 1280 at /home/rn/software/trinityrnaseq-v2.11.0/Trinity line 2826.
        main::process_cmd("cat /data/mouse/TCR/sample_"...) called at /home/rn/software/trinityrnaseq-v2.11.0/Trinity line 2713
        eval {...} called at /home/rn/software/trinityrnaseq-v2.11.0/Trinity line 2670
        main::prep_seqs(ARRAY(0x55608157ef48), "fq", "single", undef) called at /home/rn/software/trinityrnaseq-v2.11.0/Trinity line 1602
        main::run_Trinity() called at /home/rn/software/trinityrnaseq-v2.11.0/Trinity line 1404
        eval {...} called at /home/rn/software/trinityrnaseq-v2.11.0/Trinity line 1403

Trinity run failed. Must investigate error above.

Then I'm getting these warnings at the Assembling Trinity Contigs step:

##TCR_A##
##TCR_B##
##TCR_G##
*** WARNING *** Trinity command ['/home/rn/software/trinityrnaseq-v2.11.0/Trinity', '--seqType', 'fq', '--max_memory', '1G', '--CPU', '1', '--full_cleanup', '--no_normalize_reads', '--single', '/data/mouse/TCR/sample_1/cell_1/aligned_reads/cell_1_TCR_G.fastq', '--output', '/data/mouse/TCR/sample_1/cell_1/Trinity_output/Trinity_cell_1_TCR_G'] failed for locus TCR_G
##TCR_D##
*** WARNING *** Trinity command ['/home/rn/software/trinityrnaseq-v2.11.0/Trinity', '--seqType', 'fq', '--max_memory', '1G', '--CPU', '1', '--full_cleanup', '--no_normalize_reads', '--single', '/data/mouse/TCR/sample_1/cell_1/aligned_reads/cell_1_TCR_D.fastq', '--output', '/data/mouse/TCR/sample_1/cell_1/Trinity_output/Trinity_cell_1_TCR_D'] failed for locus TCR_D

I'm assuming that this cell only expresses the TCR_A and TCR_B chains and therefore I'm getting these warnings of not being able to find TCR_G and TCR_D.
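One way to check this assumption would be to count the reads in each per-locus FASTQ file under aligned_reads; an empty TCR_G or TCR_D file would be consistent with the "no records were correctly parsed" error above. The following is only a minimal sketch: the function names are hypothetical and not part of TraCeR, the file naming pattern is taken from the log output, and it assumes plain four-line FASTQ records with no blank lines.

```python
from pathlib import Path


def count_fastq_records(path):
    """Count records in a FASTQ file, assuming four lines per record.

    A missing file is treated as containing zero records.
    """
    path = Path(path)
    if not path.exists():
        return 0
    with path.open() as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def reads_per_locus(aligned_dir, cell_name, loci=("A", "B", "G", "D")):
    """Map each locus to the read count in <cell_name>_TCR_<locus>.fastq."""
    aligned_dir = Path(aligned_dir)
    return {
        locus: count_fastq_records(aligned_dir / f"{cell_name}_TCR_{locus}.fastq")
        for locus in loci
    }
```

For this cell, it could be called as reads_per_locus("/data/mouse/TCR/sample_1/cell_1/aligned_reads", "cell_1"); a count of 0 for G and D would support the assumption.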

The filtered_TCR_seqs, unfiltered_TCR_seqs, and expression_quantification folders in the output directory are empty (the aligned_reads and IgBLAST_output folders are not empty). Subsequently running tracer summarise with this command:

/home/rn/git_repos/tracer/tracer summarise -c /home/rn/git_repos/tracer/tracer.conf --resource_dir /home/rn/git_repos/tracer/resources -s mouse --loci A B G D --no_networks /data/mouse/TCR/sample_1/cell_1

fails with the message:

Traceback (most recent call last):
  File "/home/rn/git_repos/tracer/tracer", line 21, in <module>
    launch()
  File "/home/rn/git_repos/tracer/tracerlib/launcher.py", line 43, in launch
    Task().run()
  File "/home/rn/git_repos/tracer/tracerlib/tasks.py", line 800, in run
    pc = round((count / float(total_cells)) * 100, 1)
ZeroDivisionError: float division by zero
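The traceback shows that total_cells is zero at the point of division, presumably because no cells with reconstructed TCR sequences were found in the empty output folders. As an illustration only, a guarded version of that percentage calculation might look like the sketch below. The helper name percent_of_cells is hypothetical and this is not the actual TraCeR code; it only mirrors the expression shown in the traceback.

```python
def percent_of_cells(count, total_cells):
    """Return count as a percentage of total_cells, rounded to one decimal.

    Returns None when total_cells is zero, so the caller can report
    "no cells found" instead of raising ZeroDivisionError.
    """
    if total_cells == 0:
        return None
    return round((count / float(total_cells)) * 100, 1)
```

Such a guard would only make the failure easier to read; it would not address why no TCR sequences were written in the first place.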

Any help would be highly appreciated.