test data not being reproduced

robertamezquita commented 6 years ago

I am running tracer test and not replicating the expected summary that is included with the release tracer-0.6.0. I am only detecting a single clonotype, not two clonotypes with 1 and 2 clones each.

My output from the default test data is below:

TCR_A reconstruction:   1 / 1 (100.0%)
TCR_B reconstruction:   1 / 1 (100.0%)

AB productive reconstruction:   1 / 1 (100.0%)

+--------+----------------+---------------+----------------+
|        | 0 recombinants | 1 recombinant | 2 recombinants |
+--------+----------------+---------------+----------------+
| all A  | 0              | 0 (0%)        | 1 (100%)       |
| all B  | 0              | 0 (0%)        | 1 (100%)       |
| prod A | 0              | 1 (100%)      | 0 (0%)         |
| prod B | 0              | 1 (100%)      | 0 (0%)         |
+--------+----------------+---------------+----------------+

I also looked in the test_data/results folder and noticed there were two subfolders, cell2 and cell3, and even tried appending the aligned reads fastq files to the cell1_*.fastq file that tracer test is ostensibly using to perform the test, and now get this result:

TCR_A reconstruction:   1 / 1 (100.0%)
TCR_B reconstruction:   1 / 1 (100.0%)

AB productive reconstruction:   1 / 1 (100.0%)

+--------+----------------+---------------+----------------+
|        | 0 recombinants | 1 recombinant | 2 recombinants |
+--------+----------------+---------------+----------------+
| all A  | 0              | 0 (0%)        | 1 (100%)       |
| all B  | 0              | 0 (0%)        | 1 (100%)       |
| prod A | 0              | 0 (0%)        | 1 (100%)       |
| prod B | 0              | 0 (0%)        | 1 (100%)       |
+--------+----------------+---------------+----------------+

Is tracer working and something is off with the test data, or is it something on my end with the tracer install?

For reference in the test_data folder this is the expected_summary/TCR_summary.txt output:

TCRA reconstruction:    3 / 3 (100.0%)
TCRB reconstruction:    3 / 3 (100.0%)
Paired productive chains    3 / 3 (100.0%)

+------------+----------------+---------------+----------------+
|            | 0 recombinants | 1 recombinant | 2 recombinants |
+------------+----------------+---------------+----------------+
| all_alpha  | 0              | 0 (0.0%)      | 3 (100.0%)     |
| all_beta   | 0              | 0 (0.0%)      | 3 (100.0%)     |
| prod_alpha | 0              | 3 (100.0%)    | 0 (0.0%)       |
| prod_beta  | 0              | 3 (100.0%)    | 0 (0.0%)       |
+------------+----------------+---------------+----------------+

#iNKT cells#
Found 0 iNKT cells

mstubb commented 6 years ago

Hi,

Thanks for this. Please could you let me know the exact command that you’re running and send a copy of the config file that you’re using.

Very best,

Mike

In reply to Robert Anthony Amezquita's original post of 27 Apr 2018, 19:26.

robertamezquita commented 6 years ago

(See the comment below this one for the odd workaround that fixed it, before digging into this jumble.)

.tracerrc

[tracer_location]
    tracer_path = /fh/fast/gottardo_r/ramezqui_working/tools/tracer-0.6.0

[base_transcriptomes]
Mmus = /fh/fast/gottardo_r/ramezqui_working/reference/Ensembl/source/Mus_musculus.GRCm38.cdna.all.fa
Hsap = /fh/fast/gottardo_r/ramezqui_working/reference/Ensembl/source/Homo_sapiens.GRCh38.cdna.all.fa

[kallisto_base_indices]
Mmus = /fh/fast/gottardo_r/ramezqui_working/reference/Ensembl/source/Mus_musculus.GRCm38.kallisto.index
Hsap = /fh/fast/gottardo_r/ramezqui_working/reference/Ensembl/source/Homo_sapiens.GRCh38.kallisto.index

[trinity_options]
max_jellyfish_memory = 1G
trinity_version = 2

[IgBlast_options]
igblast_seqtype = TCR
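If a setup problem is suspected, a quick first check is whether every path in .tracerrc actually exists. The Python sketch below is a hypothetical helper (check_tracerrc is not part of tracer); it assumes only the section and key names shown in the config above, and it uses the standard-library configparser, which lowercases option names (so Mmus is reported as mmus).

```python
import configparser
import os


def check_tracerrc(path):
    """Return a list of problems found in a .tracerrc-style INI file.

    Hypothetical helper: it only checks that the sections shown in the
    config above are present and that the paths they list exist.
    """
    config = configparser.ConfigParser()
    # read() silently skips unreadable files, so check what it returns.
    if not config.read(path):
        return [f"cannot read config file: {path}"]

    problems = []
    if not config.has_option("tracer_location", "tracer_path"):
        problems.append("missing [tracer_location] tracer_path")
    elif not os.path.isdir(config["tracer_location"]["tracer_path"]):
        problems.append("tracer_path is not a directory")

    # Every species entry in these sections should point at an existing file.
    for section in ("base_transcriptomes", "kallisto_base_indices"):
        if not config.has_section(section):
            problems.append(f"missing section [{section}]")
            continue
        for species, file_path in config[section].items():
            if not os.path.isfile(file_path):
                problems.append(f"[{section}] {species}: not found: {file_path}")
    return problems
```

An empty list means every listed directory and file was found; it says nothing about whether the files themselves are valid.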

I am running in a conda env using the following packages:

# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
biopython=1.71=py36_0
blas=1.1=openblas
blast=2.7.1=boost1.64_3
boost=1.64.0=py36_4
boost-cpp=1.64.0=1
bowtie=1.2.2=py36pl5.22.0_0
bowtie2=2.3.4.1=py36pl5.22.0_0
bzip2=1.0.6=1
ca-certificates=2018.4.16=0
certifi=2018.4.16=py36_0
collectl=4.0.4=pl5.22.0_3
curl=7.59.0=1
cycler=0.10.0=py36_0
cython=0.28.2=py36_0
dbus=1.10.22=0
decorator=4.3.0=py_0
expat=2.2.5=0
fastool=0.1.4=2
fontconfig=2.12.6=0
freetype=2.8.1=0
future=0.16.0=py36_0
gettext=0.19.8.1=0
glib=2.55.0=0
gmp=6.1.2=0
gnutls=3.5.17=0
gst-plugins-base=1.8.0=0
gstreamer=1.8.0=1
hdf5=1.8.17=11
icu=58.2=0
igblast=1.7.0=pl5.22.0_0
intel-openmp=2018.0.0=8
jellyfish=2.2.6=0
jpeg=9b=2
kallisto=0.44.0=hdf51.8.17_1
kiwisolver=1.0.1=py36_1
krb5=1.14.6=0
libdeflate=0.8=0
libffi=3.2.1=3
libgcc=7.2.0=h69d50b8_2
libgcc-ng=7.2.0=hdf63c60_3
libgfortran=3.0.0=1
libgfortran-ng=7.2.0=hdf63c60_3
libiconv=1.15=0
libidn11=1.33=0
libpng=1.6.34=0
libssh2=1.8.0=2
libstdcxx-ng=7.2.0=hdf63c60_3
libtiff=4.0.9=0
libxcb=1.13=0
libxml2=2.9.8=0
matplotlib=2.2.2=py36_1
mkl=2018.0.2=1
mkl_fft=1.0.2=py36_0
mkl_random=1.0.1=py36_0
mmtf-python=1.1.0=py_0
mock=2.0.0=py36_0
msgpack-python=0.5.6=py36_0
ncurses=5.9=10
nettle=3.3=0
networkx=1.11=py36_0
nose=1.3.7=py36_2
numpy=1.14.2=py36_blas_openblas_200
olefile=0.45.1=py36_0
openblas=0.2.20=7
openjdk=8.0.121=1
openssl=1.0.2o=0
pandas=0.22.0=py36_0
parafly=r2013_01_21=1
patsy=0.5.0=py36_0
pbr=4.0.2=py_0
pcre=8.41=1
perl=5.22.0.1=0
perl-app-cpanminus=1.7043=pl5.22.0_0
perl-archive-tar=2.18=pl5.22.0_2
perl-carp=1.38=pl5.22.0_0
perl-compress-raw-bzip2=2.069=1
perl-compress-raw-zlib=2.069=3
perl-data-dumper=2.161=pl5.22.0_0
perl-exporter=5.72=pl5.22.0_0
perl-exporter-tiny=0.042=1
perl-extutils-makemaker=7.24=pl5.22.0_1
perl-io-compress=2.069=pl5.22.0_2
perl-io-zlib=1.10=1
perl-list-moreutils=0.428=pl5.22.0_0
perl-module-build=0.4224=pl5.22.0_0
perl-pathtools=3.73=0
perl-scalar-list-utils=1.45=2
perl-test-more=1.001002=pl5.22.0_0
perl-threaded=5.22.0=pl5.22.0_12
pillow=5.1.0=py36_0
pip=9.0.3=py36_0
prettytable=0.7.2=py36_1
pydotplus=2.0.2=py36_0
pyparsing=2.2.0=py36_0
pyqt=5.6.0=py36_5
python=3.6.5=1
python-dateutil=2.7.2=py_0
python-levenshtein=0.12.0=py36_1
pytz=2018.4=py_0
qt=5.6.2=7
readline=7.0=0
reportlab=3.4.0=py36_0
samtools=1.8=3
scipy=1.0.1=py36_blas_openblas_200
seaborn=0.8.1=py36_0
setuptools=39.0.1=py36_0
sip=4.18=py36_1
six=1.11.0=py36_1
slclust=02022010=2
sqlite=3.20.1=2
statsmodels=0.8.0=py36_0
tbb=2018_20171205=0
tk=8.6.7=0
tornado=5.0.2=py36_0
trimmomatic=0.36=5
trinity=2.5.1=1
wheel=0.31.0=py36_0
xorg-libxau=1.0.8=3
xorg-libxdmcp=1.1.2=3
xz=5.2.3=0
zlib=1.2.11=0

Apart from the following errors, the run is fairly normal throughout (I won't paste the whole output, just the errors):

## after "Harvesting all assembled transcripts into a single multi-fasta file..."

Saturday, April 28, 2018: 21:32:11  CMD: /fh/fast/gottardo_r/ramezqui_working/tools/miniconda3/envs/tracer/opt/trinity-2.5.1/util/support_scripts/get_Trinity_gene_to_trans_map.pl /home/ramezqui/test-out/results/cell1/Trinity_output/Trinity_cell1_TCR_A.Trinity.fasta > /home/ramezqui/test-out/results/cell1/Trinity_output/Trinity_cell1_TCR_A.Trinity.fasta.gene_trans_map
shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
##TCR_B##

And then the run ends with the following lines (this is the complete output; it really does end on a " character):

##Filtering by read count##
/fh/fast/gottardo_r/ramezqui_working/tools/miniconda3/envs/tracer/lib/python3.6/site-packages/matplotlib/axes/_axes.py:6462: UserWarning: The 'normed' kwarg is deprecated, and has been replaced by the 'density' kwarg.
  warnings.warn("The 'normed' kwarg is deprecated, and has been "

The command I used was tracer test -p 8 -o test-out

robertamezquita commented 6 years ago

Now, here's what's really strange: I got the expected output after creating the test-out/results directory, copying the tracer/test_data/results/cell* folders (cell2 and cell3) into test-out/results, and then running tracer test -p 8 -o test-out.
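The same workaround can be scripted. This is a minimal Python sketch, not part of tracer: seed_test_output is a hypothetical helper, and it assumes the precomputed cells are the cell2 and cell3 folders under <tracer_path>/test_data/results/ mentioned earlier in the thread. It copies them into <out_dir>/results so that a subsequent tracer test -p 8 -o test-out should find all three cells.

```python
import shutil
from pathlib import Path


def seed_test_output(tracer_path, out_dir, cells=("cell2", "cell3")):
    """Copy precomputed test cells into <out_dir>/results (hypothetical helper).

    Assumes the bundled cells live in <tracer_path>/test_data/results/.
    cell1 is not copied because tracer test reconstructs it itself.
    """
    src_root = Path(tracer_path) / "test_data" / "results"
    dest_root = Path(out_dir) / "results"
    dest_root.mkdir(parents=True, exist_ok=True)
    copied = []
    for cell in cells:
        src = src_root / cell
        if not src.is_dir():
            raise FileNotFoundError("precomputed cell not found: {}".format(src))
        dest = dest_root / cell
        # Replace any earlier copy so the function can be re-run safely
        # (avoids copytree's dirs_exist_ok, which needs Python 3.8+).
        if dest.exists():
            shutil.rmtree(str(dest))
        shutil.copytree(str(src), str(dest))
        copied.append(cell)
    return copied
```

For example, call seed_test_output("/path/to/tracer-0.6.0", "test-out") (placeholder path) and then run tracer test -p 8 -o test-out. Existing copies are replaced rather than merged, which keeps stale files from a previous attempt out of the results.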

Here's the output I ended up with following that "workaround":

test-out/results/filtered*/TCR_summary.txt

TCR_A reconstruction:   3 / 3 (100.0%)
TCR_B reconstruction:   3 / 3 (100.0%)

AB productive reconstruction:   3 / 3 (100.0%)

+--------+----------------+---------------+----------------+
|        | 0 recombinants | 1 recombinant | 2 recombinants |
+--------+----------------+---------------+----------------+
| all A  | 0              | 0 (0%)        | 3 (100%)       |
| all B  | 0              | 0 (0%)        | 3 (100%)       |
| prod A | 0              | 3 (100%)      | 0 (0%)         |
| prod B | 0              | 3 (100%)      | 0 (0%)         |
+--------+----------------+---------------+----------------+

#Clonotype groups#
This is a text representation of the groups shown in clonotype_network_with_identifiers.pdf.
It does not exclude cells that only share beta and not alpha.

cell1, cell2

test-out/results/filtered*/recombinants.txt

cell_name   locus   recombinant_id  productive  reconstructed_length
cell1   A   TRAV9D-4_GTGAGGGGGAAGGAGAGGCA_TRAJ37    False   347
cell1   A   TRAV4-2_TTGAGAATAA_TRAJ43   True    325
cell1   B   TRBV4_AGCTACAACTCCT_TRBJ2-7 True    334
cell1   B   TRBV12-1_GCTCTACAACAGGGGGGGCACCG_TRBJ2-2    False   100

cell2   A   TRAV9D-4_GTGAGGGGGAAGGAGAGGCA_TRAJ37    False   347
cell2   A   TRAV4-2_TTGAGAATAA_TRAJ43   True    325
cell2   B   TRBV12-1_GCTCTACAACAGGGGGGG(C)ACCGG_TRBJ2-2 False   100
cell2   B   TRBV4_AGCTACAACTCCT_TRBJ2-7 True    334

cell3   A   TRAV7-5_TGAGCGACACC_TRAJ27  True    334
cell3   A   TRAV3-3_CAGTGGGGGAACTA_TRAJ26   False   339
cell3   B   TRBV31_AGTCTTGACACAAGA_TRBJ2-5  False   335
cell3   B   TRBV31_TGGAGCCCCGGGACAGGGCTCAACC_TRBJ1-5    True    343
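The clonotype group above can be roughly reproduced from recombinants.txt. This Python sketch is a simplified illustration, not tracer's actual implementation: it links any two cells that share a productive recombinant on either locus (in line with the note that cells sharing only beta are not excluded) and reports connected components containing more than one cell. It assumes the whitespace-separated five-column layout shown above.

```python
from collections import defaultdict


def parse_recombinants(text):
    """Yield (cell, locus, recombinant_id, productive, length) rows."""
    for line in text.splitlines():
        fields = line.split()
        # Skip blank separator lines and the header row.
        if not fields or fields[0] == "cell_name":
            continue
        cell, locus, rec_id, productive, length = fields
        yield cell, locus, rec_id, productive == "True", int(length)


def clonotype_groups(text):
    """Group cells that share at least one productive recombinant."""
    cells_by_rec = defaultdict(set)
    cells = set()
    for cell, _locus, rec_id, productive, _length in parse_recombinants(text):
        cells.add(cell)
        if productive:
            cells_by_rec[rec_id].add(cell)

    # Union-find: each cell starts in its own group.
    parent = {c: c for c in cells}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    # Merge all cells that carry the same productive recombinant.
    for members in cells_by_rec.values():
        first, *rest = sorted(members)
        for other in rest:
            parent[find(other)] = find(first)

    groups = defaultdict(list)
    for c in sorted(cells):
        groups[find(c)].append(c)
    return sorted(g for g in groups.values() if len(g) > 1)
```

On the rows above this returns [["cell1", "cell2"]], matching the cell1, cell2 group: both cells carry TRAV4-2_TTGAGAATAA_TRAJ43 and TRBV4_AGCTACAACTCCT_TRBJ2-7, while cell3 shares nothing productive with them.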

This matches the expected output, except perhaps for the order of the rows.
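Because row order can differ between runs, a plain line-by-line diff of the two files can be misleading. Below is a minimal sketch of an order-insensitive comparison in Python; diff_recombinants is a hypothetical helper that treats each file as a multiset of rows, ignoring blank lines, the header row, and differences between tabs and spaces.

```python
from collections import Counter


def recombinant_rows(text):
    """Return the data rows of a recombinants.txt as a Counter of tuples.

    Whitespace is normalised and blank lines and the header are dropped,
    so two files compare equal if they hold the same rows in any order.
    """
    rows = Counter()
    for line in text.splitlines():
        fields = tuple(line.split())
        if fields and fields[0] != "cell_name":
            rows[fields] += 1
    return rows


def diff_recombinants(actual_text, expected_text):
    """Return (missing, unexpected) rows, ignoring row order."""
    actual = recombinant_rows(actual_text)
    expected = recombinant_rows(expected_text)
    # Counter subtraction keeps only positive counts.
    missing = sorted((expected - actual).elements())
    unexpected = sorted((actual - expected).elements())
    return missing, unexpected
```

Pass the contents of the generated recombinants.txt and of the expected copy; two empty lists mean the files agree up to row order.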

robertamezquita commented 6 years ago

Also, h/t to Matt Fitzgibbon from Fred Hutch for doing all the heavy lifting here, both in figuring out the workaround and in providing the conda env to run tracer with (I'll update this with his GitHub handle to properly thank him once I find it!)

mstubb commented 6 years ago

Hi,

Thanks for these details.

tracer test doesn’t usually need the -o switch because it just expects to write to tracer/test_data/results/ where the other two cells already are.

However, your workaround nicely replicates that and is fine too.

I'll close this now but feel free to reopen or open a new one if you need anything else.

Cheers,

Mike

robertamezquita commented 6 years ago

Just to be clear, shouldn't the results produce the expected results regardless of whether the -o switch is invoked or not? I think that ideally, that would need to be fixed before closing the issue.

mstubb commented 6 years ago

You’re very right. As written, the README suggests that you should be able to specify a custom output location for test.

That got added a while ago and I didn't think about the implications. test wasn't originally written to use any output location other than /test_data/results, and I didn't think it through fully when merging that README change.

For now, I’ll revert the README but I think you’re right that it would be better for test to cope with any output location. I’ll add that to the to-do list.

Thanks for catching this.

Best,

Mike

In reply to Robert Anthony Amezquita's comment of 30 Apr 2018, 18:04 (https://github.com/Teichlab/tracer/issues/68#issuecomment-385462555).

Teichlab / tracer

test data not being reproduced #68
