derijkp / scywalker

A program for the analysis of single cell nanopore long read data
13 stars 0 forks source link

Test: Analysis ended with XX errors #4

Closed nick-youngblut closed 5 months ago

nick-youngblut commented 7 months ago

I've run the test dataset twice:

Reprex:

wget https://github.com/derijkp/scywalker/releases/download/v0.108.0/scywalker_test.tar.gz \
  && tar xvzf scywalker_test.tar.gz \
  && cd scywalker_test

~/bin/scywalker-0.108.0-Linux-x86_64/scywalker_makerefdir -organelles '' g17 genome.fa genes.gtf

~/bin/scywalker-0.108.0-Linux-x86_64/scywalker -v 1 -d 8 \
    -refdir g17 \
    -sc_expectedcells 183 \
    -cellmarkerfile markers_chr17.tsv \
    -threads 6 \
    test10x

The list of err logs for the 2nd run:

distrreg-samples__mix1__gene_counts-isoquant_sc-sminimap2_splice-mix1.tsv.err
distrreg-samples__mix1__isoform_counts-isoquant_sc-sminimap2_splice-mix1.tsv.err
distrreg-samples__mix1__read_assignments-isoquant_sc-sminimap2_splice-mix1.tsv.zst.err
distrreg-samples__mix2__gene_counts-isoquant_sc-sminimap2_splice-mix2.tsv.err
distrreg-samples__mix2__isoform_counts-isoquant_sc-sminimap2_splice-mix2.tsv.err
distrreg-samples__mix2__read_assignments-isoquant_sc-sminimap2_splice-mix2.tsv.zst.err
iso_compar-gene_counts-isoquant_sc-test10x.err
iso_compar-gene_counts-test10x.err
iso_compar-isoform_counts-isoquant_sc-test10x.err
iso_compar-isoform_counts-test10x.err
iso_compar-totalcounts-isoquant_sc-test10x.err
iso_compar-totalcounts-test10x.err
isoquant-isoquant_sc-sminimap2_splice-mix1-chr17-24840903-52275265.err
isoquant-isoquant_sc-sminimap2_splice-mix1-chr17-55615535-62204609.err
isoquant-isoquant_sc-sminimap2_splice-mix1-chr17-71399270-83257441.err
isoquant-isoquant_sc-sminimap2_splice-mix2-chr17-0-11039388.err
isoquant-isoquant_sc-sminimap2_splice-mix2-chr17-14573600-21969268.err
isoquant-isoquant_sc-sminimap2_splice-mix2-chr17-24840903-52275265.err
isoquant-isoquant_sc-sminimap2_splice-mix2-chr17-62204609-65342621.err
isoquant-isoquant_sc-sminimap2_splice-mix2-chr17-71399270-83257441.err
isoquant_join-isoquant_sc-sminimap2_splice-mix1.err
isoquant_join-isoquant_sc-sminimap2_splice-mix2.err
isoquant_sc_join_gene-isoquant_sc-sminimap2_splice-mix1.err
isoquant_sc_join_gene-isoquant_sc-sminimap2_splice-mix2.err
isoquant_sc_join_iso-isoquant_sc-sminimap2_splice-mix1.err
isoquant_sc_join_iso-isoquant_sc-sminimap2_splice-mix2.err
isquant_convert-isoquant_sc-sminimap2_splice-mix1-chr17-24840903-52275265.err
isquant_convert-isoquant_sc-sminimap2_splice-mix1-chr17-55615535-62204609.err
isquant_convert-isoquant_sc-sminimap2_splice-mix1-chr17-71399270-83257441.err
isquant_convert-isoquant_sc-sminimap2_splice-mix2-chr17-0-11039388.err
isquant_convert-isoquant_sc-sminimap2_splice-mix2-chr17-14573600-21969268.err
isquant_convert-isoquant_sc-sminimap2_splice-mix2-chr17-24840903-52275265.err
isquant_convert-isoquant_sc-sminimap2_splice-mix2-chr17-62204609-65342621.err
isquant_convert-isoquant_sc-sminimap2_splice-mix2-chr17-71399270-83257441.err
isquant_sc_count-chr17-0-11039388-isoquant_sc-sminimap2_splice-mix1.err
isquant_sc_count-chr17-0-11039388-isoquant_sc-sminimap2_splice-mix2.err
isquant_sc_count-chr17-11039388-14573600-isoquant_sc-sminimap2_splice-mix1.err
isquant_sc_count-chr17-11039388-14573600-isoquant_sc-sminimap2_splice-mix2.err
isquant_sc_count-chr17-14573600-21969268-isoquant_sc-sminimap2_splice-mix1.err
isquant_sc_count-chr17-14573600-21969268-isoquant_sc-sminimap2_splice-mix2.err
isquant_sc_count-chr17-21969268-24840903-isoquant_sc-sminimap2_splice-mix1.err
isquant_sc_count-chr17-21969268-24840903-isoquant_sc-sminimap2_splice-mix2.err
isquant_sc_count-chr17-24840903-52275265-isoquant_sc-sminimap2_splice-mix1.err
isquant_sc_count-chr17-24840903-52275265-isoquant_sc-sminimap2_splice-mix2.err
isquant_sc_count-chr17-52275265-55615535-isoquant_sc-sminimap2_splice-mix1.err
isquant_sc_count-chr17-52275265-55615535-isoquant_sc-sminimap2_splice-mix2.err
isquant_sc_count-chr17-55615535-62204609-isoquant_sc-sminimap2_splice-mix1.err
isquant_sc_count-chr17-55615535-62204609-isoquant_sc-sminimap2_splice-mix2.err
isquant_sc_count-chr17-62204609-65342621-isoquant_sc-sminimap2_splice-mix1.err
isquant_sc_count-chr17-62204609-65342621-isoquant_sc-sminimap2_splice-mix2.err
isquant_sc_count-chr17-65342621-70573759-isoquant_sc-sminimap2_splice-mix1.err
isquant_sc_count-chr17-65342621-70573759-isoquant_sc-sminimap2_splice-mix2.err
isquant_sc_count-chr17-70573759-71399270-isoquant_sc-sminimap2_splice-mix1.err
isquant_sc_count-chr17-70573759-71399270-isoquant_sc-sminimap2_splice-mix2.err
isquant_sc_count-chr17-71399270-83257441-isoquant_sc-sminimap2_splice-mix1.err
isquant_sc_count-chr17-71399270-83257441-isoquant_sc-sminimap2_splice-mix2.err
pb_compar-gene_counts-scsorter-test10x.err
pb_compar-gene_counts-sctype-test10x.err
pb_compar-gene_counts-test10x.err
pb_compar-pb_gene_counts_colinfo-scsorter-test10x.err
pb_compar-pb_gene_counts_colinfo-sctype-test10x.err
pb_compar-pb_gene_counts_colinfo-test10x.err
pb_compar-pb_isoform_counts_colinfo-scsorter-test10x.err
pb_compar-pb_isoform_counts_colinfo-sctype-test10x.err
pb_compar-pb_isoform_counts_colinfo-test10x.err
pb_compar-pb_isoform_counts-scsorter-test10x.err
pb_compar-pb_isoform_counts-sctype-test10x.err
pb_compar-pb_isoform_counts-test10x.err
reportscombine_stats-test10x.err
sc_celltyper_scsorter-isoquant_sc-sminimap2_splice-mix1.err
sc_celltyper_scsorter-isoquant_sc-sminimap2_splice-mix2.err
sc_celltyper_scsorter_scgenefile10x-scsorter-isoquant_sc-sminimap2_splice-mix1.err
sc_celltyper_scsorter_scgenefile10x-scsorter-isoquant_sc-sminimap2_splice-mix2.err
sc_celltyper_sctype-isoquant_sc-sminimap2_splice-mix1.err
sc_celltyper_sctype-isoquant_sc-sminimap2_splice-mix2.err
sc_counts_filtered-isoquant_sc-sminimap2_splice-mix1.err
sc_counts_filtered-isoquant_sc-sminimap2_splice-mix2.err
sc_filter_default_r-isoquant_sc-sminimap2_splice-mix1.err
sc_filter_default_r-isoquant_sc-sminimap2_splice-mix2.err
sc_filter_default_scgenefile10x-isoquant_sc-sminimap2_splice-mix1.err
sc_filter_default_scgenefile10x-isoquant_sc-sminimap2_splice-mix2.err
sc_filter_default_scisoformfile10x-isoquant_sc-sminimap2_splice-mix1.err
sc_filter_default_scisoformfile10x-isoquant_sc-sminimap2_splice-mix2.err
sc_filter_write_cellinfo-isoquant_sc-sminimap2_splice-mix1.err
sc_filter_write_cellinfo-isoquant_sc-sminimap2_splice-mix2.err
sc_pseudobulk-scsorter-isoquant_sc-sminimap2_splice-mix1.err
sc_pseudobulk-scsorter-isoquant_sc-sminimap2_splice-mix2.err
sc_pseudobulk-sctype-isoquant_sc-sminimap2_splice-mix1.err
sc_pseudobulk-sctype-isoquant_sc-sminimap2_splice-mix2.err
scywalker_report_mix1.err
scywalker_report_mix2.err

An example error log:

$ cat test10x/log_jobs/err/pb_compar-gene_counts-scsorter-test10x.err
dependency /home/nickyoungblut/bin/scywalker-0.108.0-Linux-x86_64/test_run/scywalker_test/test10x/samples/mix1/pb_gene_counts-scsorter-isoquant_sc-sminimap2_splice-mix1.tsv.zst not found
    while executing
"error "dependency /home/nickyoungblut/bin/scywalker-0.108.0-Linux-x86_64/test_run/scywalker_test/test10x/samples/mix1/pb_gene_counts-scsorter-isoquant..."
    invoked from within
"if {![job_file_or_link_exists /home/nickyoungblut/bin/scywalker-0.108.0-Linux-x86_64/test_run/scywalker_test/test10x/samples/mix1/pb_gene_counts-scsor..."
    (file "/home/nickyoungblut/bin/scywalker-0.108.0-Linux-x86_64/test_run/scywalker_test/test10x/log_jobs/run/pb_compar-gene_counts-scsorter-test10x.run" line 17)
    invoked from within
"source /home/nickyoungblut/bin/scywalker-0.108.0-Linux-x86_64/test_run/scywalker_test/test10x/log_jobs/run/pb_compar-gene_counts-scsorter-test10x.run"
    ("uplevel" body line 1)
    invoked from within
"uplevel #0 [list source $file]"
    (procedure "cg_source" line 4)
    invoked from within
"cg_$action {*}$args"
    invoked from within
cg source /home/nickyoungblut/bin/scywalker-0.108.0-Linux-x86_64/test_run/scywalker_test/test10x/log_jobs/run/pb_compar-gene_counts-scsorter-test10x.run

My system info:

wdecoster commented 7 months ago

Thank you for reporting this. @derijkp is off for a few days, but will get back to you upon his return.

Best regards, Wouter

derijkp commented 7 months ago

Hi,

Thank you for your error report. Given the number of errors, likely something something went wrong in an early job (e.g. making the bam file), causing subsequent jobs dependent on the results to fail as well. The included example err file is from such a subsequent job (missing a file it needs that should have been made by an earlier job) To identify the original error, can you have a look at (or send) the (overview) logfile, which should be named process_project_test10x.*.error. This is a tab-separated list of jobs (in order of dependencies) with status, messages, etc; we are looking for the first job with error status in this list.

I will add a section on troubleshooting to the docs (and in messages) to make this easier/clearer.

Peter

nick-youngblut commented 7 months ago

Here's the error log: process_project_test10x.2024-03-14_15-41-11-239.error.zip

It appears that the errors start at the isoquant-isoquant_sc-sminimap2_splice jobs.

derijkp commented 7 months ago

The error message for this one is unfortunately not very informative (child process exited abnormally)

Can you also provide the run file: log_jobs/run/isoquant-isoquant_sc-sminimap2_splice-mix1-chr17-24840903-52275265.run and the bam file so I can check what might be wrong with them.

Can you also try running that run file directly to check if the error is reproducible on your machine: ./log_jobs/run/isoquant-isoquant_sc-sminimap2_splice-mix1-chr17-24840903-52275265.run

nick-youngblut commented 7 months ago

Here the *.run file (why not the standard *.sh file extension?): isoquant-isoquant_sc-sminimap2_splice-mix1-chr17-24840903-52275265.run.zip

I'm running the job now; I'll let you know if it finishes successfully.

I'm assuming that such tests do not occur on your machine. It might be best to reproduce my machine via docker. Something like:

FROM ubuntu:20.04

RUN apt-get update \
    && apt-get upgrade -y \
    && apt-get install -y \
      wget \
      build-essential \
      wget \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

RUN wget https://github.com/derijkp/scywalker/releases/download/v0.108.0/scywalker_test.tar.gz \
  && tar xvzf scywalker_test.tar.gz \
  && rm -f scywalker_test.tar.gz

CMD ["scywalker", "-h"]

Moreover, such software testing errors could be identified via GitHub Actions. An example actions yaml file:

on:
  push:
    branches: [main, master]
  pull_request:
    branches: [main, master]

name: scywalker

jobs:
  test:
    runs-on: ${{ matrix.config.os }}

    name: ${{ matrix.config.os }}

    strategy:
      fail-fast: false
      matrix:
        config:
          - {os: "ubuntu-20.04"}
          - {os: "ubuntu-22.04"}

    steps:
      - id: "Checkout"
        name: "Checkout code"
        uses: actions/checkout@v3
      - id: "Install"
        name: "scywalker install"
        run: |
          wget https://github.com/derijkp/scywalker/releases/download/v0.108.0/scywalker_test.tar.gz
          tar xvzf scywalker_test.tar.gz
          rm -f scywalker_test.tar.gz
       - id: "Test download"
          name: "Download scywalker test data"
          run: |
            wget https://github.com/derijkp/scywalker/releases/download/v0.108.0/scywalker_test.tar.gz
            tar xvzf scywalker_test.tar.gz
       - id: "Test create ref"
         name: "Create scywalker ref"
         run: |
             cd scywalker_test
             ../scywalker-0.108.0-Linux-x86_64/scywalker_makerefdir -organelles '' g17 genome.fa genes.gtf
       - id: "Test run"
         name: "Run scywalker on test"
         run: |
           cd scywalker_test
           ../scywalker-0.108.0-Linux-x86_64/scywalker -v 1 -d 8  -refdir g17 -sc_expectedcells 183 -cellmarkerfile markers_chr17.tsv -threads 2 test10x

Using the docker image and/or Github actions can save a lot of time with debugging issues.

nick-youngblut commented 7 months ago

It appears that the isoquant-isoquant_sc-sminimap2_splice-mix1-chr17-24840903-52275265.run job finished successfully and without warnings (except for over-writting):

2024-03-25 16:41:33,934 - INFO - Running IsoQuant version 3.3.0
2024-03-25 16:41:33,934 - WARNING - Output folder already exists, some files may be overwritten.
2024-03-25 16:41:33,939 - INFO - Novel unspliced transcripts will not be reported, set --report_novel_unspliced true to discover them
2024-03-25 16:41:33,939 - INFO -  === IsoQuant pipeline started ===
2024-03-25 16:41:33,939 - INFO - Loading gene database from /tmp/tempExtral.715008-WdiBEZnMHedNDOAfcxPb/_Extral_temp_6.tmp.db
2024-03-25 16:41:33,939 - INFO - Loading reference genome from /home/nickyoungblut/bin/scywalker-0.108.0-Linux-x86_64/test_run/scywalker_test/g17/genome_g17.ifas
2024-03-25 16:41:33,940 - INFO - Processing 1 sample
2024-03-25 16:41:33,940 - INFO - Processing sample OUT
2024-03-25 16:41:33,940 - INFO - Sample has 1 BAM file: /tmp/tempExtral.715008-WdiBEZnMHedNDOAfcxPb/_Extral_temp_1.tmp.bam
2024-03-25 16:41:33,940 - INFO - Collecting read alignments
2024-03-25 16:41:33,941 - INFO - Processing chromosome chr17
2024-03-25 16:53:10,796 - INFO - Finished processing chromosome chr17
2024-03-25 16:53:11,156 - INFO - Resolving multimappers
2024-03-25 16:53:11,553 - INFO - Finishing read assignment, total assignments 241963, polyA percentage 84.3
2024-03-25 16:53:11,671 - INFO - Read assignments files saved to /home/nickyoungblut/bin/scywalker-0.108.0-Linux-x86_64/test_run/scywalker_test/test10x/samples/mix1/isoquant_sc-sminimap2_splice-mix1.temp/isoquant_sc-sminimap2_splice-mix1-chr17-24840903-52275265.temp/OUT/aux/OUT.save*.
2024-03-25 16:53:11,671 - INFO - Total alignments processed: 241963, polyA tail detected in 203949 (84.3%)
2024-03-25 16:53:11,671 - INFO - Processing assigned reads OUT
2024-03-25 16:53:11,672 - INFO - Processing chromosome chr17
2024-03-25 16:53:11,887 - INFO - Loading read assignments from /home/nickyoungblut/bin/scywalker-0.108.0-Linux-x86_64/test_run/scywalker_test/test10x/samples/mix1/isoquant_sc-sminimap2_splice-mix1.temp/isoquant_sc-sminimap2_splice-mix1-chr17-24840903-52275265.temp/OUT/aux/OUT.save_chr17
2024-03-25 16:58:02,264 - INFO - Finished processing chromosome chr17
2024-03-25 16:58:02,298 - INFO - Transcript model file /home/nickyoungblut/bin/scywalker-0.108.0-Linux-x86_64/test_run/scywalker_test/test10x/samples/mix1/isoquant_sc-sminimap2_splice-mix1.temp/isoquant_sc-sminimap2_splice-mix1-chr17-24840903-52275265.temp/OUT/OUT.transcript_models.gtf
2024-03-25 16:58:02,303 - INFO - Extended annotation is saved to /home/nickyoungblut/bin/scywalker-0.108.0-Linux-x86_64/test_run/scywalker_test/test10x/samples/mix1/isoquant_sc-sminimap2_splice-mix1.temp/isoquant_sc-sminimap2_splice-mix1-chr17-24840903-52275265.temp/OUT/OUT.extended_annotation.gtf
2024-03-25 16:58:02,303 - INFO - Transcript model statistics
2024-03-25 16:58:02,303 - INFO - known: 509
2024-03-25 16:58:02,303 - INFO - novel_in_catalog: 175
2024-03-25 16:58:02,303 - INFO - novel_not_in_catalog: 92
2024-03-25 16:58:02,417 - INFO - Gene counts are stored in /home/nickyoungblut/bin/scywalker-0.108.0-Linux-x86_64/test_run/scywalker_test/test10x/samples/mix1/isoquant_sc-sminimap2_splice-mix1.temp/isoquant_sc-sminimap2_splice-mix1-chr17-24840903-52275265.temp/OUT/OUT.gene_counts.tsv
2024-03-25 16:58:02,417 - INFO - Transcript counts are stored in /home/nickyoungblut/bin/scywalker-0.108.0-Linux-x86_64/test_run/scywalker_test/test10x/samples/mix1/isoquant_sc-sminimap2_splice-mix1.temp/isoquant_sc-sminimap2_splice-mix1-chr17-24840903-52275265.temp/OUT/OUT.transcript_counts.tsv
2024-03-25 16:58:02,417 - INFO - Read assignments are stored in /home/nickyoungblut/bin/scywalker-0.108.0-Linux-x86_64/test_run/scywalker_test/test10x/samples/mix1/isoquant_sc-sminimap2_splice-mix1.temp/isoquant_sc-sminimap2_splice-mix1-chr17-24840903-52275265.temp/OUT/OUT.read_assignments.tsv
2024-03-25 16:58:02,417 - INFO - Read assignment statistics
2024-03-25 16:58:02,417 - INFO - ambiguous: 15537
2024-03-25 16:58:02,417 - INFO - inconsistent: 69992
2024-03-25 16:58:02,417 - INFO - intergenic: 69
2024-03-25 16:58:02,417 - INFO - noninformative: 75206
2024-03-25 16:58:02,417 - INFO - unique: 75976
2024-03-25 16:58:02,417 - INFO - unique_minor_difference: 5183
2024-03-25 16:58:02,418 - INFO - Processed sample OUT
2024-03-25 16:58:02,418 - INFO - Processed 1 sample
2024-03-25 16:58:02,418 - INFO -  === IsoQuant pipeline finished ===
derijkp commented 7 months ago

These errors indeed did not occur on my test systems. Thanks for the suggestion of using docker for testing: I was not doing that, but will from now as it allows to both test more systems and automate the tests. I will also look into the github actions later.

Testing in the ubuntu docker did produce errors due to missing libraries, which I will fix in the upcoming new release (by including them in the appdir). I am not sure if this fix will solve the problems on your system, as the errors I encountered were different from what you reported. In the docker system, the demo did run successfully after installing packages providing the missing libraries (apt install -y curl libsm6 libgtk2.0-0 libegl1 libxt6 libxtst6 libxext6 libgl1-mesa-glx)

nick-youngblut commented 7 months ago

The beauty of Dockerized workflows in in their reproducibility. I don't want to start adding random os libraries to my VM (shared with other uses) just to run this pipeline. So, I'll create a Docker image for scywalker and test it out. In that regard, it would be helpful if you shared your final Dockerfile, in which the scywalker tests are working.

derijkp commented 7 months ago

I have added a working Dockerfile together with code to build and test in commit 488311e.

nick-youngblut commented 6 months ago

Thanks @derijkp for creating the dockerfile!

I tried to run the test workflow using the image that I created from the dockerfile:

docker run \
  --platform linux/amd64 \
  --rm -it \
  -v $(pwd):/data \
  -u $(id -u):$(id -g) \
  scywalker:0.108.0 \
  scywalker_makerefdir -organelles '' g17 /data/genome.fa /data/genes.gtf

...but I'm getting the following error:

can't create directory "/usr/local/bin/g17": permission denied
    while executing
"file mkdir $refdir"
    (procedure "scywalker_makerefdir" line 36)
    invoked from within
"scywalker_makerefdir {*}$argv"
    (file "/usr/local/bin/scywalker-0.108.0-Linux-x86_64/apps/scywalker_makerefdir/scywalker_makerefdir.tcl" line 212)

It appears that scywalker_makerefdir requires root access to run, which could create file permissions issues for users if they have to run the docker container as root.

Running the container as root does work:

docker run  \
  --platform linux/amd64 \
  --rm -it   \
  -v $(pwd):/data   ]
  scywalker:0.108.0  \
  scywalker_makerefdir -organelles '' g17 /data/genome.fa /data/genes.gtf

...but the ref is created in /usr/local/bin/g17/, so it is contained in the docker container and not accessible outside of the container. It would help if the user can specify all output to a specific directory (e.g., /data).

I had to run the following to both create and use the ref dir:

docker run \
  --platform linux/amd64 \
  --rm -it \
  -v $(pwd):/data \
  scywalker:0.108.0 \
  bash -c "scywalker_makerefdir \
    -organelles '' \
    g17 /data/genome.fa /data/genes.gtf \
    && scywalker -v 1 -d 8 \
      -refdir /usr/local/bin/g17 \
      -sc_expectedcells 183 \
      -cellmarkerfile /data/markers_chr17.tsv \
      -threads 6 \
      /data/test10x"

The pipeline does complete successfully:

[...]
2024-04-09 19:42:07.188 cleanup finished
2024-04-09 19:42:07.188 all jobs finished

process_project_test10x.2024-04-09_19-15-38-262.finished.zip

derijkp commented 6 months ago

The container does not need to run as root if you either specify the full path (/data/g17) or, easier (imo), use the -workdir argument as I did in the example code:

docker run -v `pwd`:/data --workdir /data -u `id -u`:`id -g` scywalker:$version \
    scywalker_makerefdir -organelles '' g17 genome.fa genes.gtf
docker run -v `pwd`:/data --workdir /data -u `id -u`:`id -g` scywalker:$version \
    scywalker -v 1 -d 8 \
        -refdir g17 \
        -sc_expectedcells 183 \
        -cellmarkerfile markers_chr17.tsv \
        -threads 6 \
        test10x
nick-youngblut commented 6 months ago

It appears that you did not wrap your example code in triple backticks, so the code formatting is unclear, but I get what you are going for. I can confirm that your commands work so that root is not needed.

I recommend using --rm with docker run, since it is good practice and can avoid issues.

wdecoster commented 6 months ago

Added code formatting :-)

derijkp commented 5 months ago

Closing this thread because the new release 0.109.0 solves (most) issues in it: