Closed nick-youngblut closed 5 months ago
Thank you for reporting this. @derijkp is off for a few days, but will get back to you upon his return.
Best regards, Wouter
Hi,
Thank you for your error report. Given the number of errors, likely something something went wrong in an early job (e.g. making the bam file), causing subsequent jobs dependent on the results to fail as well. The included example err file is from such a subsequent job (missing a file it needs that should have been made by an earlier job) To identify the original error, can you have a look at (or send) the (overview) logfile, which should be named process_project_test10x.*.error. This is a tab-separated list of jobs (in order of dependencies) with status, messages, etc; we are looking for the first job with error status in this list.
I will add a section on troubleshooting to the docs (and in messages) to make this easier/clearer.
Peter
Here's the error log: process_project_test10x.2024-03-14_15-41-11-239.error.zip
It appears that the errors start at the isoquant-isoquant_sc-sminimap2_splice
jobs.
The error message for this one is unfortunately not very informative (child process exited abnormally)
Can you also provide the run file: log_jobs/run/isoquant-isoquant_sc-sminimap2_splice-mix1-chr17-24840903-52275265.run and the bam file so I can check what might be wrong with them.
Can you also try running that run file directly to check if the error is reproducible on your machine: ./log_jobs/run/isoquant-isoquant_sc-sminimap2_splice-mix1-chr17-24840903-52275265.run
Here the *.run
file (why not the standard *.sh
file extension?): isoquant-isoquant_sc-sminimap2_splice-mix1-chr17-24840903-52275265.run.zip
I'm running the job now; I'll let you know if it finishes successfully.
I'm assuming that such tests do not occur on your machine. It might be best to reproduce my machine via docker. Something like:
FROM ubuntu:20.04
RUN apt-get update \
&& apt-get upgrade -y \
&& apt-get install -y \
wget \
build-essential \
wget \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
RUN wget https://github.com/derijkp/scywalker/releases/download/v0.108.0/scywalker_test.tar.gz \
&& tar xvzf scywalker_test.tar.gz \
&& rm -f scywalker_test.tar.gz
CMD ["scywalker", "-h"]
Moreover, such software testing errors could be identified via GitHub Actions. An example actions yaml file:
on:
push:
branches: [main, master]
pull_request:
branches: [main, master]
name: scywalker
jobs:
test:
runs-on: ${{ matrix.config.os }}
name: ${{ matrix.config.os }}
strategy:
fail-fast: false
matrix:
config:
- {os: "ubuntu-20.04"}
- {os: "ubuntu-22.04"}
steps:
- id: "Checkout"
name: "Checkout code"
uses: actions/checkout@v3
- id: "Install"
name: "scywalker install"
run: |
wget https://github.com/derijkp/scywalker/releases/download/v0.108.0/scywalker_test.tar.gz
tar xvzf scywalker_test.tar.gz
rm -f scywalker_test.tar.gz
- id: "Test download"
name: "Download scywalker test data"
run: |
wget https://github.com/derijkp/scywalker/releases/download/v0.108.0/scywalker_test.tar.gz
tar xvzf scywalker_test.tar.gz
- id: "Test create ref"
name: "Create scywalker ref"
run: |
cd scywalker_test
../scywalker-0.108.0-Linux-x86_64/scywalker_makerefdir -organelles '' g17 genome.fa genes.gtf
- id: "Test run"
name: "Run scywalker on test"
run: |
cd scywalker_test
../scywalker-0.108.0-Linux-x86_64/scywalker -v 1 -d 8 -refdir g17 -sc_expectedcells 183 -cellmarkerfile markers_chr17.tsv -threads 2 test10x
Using the docker image and/or Github actions can save a lot of time with debugging issues.
It appears that the isoquant-isoquant_sc-sminimap2_splice-mix1-chr17-24840903-52275265.run
job finished successfully and without warnings (except for over-writting):
2024-03-25 16:41:33,934 - INFO - Running IsoQuant version 3.3.0
2024-03-25 16:41:33,934 - WARNING - Output folder already exists, some files may be overwritten.
2024-03-25 16:41:33,939 - INFO - Novel unspliced transcripts will not be reported, set --report_novel_unspliced true to discover them
2024-03-25 16:41:33,939 - INFO - === IsoQuant pipeline started ===
2024-03-25 16:41:33,939 - INFO - Loading gene database from /tmp/tempExtral.715008-WdiBEZnMHedNDOAfcxPb/_Extral_temp_6.tmp.db
2024-03-25 16:41:33,939 - INFO - Loading reference genome from /home/nickyoungblut/bin/scywalker-0.108.0-Linux-x86_64/test_run/scywalker_test/g17/genome_g17.ifas
2024-03-25 16:41:33,940 - INFO - Processing 1 sample
2024-03-25 16:41:33,940 - INFO - Processing sample OUT
2024-03-25 16:41:33,940 - INFO - Sample has 1 BAM file: /tmp/tempExtral.715008-WdiBEZnMHedNDOAfcxPb/_Extral_temp_1.tmp.bam
2024-03-25 16:41:33,940 - INFO - Collecting read alignments
2024-03-25 16:41:33,941 - INFO - Processing chromosome chr17
2024-03-25 16:53:10,796 - INFO - Finished processing chromosome chr17
2024-03-25 16:53:11,156 - INFO - Resolving multimappers
2024-03-25 16:53:11,553 - INFO - Finishing read assignment, total assignments 241963, polyA percentage 84.3
2024-03-25 16:53:11,671 - INFO - Read assignments files saved to /home/nickyoungblut/bin/scywalker-0.108.0-Linux-x86_64/test_run/scywalker_test/test10x/samples/mix1/isoquant_sc-sminimap2_splice-mix1.temp/isoquant_sc-sminimap2_splice-mix1-chr17-24840903-52275265.temp/OUT/aux/OUT.save*.
2024-03-25 16:53:11,671 - INFO - Total alignments processed: 241963, polyA tail detected in 203949 (84.3%)
2024-03-25 16:53:11,671 - INFO - Processing assigned reads OUT
2024-03-25 16:53:11,672 - INFO - Processing chromosome chr17
2024-03-25 16:53:11,887 - INFO - Loading read assignments from /home/nickyoungblut/bin/scywalker-0.108.0-Linux-x86_64/test_run/scywalker_test/test10x/samples/mix1/isoquant_sc-sminimap2_splice-mix1.temp/isoquant_sc-sminimap2_splice-mix1-chr17-24840903-52275265.temp/OUT/aux/OUT.save_chr17
2024-03-25 16:58:02,264 - INFO - Finished processing chromosome chr17
2024-03-25 16:58:02,298 - INFO - Transcript model file /home/nickyoungblut/bin/scywalker-0.108.0-Linux-x86_64/test_run/scywalker_test/test10x/samples/mix1/isoquant_sc-sminimap2_splice-mix1.temp/isoquant_sc-sminimap2_splice-mix1-chr17-24840903-52275265.temp/OUT/OUT.transcript_models.gtf
2024-03-25 16:58:02,303 - INFO - Extended annotation is saved to /home/nickyoungblut/bin/scywalker-0.108.0-Linux-x86_64/test_run/scywalker_test/test10x/samples/mix1/isoquant_sc-sminimap2_splice-mix1.temp/isoquant_sc-sminimap2_splice-mix1-chr17-24840903-52275265.temp/OUT/OUT.extended_annotation.gtf
2024-03-25 16:58:02,303 - INFO - Transcript model statistics
2024-03-25 16:58:02,303 - INFO - known: 509
2024-03-25 16:58:02,303 - INFO - novel_in_catalog: 175
2024-03-25 16:58:02,303 - INFO - novel_not_in_catalog: 92
2024-03-25 16:58:02,417 - INFO - Gene counts are stored in /home/nickyoungblut/bin/scywalker-0.108.0-Linux-x86_64/test_run/scywalker_test/test10x/samples/mix1/isoquant_sc-sminimap2_splice-mix1.temp/isoquant_sc-sminimap2_splice-mix1-chr17-24840903-52275265.temp/OUT/OUT.gene_counts.tsv
2024-03-25 16:58:02,417 - INFO - Transcript counts are stored in /home/nickyoungblut/bin/scywalker-0.108.0-Linux-x86_64/test_run/scywalker_test/test10x/samples/mix1/isoquant_sc-sminimap2_splice-mix1.temp/isoquant_sc-sminimap2_splice-mix1-chr17-24840903-52275265.temp/OUT/OUT.transcript_counts.tsv
2024-03-25 16:58:02,417 - INFO - Read assignments are stored in /home/nickyoungblut/bin/scywalker-0.108.0-Linux-x86_64/test_run/scywalker_test/test10x/samples/mix1/isoquant_sc-sminimap2_splice-mix1.temp/isoquant_sc-sminimap2_splice-mix1-chr17-24840903-52275265.temp/OUT/OUT.read_assignments.tsv
2024-03-25 16:58:02,417 - INFO - Read assignment statistics
2024-03-25 16:58:02,417 - INFO - ambiguous: 15537
2024-03-25 16:58:02,417 - INFO - inconsistent: 69992
2024-03-25 16:58:02,417 - INFO - intergenic: 69
2024-03-25 16:58:02,417 - INFO - noninformative: 75206
2024-03-25 16:58:02,417 - INFO - unique: 75976
2024-03-25 16:58:02,417 - INFO - unique_minor_difference: 5183
2024-03-25 16:58:02,418 - INFO - Processed sample OUT
2024-03-25 16:58:02,418 - INFO - Processed 1 sample
2024-03-25 16:58:02,418 - INFO - === IsoQuant pipeline finished ===
These errors indeed did not occur on my test systems. Thanks for the suggestion of using docker for testing: I was not doing that, but will from now as it allows to both test more systems and automate the tests. I will also look into the github actions later.
Testing in the ubuntu docker did produce errors due to missing libraries, which I will fix in the upcoming new release (by including them in the appdir). I am not sure if this fix will solve the problems on your system, as the errors I encountered were different from what you reported. In the docker system, the demo did run successfully after installing packages providing the missing libraries (apt install -y curl libsm6 libgtk2.0-0 libegl1 libxt6 libxtst6 libxext6 libgl1-mesa-glx)
The beauty of Dockerized workflows in in their reproducibility. I don't want to start adding random os libraries to my VM (shared with other uses) just to run this pipeline. So, I'll create a Docker image for scywalker and test it out. In that regard, it would be helpful if you shared your final Dockerfile, in which the scywalker tests are working.
I have added a working Dockerfile together with code to build and test in commit 488311e.
Thanks @derijkp for creating the dockerfile!
I tried to run the test workflow using the image that I created from the dockerfile:
docker run \
--platform linux/amd64 \
--rm -it \
-v $(pwd):/data \
-u $(id -u):$(id -g) \
scywalker:0.108.0 \
scywalker_makerefdir -organelles '' g17 /data/genome.fa /data/genes.gtf
...but I'm getting the following error:
can't create directory "/usr/local/bin/g17": permission denied
while executing
"file mkdir $refdir"
(procedure "scywalker_makerefdir" line 36)
invoked from within
"scywalker_makerefdir {*}$argv"
(file "/usr/local/bin/scywalker-0.108.0-Linux-x86_64/apps/scywalker_makerefdir/scywalker_makerefdir.tcl" line 212)
It appears that scywalker_makerefdir
requires root access to run, which could create file permissions issues for users if they have to run the docker container as root.
Running the container as root does work:
docker run \
--platform linux/amd64 \
--rm -it \
-v $(pwd):/data ]
scywalker:0.108.0 \
scywalker_makerefdir -organelles '' g17 /data/genome.fa /data/genes.gtf
...but the ref is created in /usr/local/bin/g17/
, so it is contained in the docker container and not accessible outside of the container. It would help if the user can specify all output to a specific directory (e.g., /data
).
I had to run the following to both create and use the ref dir:
docker run \
--platform linux/amd64 \
--rm -it \
-v $(pwd):/data \
scywalker:0.108.0 \
bash -c "scywalker_makerefdir \
-organelles '' \
g17 /data/genome.fa /data/genes.gtf \
&& scywalker -v 1 -d 8 \
-refdir /usr/local/bin/g17 \
-sc_expectedcells 183 \
-cellmarkerfile /data/markers_chr17.tsv \
-threads 6 \
/data/test10x"
The pipeline does complete successfully:
[...]
2024-04-09 19:42:07.188 cleanup finished
2024-04-09 19:42:07.188 all jobs finished
process_project_test10x.2024-04-09_19-15-38-262.finished.zip
The container does not need to run as root if you either specify the full path (/data/g17) or, easier (imo), use the -workdir argument as I did in the example code:
docker run -v `pwd`:/data --workdir /data -u `id -u`:`id -g` scywalker:$version \
scywalker_makerefdir -organelles '' g17 genome.fa genes.gtf
docker run -v `pwd`:/data --workdir /data -u `id -u`:`id -g` scywalker:$version \
scywalker -v 1 -d 8 \
-refdir g17 \
-sc_expectedcells 183 \
-cellmarkerfile markers_chr17.tsv \
-threads 6 \
test10x
It appears that you did not wrap your example code in triple backticks, so the code formatting is unclear, but I get what you are going for. I can confirm that your commands work so that root is not needed.
I recommend using --rm
with docker run
, since it is good practice and can avoid issues.
Added code formatting :-)
Closing this thread because the new release 0.109.0 solves (most) issues in it:
I've run the test dataset twice:
Analysis ended with 82 errors
Analysis ended with 94 errors
Reprex:
The list of
err
logs for the 2nd run:An example error log:
My system info: