BIMSBbioinfo / pigx_rnaseq

Bulk RNA-seq Data Processing, Quality Control, and Downstream Analysis Pipeline
GNU General Public License v3.0

test.sh on trimmed_reads misses file after successful execution #78

Closed: smoe closed this issue 4 years ago

smoe commented 4 years ago

Hello,

I tried salmon 1.2 and now also 1.3, but the salmon step fails when I run your tests:

#/usr/bin/make -j2 check VERBOSE=1
./test.sh
Commencing snakemake run submission locally
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job counts:
        count   jobs
        1       all
        1       collate_read_counts
        6       count_reads
        1       counts_from_SALMON
        6       fastqc
        6       genomeCoverage
        6       index_bam
        1       multiqc
        1       norm_counts_deseq
        1       report1
        1       report2
        1       report3
        1       salmon_index
        6       salmon_quant
        6       sort_bam
        1       star_index
        6       star_map
        1       translate_sample_sheet_for_report
        4       trim_galore_pe
        2       trim_galore_se
        59

[Fri Jul 31 17:54:24 2020]
rule salmon_index:
    input: /home/moeller/pigx-rnaseq/pigx-rnaseq/tests/sample_data/sample.cdna.fasta
    output: /home/moeller/pigx-rnaseq/pigx-rnaseq/tests/output/salmon_index/sa.bin
    log: /home/moeller/pigx-rnaseq/pigx-rnaseq/tests/output/logs/salmon_index.log
    jobid: 2

/usr/bin/salmon  index -t /home/moeller/pigx-rnaseq/pigx-rnaseq/tests/sample_data/sample.cdna.fasta -i /home/moeller/pigx-rnaseq/pigx-rnaseq/tests/output/salmon_index -p 8 >> /home/moeller/pigx-rnaseq/pigx-rnaseq/tests/output/logs/salmon_index.log 2>&1
Waiting at most 5 seconds for missing files.
MissingOutputException in line 320 of /home/moeller/pigx-rnaseq/pigx-rnaseq/pigx_rnaseq.py:
Job completed successfully, but some output files are missing. Missing files after 5 seconds:
/home/moeller/pigx-rnaseq/pigx-rnaseq/tests/output/salmon_index/sa.bin
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
  File "/usr/lib/python3/dist-packages/snakemake/executors/__init__.py", line 544, in handle_job_success
  File "/usr/lib/python3/dist-packages/snakemake/executors/__init__.py", line 225, in handle_job_success
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/moeller/pigx-rnaseq/pigx-rnaseq/tests/output/.snakemake/log/2020-07-31T175423.936343.snakemake.log
ERROR: could not find report for SALMON at transcript level
make[1]: *** [debian/rules:21: override_dh_auto_test] Error 1

I set "jobs: 1" in tests/settings.yaml to keep the extra complexity of parallel jobs out of this report. The salmon logfile is a bit weird in that it only mentions step 1 of 4:

$ cat /home/moeller/pigx-rnaseq/pigx-rnaseq/tests/output/logs/salmon_index.log
[2020-07-31 17:56:47.683] [jLog] [warning] The salmon index is being built without any decoy sequences.  It is recommended that decoy sequence (either computed auxiliary decoy sequence or the genome of the organism) be provided during indexing. Further details can be found at https://salmon.readthedocs.io/en/latest/salmon.html#preparing-transcriptome-indices-mapping-based-mode.
[2020-07-31 17:56:47.683] [jLog] [info] building index
out : /home/moeller/pigx-rnaseq/pigx-rnaseq/tests/output/salmon_index
[2020-07-31 17:56:47.683] [puff::index::jointLog] [info] Running fixFasta

[Step 1 of 4] : counting k-mers

[2020-07-31 17:56:47.990] [puff::index::jointLog] [warning] Removed 1 transcripts that were sequence duplicates of indexed transcripts.
[2020-07-31 17:56:47.990] [puff::index::jointLog] [warning] If you wish to retain duplicate transcripts, please use the `--keepDuplicates` flag
[2020-07-31 17:56:47.990] [puff::index::jointLog] [info] Replaced 0 non-ATCG nucleotides
[2020-07-31 17:56:47.990] [puff::index::jointLog] [info] Clipped poly-A tails from 17 transcripts
wrote 3654 cleaned references
[2020-07-31 17:56:48.015] [puff::index::jointLog] [info] Filter size not provided; estimating from number of distinct k-mers
[2020-07-31 17:56:48.128] [puff::index::jointLog] [info] ntHll estimated 2236392 distinct k-mers, setting filter size to 2^26
Threads = 8
Vertex length = 31
Hash functions = 5
Filter size = 67108864
Capacity = 2
Files: 
/home/moeller/pigx-rnaseq/pigx-rnaseq/tests/output/salmon_index/ref_k31_fixed.fa
--------------------------------------------------------------------------------
Round 0, 0:67108864
Pass    Filling Filtering
1       0       0
2       1       0
True junctions count = 12090
False junctions count = 7469
Hash table size = 19559
Candidate marks count = 84180
--------------------------------------------------------------------------------
Reallocating bifurcations time: 0
True marks count: 71140
Edges construction time: 0
--------------------------------------------------------------------------------
Distinct junctions = 12090

allowedIn: 15
Max Junction ID: 14549
seen.size():116401 kmerInfo.size():14550
approximateContigTotalLength: 1606116
counters for complex kmers:
(prec>1 & succ>1)=372 | (succ>1 & isStart)=6 | (prec>1 & isEnd)=14 | (isStart & isEnd)=3
contig count: 17787 element count: 2811269 complex nodes: 395
# of ones in rank vector: 17786
[2020-07-31 17:56:50.225] [puff::index::jointLog] [info] Starting the Pufferfish indexing by reading the GFA binary file.
[2020-07-31 17:56:50.225] [puff::index::jointLog] [info] Setting the index/BinaryGfa directory /home/moeller/pigx-rnaseq/pigx-rnaseq/tests/output/salmon_index
size = 2811269
-----------------------------------------
| Loading contigs | Time = 552.52 us
-----------------------------------------
size = 2811269
-----------------------------------------
| Loading contig boundaries | Time = 259.58 us
-----------------------------------------
Number of ones: 17786
Number of ones per inventory item: 512
Inventory entries filled: 35
17786
[2020-07-31 17:56:50.238] [puff::index::jointLog] [info] Done wrapping the rank vector with a rank9sel structure.
[2020-07-31 17:56:50.239] [puff::index::jointLog] [info] contig count for validation: 17,786
[2020-07-31 17:56:50.248] [puff::index::jointLog] [info] Total # of Contigs : 17,786
[2020-07-31 17:56:50.248] [puff::index::jointLog] [info] Total # of numerical Contigs : 17,786
[2020-07-31 17:56:50.249] [puff::index::jointLog] [info] Total # of contig vec entries: 68,986
[2020-07-31 17:56:50.249] [puff::index::jointLog] [info] bits per offset entry 17
[2020-07-31 17:56:50.251] [puff::index::jointLog] [info] Done constructing the contig vector. 17787
[2020-07-31 17:56:50.259] [puff::index::jointLog] [info] # segments = 17,786
[2020-07-31 17:56:50.259] [puff::index::jointLog] [info] total length = 2,811,269
[2020-07-31 17:56:50.260] [puff::index::jointLog] [info] Reading the reference files ...
[2020-07-31 17:56:50.294] [puff::index::jointLog] [info] positional integer width = 22
[2020-07-31 17:56:50.294] [puff::index::jointLog] [info] seqSize = 2,811,269
[2020-07-31 17:56:50.294] [puff::index::jointLog] [info] rankSize = 2,811,269
[2020-07-31 17:56:50.294] [puff::index::jointLog] [info] edgeVecSize = 0
[2020-07-31 17:56:50.294] [puff::index::jointLog] [info] num keys = 2,277,689
[Building BooPHF]  100  %   elapsed:   0 min 0  sec   remaining:   0 min 0  sec
[2020-07-31 17:56:50.448] [puff::index::jointLog] [info] mphf size = 1.42339 MB
[2020-07-31 17:56:50.449] [puff::index::jointLog] [info] chunk size = 351,409
[2020-07-31 17:56:50.449] [puff::index::jointLog] [info] chunk 0 = [0, 351,426)
[2020-07-31 17:56:50.449] [puff::index::jointLog] [info] chunk 1 = [351,426, 702,835)
[2020-07-31 17:56:50.449] [puff::index::jointLog] [info] chunk 2 = [702,835, 1,054,244)
[2020-07-31 17:56:50.449] [puff::index::jointLog] [info] chunk 3 = [1,054,244, 1,405,674)
[2020-07-31 17:56:50.449] [puff::index::jointLog] [info] chunk 4 = [1,405,674, 1,757,083)
[2020-07-31 17:56:50.449] [puff::index::jointLog] [info] chunk 5 = [1,757,083, 2,108,492)
[2020-07-31 17:56:50.449] [puff::index::jointLog] [info] chunk 6 = [2,108,492, 2,459,901)
[2020-07-31 17:56:50.449] [puff::index::jointLog] [info] chunk 7 = [2,459,901, 2,811,239)
[2020-07-31 17:56:50.539] [puff::index::jointLog] [info] finished populating pos vector
[2020-07-31 17:56:50.539] [puff::index::jointLog] [info] writing index components
[2020-07-31 17:56:50.554] [puff::index::jointLog] [info] finished writing dense pufferfish index
[2020-07-31 17:56:50.556] [jLog] [info] done building index
for info, total work write each  : 2.331    total work inram from level 3 : 4.322  total work raw : 25.000 
Bitarray        11940288  bits (100.00 %)   (array + ranks )
final hash             0  bits (0.00 %) (nb in final hash 0)
[2020-07-31 17:56:55.854] [jLog] [warning] The salmon index is being built without any decoy sequences.  It is recommended that decoy sequence (either computed auxiliary decoy sequence or the genome of the organism) be provided during indexing. Further details can be found at https://salmon.readthedocs.io/en/latest/salmon.html#preparing-transcriptome-indices-mapping-based-mode.
[2020-07-31 17:56:55.855] [jLog] [info] building index
out : /home/moeller/pigx-rnaseq/pigx-rnaseq/tests/output/salmon_index
[2020-07-31 17:56:55.855] [puff::index::jointLog] [info] Running fixFasta

[Step 1 of 4] : counting k-mers

[2020-07-31 17:56:56.245] [puff::index::jointLog] [warning] Removed 1 transcripts that were sequence duplicates of indexed transcripts.
[2020-07-31 17:56:56.245] [puff::index::jointLog] [warning] If you wish to retain duplicate transcripts, please use the `--keepDuplicates` flag
[2020-07-31 17:56:56.246] [puff::index::jointLog] [info] Replaced 0 non-ATCG nucleotides
[2020-07-31 17:56:56.246] [puff::index::jointLog] [info] Clipped poly-A tails from 17 transcripts
wrote 3654 cleaned references
[2020-07-31 17:56:56.270] [puff::index::jointLog] [info] Filter size not provided; estimating from number of distinct k-mers
[2020-07-31 17:56:56.395] [puff::index::jointLog] [info] ntHll estimated 2236392 distinct k-mers, setting filter size to 2^26
Threads = 8
Vertex length = 31
Hash functions = 5
Filter size = 67108864
Capacity = 2
Files: 
/home/moeller/pigx-rnaseq/pigx-rnaseq/tests/output/salmon_index/ref_k31_fixed.fa
--------------------------------------------------------------------------------
Round 0, 0:67108864
Pass    Filling Filtering

The disk is not full.
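
For what it's worth, the log above appears to contain two index runs, and only the first one reaches "done building index". Here is a minimal Python sketch for counting started versus completed runs in such a log. The function name `count_index_runs` is made up for illustration, and the two marker strings are simply copied from the log lines above; I am assuming other salmon versions print them the same way.

```python
# Hypothetical helper, not part of pigx_rnaseq or salmon.
# Marker strings are copied from the log excerpt in this report.
START_MARKER = "[jLog] [info] building index"
DONE_MARKER = "[jLog] [info] done building index"


def count_index_runs(log_text):
    """Return (started, finished) counts of salmon index runs in a log."""
    started = 0
    finished = 0
    for line in log_text.splitlines():
        if DONE_MARKER in line:
            finished += 1
        elif START_MARKER in line:
            started += 1
    return started, finished


if __name__ == "__main__":
    # Three lines reduced from the log above: two starts, one completion.
    sample = "\n".join([
        "[2020-07-31 17:56:47.683] [jLog] [info] building index",
        "[2020-07-31 17:56:50.556] [jLog] [info] done building index",
        "[2020-07-31 17:56:55.855] [jLog] [info] building index",
    ])
    print(count_index_runs(sample))  # (2, 1)
```

A mismatch between the two counts only says that one run did not log its completion line; it does not say why.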

I also executed the command manually, but no sa.bin file was created. The salmon_index directory contains:

$ ls -l tests/output/salmon_index
total 10784
-rw-r--r-- 1 moeller moeller   14624 31. Jul 18:00 complete_ref_lens.bin
-rw-r--r-- 1 moeller moeller  650570 31. Jul 18:00 ctable.bin
-rw-r--r-- 1 moeller moeller   37832 31. Jul 18:00 ctg_offsets.bin
-rw-r--r-- 1 moeller moeller      57 31. Jul 18:00 duplicate_clusters.tsv
-rw-r--r-- 1 moeller moeller    1049 31. Jul 18:00 info.json
-rw-r--r-- 1 moeller moeller 1492972 31. Jul 18:00 mphf.bin
-rw-r--r-- 1 moeller moeller 6263680 31. Jul 18:00 pos.bin
-rw-r--r-- 1 moeller moeller     496 31. Jul 18:00 pre_indexing.log
-rw-r--r-- 1 moeller moeller  351448 31. Jul 18:00 rank.bin
-rw-r--r-- 1 moeller moeller   29240 31. Jul 18:00 refAccumLengths.bin
-rw-r--r-- 1 moeller moeller    7014 31. Jul 18:00 ref_indexing.log
-rw-r--r-- 1 moeller moeller   14624 31. Jul 18:00 reflengths.bin
-rw-r--r-- 1 moeller moeller 1441160 31. Jul 18:00 refseq.bin
-rw-r--r-- 1 moeller moeller  702856 31. Jul 18:00 seq.bin
-rw-r--r-- 1 moeller moeller     126 31. Jul 18:00 versionInfo.json
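
The listing shows no sa.bin, which is the file the salmon_index rule declares as its output. A small Python sketch for checking a directory against a list of expected file names could look like this. The helper `missing_outputs` is hypothetical and not part of pigx_rnaseq or Snakemake; the file names in the example are taken from the listing above.

```python
import tempfile
from pathlib import Path


def missing_outputs(index_dir, expected_names):
    """Return the expected file names that are absent from index_dir."""
    directory = Path(index_dir)
    if not directory.is_dir():
        # Nothing can be present if the directory itself is missing.
        return list(expected_names)
    present = {entry.name for entry in directory.iterdir() if entry.is_file()}
    return [name for name in expected_names if name not in present]


if __name__ == "__main__":
    # Recreate a tiny stand-in for the index directory with two of the
    # files from the listing, then ask for sa.bin as the rule does.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("pos.bin", "seq.bin"):
            (Path(tmp) / name).touch()
        print(missing_outputs(tmp, ["sa.bin", "pos.bin"]))  # ['sa.bin']
```

This only reports which names are absent. Whether the rule should expect a different file for this salmon version is a separate question that the check cannot answer.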

Any idea where I should look? Admittedly, salmon here is the Debian package, not the Guix one. This failure is what blocks the Debian/Ubuntu package of pigx-rnaseq.

Many thanks for your help! Steffen

smoe commented 4 years ago

The error was not where I had indicated it to be.