This is odd. What's happening is that one read from R2 is missing in R1. This should not happen, since bbmap re-pairs the reads from R1 and R2 before mapping, ensuring both files contain exactly the same reads. Did you interact with those files manually in any way?
edit: Adding the read id to the exit message so that it is easier to debug. You can use the develop branch to get this info.
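If you want to double-check on your side, a standalone sanity check along these lines (hypothetical file names, not part of dropSeqPipe) would confirm whether both fastq files really contain the same read ids:

```python
# Standalone sanity check: confirm both fastq files contain exactly the same
# read ids, ignoring the /1 and /2 mate markers. File names are hypothetical.
import gzip

from Bio import SeqIO


def read_ids(path):
    with gzip.open(path, "rt") as handle:
        # rec.id is the first token of the header, e.g. C7JT8:1:1102:13714:1859/1
        return {rec.id.rsplit("/", 1)[0] for rec in SeqIO.parse(handle, "fastq")}


r1_ids = read_ids("RA0449.0_R1.fastq.gz")
r2_ids = read_ids("RA0449.0_R2.fastq.gz")
print("only in R1:", len(r1_ids - r2_ids))
print("only in R2:", len(r2_ids - r1_ids))
```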
Thanks for this! Now I have the helpful error message:
Read C7JT8:1:1102:13714:1859 from mapped file is missing in reference fastq file!
I'm checking it out and that read is definitely in both the input _R1.fastq.gz and _R2.fastq.gz files:
Read 1:
@C7JT8:1:1102:13714:1859/1
GTGGTGGGTTACAGTGAGCT
+
CCCCCGGGGGGGGGGGGGGG
Read 2:
@C7JT8:1:1102:13714:1859/2
GGCCAGGCTGGTCTCAAACTCCTGACCTCAGGCAATCCGCCCACCTTGGCCTCCCAAAGTGCTGAGGAACCCAGTTTGAAAACCATTC
+
CCCCCFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
It is also in aligned.out.bam:
C7JT8:1:1102:13714:1859 0 10 63052789 255 66M22S * 0 0 GGCCAGGCTGGTCTCAAACTCCTGACCTCAGGCAATCCGCCCACCTTGGCCTCCCAAAGTGCTGAGGAACCCAGTTTGAAAACCATTC CCCCCFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG NH:i:1 HI:i:1 AS:i:64 nM:i:0
and in trimmmed_repaired_R1.fastq.gz, where it doesn't appear to have been trimmed at all, so it is identical to the read in the input _R1.fastq.gz.
Any idea what could be going on? I would try to debug further myself, but the relevant Python file /home/dropSeqPipe/.snakemake/scripts/tmpw0q_o4n2.merge_bam.py seems to disappear.
I guess it comes from the /1 and /2 at the end of the read id. R1 is read by a fastq parser from SeqIO and the BAM file is read by a pysam parser. I think pysam is not keeping the /2 at the end. This would explain the problem of not finding the read although it is there.
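A minimal sketch of the suspected mismatch, using the file names from this thread (the lookup logic is illustrative, not the pipeline's actual merge_bam.py):

```python
# Illustrative sketch only: fastq ids from SeqIO keep the /1 mate suffix,
# while the read names seen in the aligned BAM do not, so a plain dict/set
# lookup fails. Stripping the suffix before comparing avoids the mismatch.
import gzip

import pysam
from Bio import SeqIO


def strip_mate_suffix(name):
    """Drop a trailing /1 or /2 mate marker if present."""
    return name[:-2] if name.endswith(("/1", "/2")) else name


# Read ids as SeqIO reports them from the repaired R1 fastq (keeps the /1 suffix).
with gzip.open("trimmmed_repaired_R1.fastq.gz", "rt") as handle:
    r1_ids = {strip_mate_suffix(rec.id) for rec in SeqIO.parse(handle, "fastq")}

# Read names as pysam reports them from the aligned BAM (no mate suffix),
# looked up against the fastq ids.
with pysam.AlignmentFile("Aligned.out.bam", "rb") as bam:
    for read in bam:
        if read.query_name not in r1_ids:
            raise SystemExit(
                f"Read {read.query_name} from mapped file is missing "
                "in reference fastq file!"
            )
```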
I tried a quick fix that strips the /1 or /2 from the end of the read id, in the branch feature/debugging_mergeBam.
Try it out and let me know if it works.
edit: The tmp scripts are deleted by snakemake. That is normal, don't worry.
That seems to have fixed that issue! But now it is crashing at the repair_barcodes step with a very uninformative error message:
Building DAG of jobs...
Creating conda environment https://bitbucket.org/snakemake/snakemake-wrappers/raw/0.27.1/bio/fastqc/environment.yaml...
Downloading remote packages.
Environment for ../../tmp/tmp1bgk4k5i.yaml created (location: .snakemake/conda/73b3d757)
Creating conda environment envs/plots_ext.yaml...
Downloading remote packages.
Environment for envs/plots_ext.yaml created (location: .snakemake/conda/1290ea5a)
Creating conda environment envs/cutadapt.yaml...
Downloading remote packages.
Environment for envs/cutadapt.yaml created (location: .snakemake/conda/7dc41205)
Creating conda environment envs/star.yaml...
Downloading remote packages.
Environment for envs/star.yaml created (location: .snakemake/conda/fe1064ae)
Creating conda environment envs/dropseq_tools.yaml...
Downloading remote packages.
Environment for envs/dropseq_tools.yaml created (location: .snakemake/conda/dd296d1f)
Creating conda environment https://bitbucket.org/snakemake/snakemake-wrappers/raw/0.21.0/bio/multiqc/environment.yaml...
Downloading remote packages.
Environment for ../../tmp/tmp2g315g_j.yaml created (location: .snakemake/conda/81acb004)
Creating conda environment envs/merge_bam.yaml...
Downloading remote packages.
Environment for envs/merge_bam.yaml created (location: .snakemake/conda/4b9c1953)
Creating conda environment envs/plots.yaml...
Downloading remote packages.
Environment for envs/plots.yaml created (location: .snakemake/conda/840da6c6)
Creating conda environment https://bitbucket.org/snakemake/snakemake-wrappers/raw/0.27.1/bio/star/align/environment.yaml...
Downloading remote packages.
Environment for ../../tmp/tmp6n23qi3n.yaml created (location: .snakemake/conda/54fabd57)
Creating conda environment envs/bbmap.yaml...
Downloading remote packages.
Environment for envs/bbmap.yaml created (location: .snakemake/conda/8c850d6e)
Creating conda environment envs/picard.yaml...
Downloading remote packages.
Environment for envs/picard.yaml created (location: .snakemake/conda/8331163d)
Using shell: /bin/bash
Provided cores: 6
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 DetectBeadSubstitutionErrors
1 MergeBamAlignment
1 STAR_align
1 SingleCellRnaSeqMetricsCollector
1 TagReadWithGeneExon
1 all
1 bam_hist
1 bead_errors_metrics
1 clean_cutadapt
2 convert_long_to_mtx
1 create_dict
1 create_intervals
1 create_refFlat
1 create_star_index
1 curate_annotation
1 cutadapt_R1
1 cutadapt_R2
1 extend_barcode_whitelist
1 extract_reads_expression
1 extract_umi_expression
1 fastqc_barcodes
1 fastqc_reads
2 merge_long
1 multiqc_cutadapt_RNA
1 multiqc_cutadapt_barcodes
1 multiqc_fastqc_barcodes
1 multiqc_fastqc_reads
1 multiqc_star
1 plot_adapter_content
1 plot_knee_plot
1 plot_rna_metrics
1 plot_yield
1 reduce_gtf
1 repair
1 repair_barcodes
1 violine_plots
38
[Mon Dec 31 04:38:24 2018]
localrule extend_barcode_whitelist:
input: /home/barcode_whitelist.txt
output: /home/results/samples/RA0449.0/barcodes.csv, /home/results/samples/RA0449.0/barcode_ref.pkl, /home/results/samples/RA0449.0/barcode_ext_ref.pkl, /home/results/samples/RA0449.0/empty_barcode_mapping.pkl
jobid: 24
wildcards: results_dir=/home/results, sample=RA0449.0
[Mon Dec 31 04:38:24 2018]
rule fastqc_reads:
input: /home/data/RA0449.0_R2.fastq.gz
output: /home/results/logs/fastqc/RA0449.0_R2_fastqc.html, /home/results/logs/fastqc/RA0449.0_R2_fastqc.zip
jobid: 18
wildcards: results_dir=/home/results, sample=RA0449.0
[Mon Dec 31 04:38:24 2018]
rule fastqc_barcodes:
input: /home/data/RA0449.0_R1.fastq.gz
output: /home/results/logs/fastqc/RA0449.0_R1_fastqc.html, /home/results/logs/fastqc/RA0449.0_R1_fastqc.zip
jobid: 19
wildcards: results_dir=/home/results, sample=RA0449.0
[Mon Dec 31 04:38:24 2018]
localrule create_dict:
input: /home/ref/MmulKitwit_8_92/genome.fa
output: /home/ref/MmulKitwit_8_92/genome.dict
jobid: 35
wildcards: ref_path=/home/ref, species=MmulKitwit, build=8, release=92
Activating conda environment: /home/dropSeqPipe/.snakemake/conda/8331163d
[Mon Dec 31 04:38:24 2018]
localrule curate_annotation:
input: /home/dropSeqPipe/templates/gtf_biotypes.yaml, /home/ref/MmulKitwit_8_92/annotation.gtf
output: /home/ref/MmulKitwit_8_92/curated_annotation.gtf
jobid: 17
wildcards: ref_path=/home/ref, species=MmulKitwit, build=8, release=92
[Mon Dec 31 04:38:24 2018]
Finished job 24.
1 of 38 steps (3%) done
Activating conda environment: /home/dropSeqPipe/.snakemake/conda/73b3d757
Activating conda environment: /home/dropSeqPipe/.snakemake/conda/73b3d757
[Mon Dec 31 04:38:26 2018]
Finished job 17.
2 of 38 steps (5%) done
[Mon Dec 31 04:38:54 2018]
Finished job 19.
3 of 38 steps (8%) done
[Mon Dec 31 04:38:54 2018]
localrule multiqc_fastqc_barcodes:
input: /home/results/logs/fastqc/RA0449.0_R1_fastqc.html
output: /home/results/reports/fastqc_barcodes.html
jobid: 3
wildcards: results_dir=/home/results
Conda environment defines Python version < 3.5. Using Python of the master process to execute script. Note that this cannot be avoided, because the script uses data structures from Snakemake which are Python >=3.5 only.
Activating conda environment: /home/dropSeqPipe/.snakemake/conda/81acb004
[Mon Dec 31 04:38:55 2018]
Finished job 35.
4 of 38 steps (11%) done
[Mon Dec 31 04:38:55 2018]
localrule create_refFlat:
input: /home/ref/MmulKitwit_8_92/genome.dict, /home/ref/MmulKitwit_8_92/curated_annotation.gtf
output: /home/ref/MmulKitwit_8_92/curated_annotation.refFlat
jobid: 32
wildcards: ref_path=/home/ref, species=MmulKitwit, build=8, release=92
Activating conda environment: /home/dropSeqPipe/.snakemake/conda/dd296d1f
[Mon Dec 31 04:38:55 2018]
localrule reduce_gtf:
input: /home/ref/MmulKitwit_8_92/genome.dict, /home/ref/MmulKitwit_8_92/curated_annotation.gtf
output: /home/ref/MmulKitwit_8_92/curated_reduced_annotation.gtf
jobid: 36
wildcards: ref_path=/home/ref, species=MmulKitwit, build=8, release=92
Activating conda environment: /home/dropSeqPipe/.snakemake/conda/dd296d1f
[Mon Dec 31 04:38:58 2018]
Finished job 3.
5 of 38 steps (13%) done
[Mon Dec 31 04:39:27 2018]
Finished job 18.
6 of 38 steps (16%) done
[Mon Dec 31 04:39:27 2018]
localrule multiqc_fastqc_reads:
input: /home/results/logs/fastqc/RA0449.0_R2_fastqc.html
output: /home/results/reports/fastqc_reads.html
jobid: 2
wildcards: results_dir=/home/results
Conda environment defines Python version < 3.5. Using Python of the master process to execute script. Note that this cannot be avoided, because the script uses data structures from Snakemake which are Python >=3.5 only.
Activating conda environment: /home/dropSeqPipe/.snakemake/conda/81acb004
[Mon Dec 31 04:39:30 2018]
Finished job 2.
7 of 38 steps (18%) done
[Mon Dec 31 04:39:38 2018]
Finished job 36.
8 of 38 steps (21%) done
[Mon Dec 31 04:39:38 2018]
localrule create_intervals:
input: /home/ref/MmulKitwit_8_92/curated_reduced_annotation.gtf, /home/ref/MmulKitwit_8_92/genome.dict
output: /home/ref/MmulKitwit_8_92/annotation.rRNA.intervals
jobid: 33
wildcards: ref_path=/home/ref, species=MmulKitwit, build=8, release=92
Activating conda environment: /home/dropSeqPipe/.snakemake/conda/dd296d1f
[Mon Dec 31 04:39:40 2018]
Finished job 32.
9 of 38 steps (24%) done
[Mon Dec 31 04:40:20 2018]
Finished job 33.
10 of 38 steps (26%) done
[Mon Dec 31 04:40:20 2018]
rule create_star_index:
input: /home/ref/MmulKitwit_8_92/genome.fa, /home/ref/MmulKitwit_8_92/curated_annotation.gtf
output: /home/ref/MmulKitwit_8_92/STAR_INDEX/SA_88/SA
jobid: 1
wildcards: ref_path=/home/ref, species=MmulKitwit, build=8, release=92, read_length=88
threads: 6
Activating conda environment: /home/dropSeqPipe/.snakemake/conda/fe1064ae
Removing temporary output file /home/ref/MmulKitwit_8_92/curated_annotation.gtf.
[Mon Dec 31 05:47:25 2018]
Finished job 1.
11 of 38 steps (29%) done
[Mon Dec 31 05:47:25 2018]
rule cutadapt_R2:
input: /home/data/RA0449.0_R2.fastq.gz, /home/NexteraPE-SeqWell-PE-fastqc.fa
output: /home/results/samples/RA0449.0/trimmmed_R2.fastq.gz
log: /home/results/logs/cutadapt/RA0449.0_R2.qc.txt
jobid: 22
wildcards: results_dir=/home/results, sample=RA0449.0
threads: 6
Activating conda environment: /home/dropSeqPipe/.snakemake/conda/7dc41205
[Mon Dec 31 05:51:12 2018]
Finished job 22.
12 of 38 steps (32%) done
[Mon Dec 31 05:51:12 2018]
rule cutadapt_R1:
input: /home/data/RA0449.0_R1.fastq.gz, /home/NexteraPE-SeqWell-PE-fastqc.fa
output: /home/results/samples/RA0449.0/trimmmed_R1.fastq.gz
log: /home/results/logs/cutadapt/RA0449.0_R1.qc.txt
jobid: 21
wildcards: results_dir=/home/results, sample=RA0449.0
threads: 6
Activating conda environment: /home/dropSeqPipe/.snakemake/conda/7dc41205
[Mon Dec 31 05:53:15 2018]
Finished job 21.
13 of 38 steps (34%) done
[Mon Dec 31 05:53:15 2018]
localrule clean_cutadapt:
input: /home/results/logs/cutadapt/RA0449.0_R1.qc.txt, /home/results/logs/cutadapt/RA0449.0_R2.qc.txt
output: /home/results/logs/cutadapt/RA0449.0.clean_qc.csv
jobid: 20
wildcards: results_dir=/home/results, sample=RA0449.0
[Mon Dec 31 05:53:15 2018]
rule repair:
input: /home/results/samples/RA0449.0/trimmmed_R1.fastq.gz, /home/results/samples/RA0449.0/trimmmed_R2.fastq.gz
output: /home/results/samples/RA0449.0/trimmmed_repaired_R1.fastq.gz, /home/results/samples/RA0449.0/trimmmed_repaired_R2.fastq.gz
log: /home/results/logs/bbmap/RA0449.0_repair.txt
jobid: 26
wildcards: results_dir=/home/results, sample=RA0449.0
threads: 4
[Mon Dec 31 05:53:15 2018]
localrule multiqc_cutadapt_RNA:
input: /home/results/logs/cutadapt/RA0449.0_R2.qc.txt
output: /home/results/reports/RNA_filtering.html
jobid: 6
wildcards: results_dir=/home/results
Activating conda environment: /home/dropSeqPipe/.snakemake/conda/8c850d6e
[Mon Dec 31 05:53:16 2018]
Finished job 20.
14 of 38 steps (37%) done
[Mon Dec 31 05:53:16 2018]
localrule multiqc_cutadapt_barcodes:
input: /home/results/logs/cutadapt/RA0449.0_R1.qc.txt
output: /home/results/reports/barcode_filtering.html
jobid: 5
wildcards: results_dir=/home/results
Conda environment defines Python version < 3.5. Using Python of the master process to execute script. Note that this cannot be avoided, because the script uses data structures from Snakemake which are Python >=3.5 only.
Activating conda environment: /home/dropSeqPipe/.snakemake/conda/81acb004
Conda environment defines Python version < 3.5. Using Python of the master process to execute script. Note that this cannot be avoided, because the script uses data structures from Snakemake which are Python >=3.5 only.
Activating conda environment: /home/dropSeqPipe/.snakemake/conda/81acb004
[Mon Dec 31 05:53:19 2018]
Finished job 6.
15 of 38 steps (39%) done
[Mon Dec 31 05:53:19 2018]
localrule plot_adapter_content:
input: /home/results/logs/cutadapt/RA0449.0.clean_qc.csv
output: /home/results/plots/adapter_content.pdf
jobid: 4
wildcards: results_dir=/home/results
[Mon Dec 31 05:53:19 2018]
Finished job 5.
Activating conda environment: /home/dropSeqPipe/.snakemake/conda/840da6c6
16 of 38 steps (42%) done
[Mon Dec 31 05:53:26 2018]
Finished job 4.
17 of 38 steps (45%) done
Removing temporary output file /home/results/samples/RA0449.0/trimmmed_R1.fastq.gz.
Removing temporary output file /home/results/samples/RA0449.0/trimmmed_R2.fastq.gz.
[Mon Dec 31 05:53:47 2018]
Finished job 26.
18 of 38 steps (47%) done
[Mon Dec 31 05:53:47 2018]
rule STAR_align:
input: /home/results/samples/RA0449.0/trimmmed_repaired_R2.fastq.gz, /home/ref/MmulKitwit_8_92/STAR_INDEX/SA_88/SA
output: /home/results/samples/RA0449.0/Aligned.out.bam
log: /home/results/samples/RA0449.0/Log.final.out
jobid: 25
wildcards: results_dir=/home/results, sample=RA0449.0
threads: 6
Activating conda environment: /home/dropSeqPipe/.snakemake/conda/54fabd57
[Mon Dec 31 05:59:07 2018]
Finished job 25.
19 of 38 steps (50%) done
[Mon Dec 31 05:59:07 2018]
localrule multiqc_star:
input: /home/results/samples/RA0449.0/Log.final.out
output: /home/results/reports/star.html
jobid: 8
wildcards: results_dir=/home/results
[Mon Dec 31 05:59:07 2018]
rule MergeBamAlignment:
input: /home/results/samples/RA0449.0/Aligned.out.bam, /home/results/samples/RA0449.0/trimmmed_repaired_R1.fastq.gz
output: /home/results/samples/RA0449.0/Aligned.merged.bam
jobid: 39
wildcards: results_dir=/home/results, sample=RA0449.0
[Mon Dec 31 05:59:07 2018]
localrule plot_yield:
input: /home/results/logs/cutadapt/RA0449.0_R1.qc.txt, /home/results/logs/cutadapt/RA0449.0_R2.qc.txt, /home/results/logs/bbmap/RA0449.0_repair.txt, /home/results/samples/RA0449.0/Log.final.out
output: /home/results/plots/yield.pdf
jobid: 9
wildcards: results_dir=/home/results
Activating conda environment: /home/dropSeqPipe/.snakemake/conda/4b9c1953
Activating conda environment: /home/dropSeqPipe/.snakemake/conda/840da6c6
Conda environment defines Python version < 3.5. Using Python of the master process to execute script. Note that this cannot be avoided, because the script uses data structures from Snakemake which are Python >=3.5 only.
Activating conda environment: /home/dropSeqPipe/.snakemake/conda/81acb004
[Mon Dec 31 05:59:12 2018]
Finished job 8.
20 of 38 steps (53%) done
[Mon Dec 31 05:59:15 2018]
Finished job 9.
21 of 38 steps (55%) done
Removing temporary output file /home/results/samples/RA0449.0/Aligned.out.bam.
[Mon Dec 31 06:01:43 2018]
Finished job 39.
22 of 38 steps (58%) done
[Mon Dec 31 06:01:43 2018]
rule repair_barcodes:
input: /home/results/samples/RA0449.0/Aligned.merged.bam, /home/results/samples/RA0449.0/barcode_ref.pkl, /home/results/samples/RA0449.0/barcode_ext_ref.pkl, /home/results/samples/RA0449.0/empty_barcode_mapping.pkl
output: /home/results/samples/RA0449.0/Aligned.repaired.bam, /home/results/samples/RA0449.0/barcode_mapping_counts.pkl
jobid: 38
wildcards: results_dir=/home/results, sample=RA0449.0
Activating conda environment: /home/dropSeqPipe/.snakemake/conda/4b9c1953
[Mon Dec 31 06:01:44 2018]
Error in rule repair_barcodes:
jobid: 38
output: /home/results/samples/RA0449.0/Aligned.repaired.bam, /home/results/samples/RA0449.0/barcode_mapping_counts.pkl
conda-env: /home/dropSeqPipe/.snakemake/conda/4b9c1953
RuleException:
CalledProcessError in line 72 of /home/dropSeqPipe/rules/cell_barcodes.smk:
Command 'source activate /home/dropSeqPipe/.snakemake/conda/4b9c1953; set -euo pipefail; python /home/dropSeqPipe/.snakemake/scripts/tmpfqi965ma.repair_barcodes.py ' returned non-zero exit status 1.
File "/home/dropSeqPipe/rules/cell_barcodes.smk", line 72, in __rule_repair_barcodes
File "/opt/miniconda3/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Removing output files of failed job repair_barcodes since they might be corrupted:
/home/results/samples/RA0449.0/Aligned.repaired.bam
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/dropSeqPipe/.snakemake/log/2018-12-31T042813.325812.snakemake.log
I'll be out of town until 1/8 but I am happy to try to debug this further when I get back.
Ok have fun.
When you come back, can you find out the name of the sequencer that produced the data? That might help implement support for a different read-name standard.
I have modified the same branch again; it might have fixed the issue. I think it comes from a split I did on the read name to get the lane of the sequencer.
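For context, a hypothetical illustration (not the pipeline's exact code) of how a lane-extraction split can break when the read-name layout differs:

```python
# Hypothetical illustration of a lane-extraction split going wrong when the
# read-name layout differs. The first name below is made up; the second is
# the read id from this thread.

# bcl2fastq2-style names follow instrument:run:flowcell:lane:tile:x:y,
# so the lane sits at index 3 after splitting on ':'.
bcl2fastq_name = "M00123:45:000000000-ABCDE:1:1102:13714:1859"  # made-up example
print(bcl2fastq_name.split(":")[3])  # -> '1', the lane

# The read names in this dataset have fewer fields, so the same split
# returns a coordinate instead of the lane.
thread_name = "C7JT8:1:1102:13714:1859"  # the read id from this thread
print(thread_name.split(":")[3])  # -> '13714', not a lane
```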
That was data from a MiSeq. We ultimately traced it to a read-name issue arising from the fact that the data was demultiplexed with Picard rather than bcl2fastq2. Several steps didn't like having /1 and /2 in the read names to indicate which mate they belonged to, and the lane information also occurred in a different place in the read name. We want to use bcl2fastq2 going forward anyway, so we have been avoiding this problem, but it is worth letting you know the source.
Thanks @dylkot. I'll close this now since it has been fixed.
I'm getting an error message at the MergeBamAlignment phase of the pipeline.
Maybe I am missing something about how the pipeline works, but when I try to investigate /home/dropSeqPipe/.snakemake/scripts/tmpw0q_o4n2.merge_bam.py, the file doesn't seem to be there. The input files:
/home/results/samples/RA0449.0/Aligned.out.bam, /home/results/samples/RA0449.0/trimmmed_repaired_R1.fastq.gz
seem normal as far as I can tell, except that there are a fair number of heavily trimmed reads in Aligned.out.bam. We started out with 88 bp libraries, but we are having an adapter contamination issue where many of the reads are made up predominantly of SeqB_rc. I'm not sure whether that could be relevant to the issue at all. Thanks!