ENCODE-DCC / wgbs-pipeline

ENCODE whole-genome bisulfite sequencing (WGBS) pipeline
MIT License

TypeError in wgbs.map Step #66

Open ebprideaux opened 2 years ago

ebprideaux commented 2 years ago

Describe the bug

At the wgbs.map step, I get a TypeError:

==== NAME=wgbs.map, STATUS=Failed, PARENT=
SHARD_IDX=1, RC=1, JOB_ID=9608
START=2021-10-15T03:04:21.289Z, END=2021-10-15T03:09:23.663Z
STDOUT=/resource3/data/WGBS/Processed_Caper/wgbs/9f04e4dd-2a84-4af6-8731-d3c49f6e2782/call-map/shard-1/attempt-2/execution/stdout
STDERR=/resource3/data/WGBS/Processed_Caper/wgbs/9f04e4dd-2a84-4af6-8731-d3c49f6e2782/call-map/shard-1/attempt-2/execution/stderr
STDERR_CONTENTS=
:
: Command map started at 2021-10-14 20:08:16.349606
:
: ------------ Mapping Parameters ------------
: Sample barcode : sample_1
: Data set : 1
: No. threads : 8
: Index : indexes/hg38.BS.gem
: Paired : False
: Read non stranded: False
: Type : SINGLE
: Input Files : ./fastq/1/Control_S1_L004_R2_001.fastq.gz
: Output dir : ./mapping/sample_1
:
: Bisulfite Mapping...
TypeError: sequence item 14: expected str instance, NoneType found
ln: failed to access '/resource3/data/WGBS/Processed_Caper/wgbs/9f04e4dd-2a84-4af6-8731-d3c49f6e2782/call-map/shard-1/attempt-2/execution/mapping//*.bam': No such file or directory
ln: failed to access '/resource3/data/WGBS/Processed_Caper/wgbs/9f04e4dd-2a84-4af6-8731-d3c49f6e2782/call-map/shard-1/attempt-2/execution/mapping/*/.csi': No such file or directory
ln: failed to access '/resource3/data/WGBS/Processed_Caper/wgbs/9f04e4dd-2a84-4af6-8731-d3c49f6e2782/call-map/shard-1/attempt-2/execution/mapping//*.bam.md5': No such file or directory
ln: failed to access '/resource3/data/WGBS/Processed_Caper/wgbs/9f04e4dd-2a84-4af6-8731-d3c49f6e2782/call-map/shard-1/attempt-2/execution/mapping/*/.json': No such file or directory

How can I resolve this error?

OS/Platform

Caper configuration file

default.conf.txt

Error log

Caper automatically runs a troubleshooter for failed workflows. If it doesn't, get the WORKFLOW_ID of your failed workflow with caper list, or directly use a metadata.json file from Caper's output directory.

$ caper debug [WORKFLOW_ID_OR_METADATA_JSON_FILE]

cromwell.out.txt

Input JSON File

json_input.txt

paul-sud commented 2 years ago

This issue in gemBS is almost exactly the same as yours: https://github.com/heathsc/gemBS/issues/37, although I'm not sure whether it is relevant to the version of gemBS used in the pipeline. You may be able to work around it by passing "wgbs.underconversion_sequence_name": "chrL" in your input. Since it looks like you used a lambda control, you probably want this value set anyway so you can get the QC value for the bisulfite conversion rate.
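
For context, a message of the form "sequence item N: expected str instance, NoneType found" is what Python's str.join raises when element N of the sequence is None. The sketch below only reproduces that failure mode. It assumes, without confirmation from the gemBS source, that a command line is assembled as a list of strings and an unset option ends up in it as None. The function names, the "mapper" command, and the "--some-option" flag are all hypothetical and are not gemBS code.

```python
def build_args(optional_value=None):
    """Assemble a made-up argument list (hypothetical, not gemBS code)."""
    return ["mapper", "--threads", "8", "--some-option", optional_value]


def join_error(args):
    """Return the TypeError message raised by joining args, or None on success."""
    try:
        " ".join(args)
    except TypeError as exc:
        return str(exc)
    return None


# The unset option sits at index 4, so str.join reports "sequence item 4".
print(join_error(build_args()))
# With the option set, the join succeeds and no error is returned.
print(join_error(build_args("value")))
```

The item number in the message is the position of the None entry in the joined list, which may explain why it differs between a single-end run and a paired-end run if the argument list changes length.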

FYI Conda isn't supported by this pipeline. I'd recommend using Docker, or if your HPC doesn't allow it then Singularity would be an option. In theory it should "just work" but in practice there can be quirks due to differences between the runtimes. I haven't tested this pipeline myself with Singularity so I can't say for sure.

By the way, looking at your input files, it looks like you might have paired-end data? You have it specified as

  "wgbs.fastqs": [
    [
      [
        "/resource3/data/WGBS/RawData/222_S2_L004_R1_001.fastq.gz"
      ],
      [
        "/resource3/data/WGBS/RawData/222_S2_L004_R2_001.fastq.gz"
      ]
    ],
    [
      [
        "/resource3/data/WGBS/RawData/Control_S1_L004_R1_001.fastq.gz"
      ],
      [
        "/resource3/data/WGBS/RawData/Control_S1_L004_R2_001.fastq.gz"
      ]
    ]
  ],

but if they are in fact paired, the two files should be placed in the same array like this:

  "wgbs.fastqs": [
    [
      [
        "/resource3/data/WGBS/RawData/222_S2_L004_R1_001.fastq.gz",
        "/resource3/data/WGBS/RawData/222_S2_L004_R2_001.fastq.gz"
      ]
    ],
    [
      [
        "/resource3/data/WGBS/RawData/Control_S1_L004_R1_001.fastq.gz",
        "/resource3/data/WGBS/RawData/Control_S1_L004_R2_001.fastq.gz"
      ]
    ]
  ],
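
If you have many samples, the regrouping can be scripted. This is a rough sketch, not part of the pipeline: pair_fastqs is a hypothetical helper, and it assumes Illumina-style file names in which mates differ only by _R1_ versus _R2_. Files without a matching mate are kept as single-file entries.

```python
import json


def pair_fastqs(fastqs):
    """Merge single-file entries into [R1, R2] pairs within each replicate.

    Assumes mates differ only by "_R1_" vs "_R2_" in the path.
    """
    paired = []
    for replicate in fastqs:
        # Flatten this replicate's entries into one list of file paths.
        files = [path for entry in replicate for path in entry]
        available = set(files)
        entries, used = [], set()
        for path in files:
            mate = path.replace("_R1_", "_R2_")
            if mate != path and mate in available:
                entries.append([path, mate])
                used.update((path, mate))
        # Keep unmatched files as single-file entries instead of dropping them.
        entries.extend([path] for path in files if path not in used)
        paired.append(entries)
    return paired


original = [
    [["/resource3/data/WGBS/RawData/222_S2_L004_R1_001.fastq.gz"],
     ["/resource3/data/WGBS/RawData/222_S2_L004_R2_001.fastq.gz"]],
    [["/resource3/data/WGBS/RawData/Control_S1_L004_R1_001.fastq.gz"],
     ["/resource3/data/WGBS/RawData/Control_S1_L004_R2_001.fastq.gz"]],
]
print(json.dumps({"wgbs.fastqs": pair_fastqs(original)}, indent=2))
```

Check the printed structure by eye before pasting it into the input JSON, since the name-matching rule is only a convention.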

ebprideaux commented 2 years ago

I went ahead and changed the json input to reflect paired-end data. Thank you!

I believe my original json input already had:

"wgbs.underconversion_sequence_name": "chrL"

Is this what you were referring to?

Our HPC doesn't allow Docker. I don't have experience with Singularity, but may need to talk with our sysadmin about adding it (I don't have root privileges).

I am re-running with the updated JSON input to see whether the error recurs. Will follow up with results.

RHwgbsinput copy.json.update.txt

paul-sud commented 2 years ago

Yeah, sorry, I missed that in the input; that's what I was referring to. In that case the input looks OK. If it still fails, I would double-check your gemBS version. You can see how it is installed in the pipeline here:

https://github.com/ENCODE-DCC/wgbs-pipeline/blob/f41d0dc64020e68c804898527620a19ad7f470ac/Dockerfile#L37

ebprideaux commented 2 years ago

Confirming it failed on the same task:

==== NAME=wgbs.map, STATUS=Failed, PARENT=
SHARD_IDX=1, RC=1, JOB_ID=17853
START=2021-10-15T19:22:17.505Z, END=2021-10-15T19:27:35.017Z
STDOUT=/resource3/data/WGBS/Processed/wgbs/2ef10817-0cfc-4575-b976-16e85d0a46a3/call-map/shard-1/attempt-2/execution/stdout
STDERR=/resource3/data/WGBS/Processed/wgbs/2ef10817-0cfc-4575-b976-16e85d0a46a3/call-map/shard-1/attempt-2/execution/stderr
STDERR_CONTENTS=
:
: Command map started at 2021-10-15 12:26:08.265708
:
: ------------ Mapping Parameters ------------
: Sample barcode : sample_1
: Data set : 1
: No. threads : 8
: Index : indexes/hg38.BS.gem
: Paired : True
: Read non stranded: False
: Type : PAIRED
: Input Files : ./fastq/1/Control_S1_L004_R1_001.fastq.gz,./fastq/1/Control_S1_L004_R2_001.fastq.gz
: Output dir : ./mapping/sample_1
:
: Bisulfite Mapping...
TypeError: sequence item 17: expected str instance, NoneType found
ln: failed to access '/resource3/data/WGBS/Processed/wgbs/2ef10817-0cfc-4575-b976-16e85d0a46a3/call-map/shard-1/attempt-2/execution/mapping//*.bam': No such file or directory
ln: failed to access '/resource3/data/WGBS/Processed/wgbs/2ef10817-0cfc-4575-b976-16e85d0a46a3/call-map/shard-1/attempt-2/execution/mapping/*/.csi': No such file or directory
ln: failed to access '/resource3/data/WGBS/Processed/wgbs/2ef10817-0cfc-4575-b976-16e85d0a46a3/call-map/shard-1/attempt-2/execution/mapping//*.bam.md5': No such file or directory
ln: failed to access '/resource3/data/WGBS/Processed/wgbs/2ef10817-0cfc-4575-b976-16e85d0a46a3/call-map/shard-1/attempt-2/execution/mapping/*/.json': No such file or directory

It looks like the gemBS version that Conda installed is gembs-3.2.0 (released June 2018). I think this is likely the problem. According to the issue you linked earlier, a newer version was pushed with changes that fix this TypeError. I will try updating and get back to you.