AllenNeuralDynamics / aind-ephys-pipeline

Code Ocean pipeline for ephys processing with Kilosort2.5
MIT License

Unknown error #3

Closed bjhardcastle closed 6 months ago

bjhardcastle commented 8 months ago

Run 8794328:

N E X T F L O W  ~  version 22.10.0
Launching `main.nf` [comp-586db6f6-4c90-43bc-8207-f2b3394] DSL1 - revision: c8d71b5fa9
[35/5e959d] Submitted process > capsule_nwb_packaging_subject_capsule_10 (capsule-1748641)
[fe/eadb62] Submitted process > capsule_aind_ephys_job_dispatch_4 (capsule-5089190)
[capsule-5089190] cloning git repo...
[capsule-5089190] running capsule...
Running job dispatcher with the following parameters:
    CONCATENATE RECORDINGS: False
Session: ecephys_666986_2023-08-15_08-13-00
    Session path from data: ecephys_session - Open Ephys folder: ../data/ecephys_session/ecephys_clipped
    Num. Blocks 1 - Num. streams: 13
    Recording to be processed in parallel:
        experiment1_Record Node 102#Neuropix-PXI-100.ProbeA-AP_recording1 - Duration: 7022.81 s
        experiment1_Record Node 102#Neuropix-PXI-100.ProbeB-AP_recording1 - Duration: 7022.81 s
        experiment1_Record Node 102#Neuropix-PXI-100.ProbeC-AP_recording1 - Duration: 7022.84 s
        experiment1_Record Node 103#Neuropix-PXI-100.ProbeD-AP_recording1 - Duration: 7022.75 s
        experiment1_Record Node 103#Neuropix-PXI-100.ProbeE-AP_recording1 - Duration: 7022.78 s
        experiment1_Record Node 103#Neuropix-PXI-100.ProbeF-AP_recording1 - Duration: 7022.76 s
Generated 6 job config files
[capsule-5089190] completed!
[33/5d4a5d] Submitted process > capsule_aind_ephys_preprocessing_1 (capsule-0874799)
[49/919f8b] Submitted process > capsule_aind_ephys_preprocessing_1 (capsule-0874799)
[12/7ffe09] Submitted process > capsule_aind_ephys_preprocessing_1 (capsule-0874799)
[00/d36df9] Submitted process > capsule_aind_ephys_preprocessing_1 (capsule-0874799)
[bc/32cda0] Submitted process > capsule_aind_ephys_preprocessing_1 (capsule-0874799)
[5f/774cd9] Submitted process > capsule_aind_ephys_preprocessing_1 (capsule-0874799)
Error executing process > 'capsule_aind_ephys_preprocessing_1 (capsule-0874799)'

Caused by:
  Task failed to start - CannotPullContainerError: context canceled

Command executed:

  #!/usr/bin/env bash
  set -e

  export CO_CAPSULE_ID=05eaf483-9ca3-4a9e-8da8-7d23717f6faf
  export CO_CPUS=16
  export CO_MEMORY=68719476736

  mkdir -p capsule
  mkdir -p capsule/data && ln -s $PWD/capsule/data /data
  mkdir -p capsule/results && ln -s $PWD/capsule/results /results
  mkdir -p capsule/scratch && ln -s $PWD/capsule/scratch /scratch

  echo "[capsule-0874799] cloning git repo..."
  git clone "https://$GIT_ACCESS_TOKEN@codeocean.allenneuraldynamics.org/capsule-0874799.git" capsule-repo
  git -C capsule-repo checkout a39afcd6f533ef584d721423815206192bddb12c --quiet
  mv capsule-repo/code capsule/code
  rm -rf capsule-repo

  echo "[capsule-0874799] running capsule..."
  cd capsule/code
  chmod +x run
  ./run

  echo "[capsule-0874799] completed!"

Command exit status:
  -

Command output:
  (empty)

Work dir:
  s3://codeocean-s3batchbucket-16itpvq060udk/1f8f159a-7670-47a9-baf1-078905fc9c2e/12/7ffe09018156bc4387a23049bf9daf

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

WARN: Killing running tasks (6)

Run 8938517:

N E X T F L O W  ~  version 22.10.0
Launching `main.nf` [comp-9d983327-eb1b-43de-9f45-5f36b37] DSL1 - revision: c8d71b5fa9
[f9/40a696] Submitted process > capsule_nwb_packaging_subject_capsule_10 (capsule-1748641)
[df/533671] Submitted process > capsule_aind_ephys_job_dispatch_4 (capsule-5089190)
[capsule-1748641] cloning git repo...
[capsule-1748641] running capsule...
Backend: zarr
Asset name: ecephys_646318_2023-01-18_10-44-42
Saved ../results/ecephys_646318_2023-01-18_10-44-42.nwb
[capsule-1748641] completed!
[capsule-5089190] cloning git repo...
[capsule-5089190] running capsule...
Running job dispatcher with the following parameters:
    CONCATENATE RECORDINGS: False
Session: ecephys_646318_2023-01-18_10-44-42
    Session path from data: ecephys_session - Open Ephys folder: ../data/ecephys_session/ecephys_clipped
    Num. Blocks 1 - Num. streams: 9
    Recording to be processed in parallel:
        experiment1_Record Node 110#Neuropix-PXI-107.ProbeA-AP_recording1 - Duration: 4894.87 s
        experiment1_Record Node 110#Neuropix-PXI-107.ProbeB-AP_recording1 - Duration: 4894.84 s
        experiment1_Record Node 110#Neuropix-PXI-107.ProbeC-AP_recording1 - Duration: 4894.9 s
        experiment1_Record Node 110#Neuropix-PXI-107.ProbeF-AP_recording1 - Duration: 4894.85 s
Generated 4 job config files
[capsule-5089190] completed!
[d7/1fc383] Submitted process > capsule_aind_ephys_preprocessing_1 (capsule-0874799)
[42/ee33b0] Submitted process > capsule_aind_ephys_preprocessing_1 (capsule-0874799)
[ad/ca022f] Submitted process > capsule_aind_ephys_preprocessing_1 (capsule-0874799)
[4e/5775eb] Submitted process > capsule_aind_ephys_preprocessing_1 (capsule-0874799)
Error executing process > 'capsule_aind_ephys_preprocessing_1 (capsule-0874799)'

Caused by:
  Task failed to start - CannotPullContainerError: context canceled

Command executed:

  #!/usr/bin/env bash
  set -e

  export CO_CAPSULE_ID=05eaf483-9ca3-4a9e-8da8-7d23717f6faf
  export CO_CPUS=16
  export CO_MEMORY=68719476736

  mkdir -p capsule
  mkdir -p capsule/data && ln -s $PWD/capsule/data /data
  mkdir -p capsule/results && ln -s $PWD/capsule/results /results
  mkdir -p capsule/scratch && ln -s $PWD/capsule/scratch /scratch

  echo "[capsule-0874799] cloning git repo..."
  git clone "https://$GIT_ACCESS_TOKEN@codeocean.allenneuraldynamics.org/capsule-0874799.git" capsule-repo
  git -C capsule-repo checkout a39afcd6f533ef584d721423815206192bddb12c --quiet
  mv capsule-repo/code capsule/code
  rm -rf capsule-repo

  echo "[capsule-0874799] running capsule..."
  cd capsule/code
  chmod +x run
  ./run

  echo "[capsule-0874799] completed!"

Command exit status:
  -

Command output:
  (empty)

Work dir:
  s3://codeocean-s3batchbucket-16itpvq060udk/1f8f159a-7670-47a9-baf1-078905fc9c2e/ad/ca022f53efa9397f3e68e8234d7647

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

WARN: Killing running tasks (3)

Will try re-running...

alejoe91 commented 8 months ago

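Yeah, this doesn't seem related to the pipeline itself, but to AWS Batch: the container image could not be pulled (`CannotPullContainerError: context canceled`), so the task never started.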

alejoe91 commented 7 months ago

We can use a retry strategy, as explained here: https://www.nextflow.io/docs/latest/process.html#maxerrors

This is now supported in Code Ocean.
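
For reference, a retry strategy of the kind described in the linked Nextflow docs could look roughly like the sketch below. It uses the standard `errorStrategy`, `maxRetries`, and `maxErrors` process directives in Nextflow config syntax. The retry counts are illustrative assumptions, not the values used by this pipeline, and where Code Ocean exposes this configuration is not shown here.

```groovy
// Sketch only: values are assumptions, not the pipeline's actual settings.
process {
    // Re-submit a task when it fails instead of terminating the whole run
    // (the default strategy is 'terminate').
    errorStrategy = 'retry'

    // Maximum number of times a single task instance is re-submitted.
    maxRetries = 3

    // Maximum number of failures tolerated across all instances of a process
    // before the retry strategy gives up.
    maxErrors = 10
}
```

With a setup like this, a transient failure in one of the parallel `capsule_aind_ephys_preprocessing_1` tasks would be re-submitted rather than killing the remaining tasks, assuming the failure is reported to Nextflow as a task error.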

alejoe91 commented 6 months ago

Closing this because it should be fixed