MountainLab is data processing, sharing and visualization software for scientists. It is built around MountainSort, spike sorting software, but is designed to be more generally applicable.
ms4alg processors randomly fail to run on cluster with node-js error #82
I am spike sorting data sets on our local cluster (which uses SLURM) with mountainlab-js, using the processors ms4alg.sort, ms4alg.create_label_map, and ms4alg.apply_label_map. I run them as part of a snakemake pipeline. Snakemake is a workflow management system that lets me run large parameter scans easily. Each rule in a snakemake workflow is submitted as an individual job to the queuing system on the cluster, so each job runs independently.
Recently, I have been seeing the following errors at random when running processors from the ms4alg package.
(node:7572) UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'original_checksum' of undefined
at /mnt/beegfs/home/s.sridhar/scripts/pipelines/ml_pipeline/.snakemake/conda/1f2f2c8f/lib/node_modules/mountainlab/mlproc/prv_utils.js:202:24
at /mnt/beegfs/home/s.sridhar/scripts/pipelines/ml_pipeline/.snakemake/conda/1f2f2c8f/lib/node_modules/mountainlab/mlproc/prv_utils.js:192:7
(node:7572) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:7572) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
(node:8167) UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'original_checksum' of undefined
at /mnt/beegfs/home/s.sridhar/scripts/pipelines/ml_pipeline/.snakemake/conda/1f2f2c8f/lib/node_modules/mountainlab/mlproc/prv_utils.js:202:24
at /mnt/beegfs/home/s.sridhar/scripts/pipelines/ml_pipeline/.snakemake/conda/1f2f2c8f/lib/node_modules/mountainlab/mlproc/prv_utils.js:192:7
(node:8167) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:8167) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
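The deprecation warning above suggests that, in this Node.js version, an unhandled rejection may not produce a non-zero exit code, so a job could fail without the scheduler or snakemake noticing. A minimal sketch of a wrapper that treats the warning text as a failure and retries is given here. It is a diagnostic aid under stated assumptions, not a fix: the function name `run_checked`, the retry count, and the idea of scanning stderr are all hypothetical and are not part of mountainlab-js.

```python
import subprocess
import sys

# Marker string copied from the stderr output quoted above.
REJECTION_MARKER = "UnhandledPromiseRejectionWarning"


def run_checked(cmd, max_attempts=3):
    """Run cmd, retrying when it fails or prints the rejection marker.

    Returns the number of attempts used. Raises RuntimeError if every
    attempt fails. The retry policy is an assumption for illustration.
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        # Echo the captured streams so the cluster log still contains them.
        sys.stdout.write(result.stdout)
        sys.stderr.write(result.stderr)
        failed = result.returncode != 0 or REJECTION_MARKER in result.stderr
        if not failed:
            return attempt
    raise RuntimeError(f"command failed after {max_attempts} attempts: {cmd}")
```

Here `cmd` would be whatever command line the job script uses to invoke the processor, passed as a list of arguments. Counting the attempts per job would also show how often the error occurs and whether it clusters on particular nodes.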
Since this is being run on the cluster, the corresponding output looks like this:
[ Getting processor spec... ]
[ Checking inputs and substituting prvs ... ]
[ Computing process signature ... ]
Process signature: cf7bd2ec46045c17c672c8bc4ddbf9075ea1d480
[ Checking outputs... ]
{"label_map_out":"/tmp/ml_create_label_map/i140703-001_50_6_35_0.5_0.92_0.2_0.5/ml_label_map_out.mda"}
Processing ouput - /tmp/ml_create_label_map/i140703-001_50_6_35_0.5_0.92_0.2_0.5/ml_label_map_out.mda
false
{"label_map_out":"/tmp/ml_create_label_map/i140703-001_50_6_35_0.5_0.92_0.2_0.5/ml_label_map_out.mda"}
[ Checking process cache ... ]
[ Creating temporary directory ... ]
[ Creating links to input files... ]
[ Preparing temporary outputs... ]
Processing ouput - /tmp/ml_create_label_map/i140703-001_50_6_35_0.5_0.92_0.2_0.5/ml_label_map_out.mda
false
[ Initializing process ... ]
[ Running ... ] /mnt/beegfs/home/s.sridhar/scripts/pipelines/ml_pipeline/.snakemake/conda/1f2f2c8f/bin/python3 /mnt/beegfs/home/s.sridhar/scripts/pipelines/ml_pipeline/.snakemake/conda/1f2f2c8f/etc/mountainlab/packages/ml_ms4alg/curation_spec.py.mp ms4alg.create_label_map --_tempdir=/tmp/mountainlab-tmp/tempdir_cf7bd2ec46_UfI18k --metrics=/tmp/mountainlab-tmp/tempdir_cf7bd2ec46_UfI18k/input_metrics_RMvnkPQS.json --label_map_out=/tmp/mountainlab-tmp/tempdir_cf7bd2ec46_UfI18k/output_label_map_out.mda --firing_rate_thresh=0.5 --isolation_thresh=0.92 --noise_overlap_thresh=0.2 --peak_snr_thresh=0.5
Elapsed time for processor ms4alg.create_label_map: 3.447 sec
Finalizing output label_map_out
[ Saving to process cache ... ]
[ Getting processor spec... ]
[ Checking inputs and substituting prvs ... ]
[ Computing process signature ... ]
The script that is originally run looks like this:
The slightly strange formatting is due to the wildcard system that snakemake uses. It fills in the wildcard entries automatically for the different values that I request, and runs this script for each such parameter set. This is an example of the create_label_map+apply_label_map step. A very similar script is deployed for the sort step, and it yields the same error.
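To illustrate the wildcard expansion, here is a minimal sketch in plain Python rather than snakemake itself. The pattern and the helper `expand_jobs` are hypothetical. The four threshold names are taken from the command line in the log output, the dataset name from the output path, and the second `isolation_thresh` value is made up to show a scan. The real directory names contain three further fields that are omitted here.

```python
from itertools import product

# Hypothetical naming pattern; only the four threshold names come from the log.
PATTERN = (
    "{dataset}_{firing_rate_thresh}_{isolation_thresh}"
    "_{noise_overlap_thresh}_{peak_snr_thresh}"
)


def expand_jobs(pattern, **values):
    """Return one formatted name per combination of the given value lists."""
    keys = list(values)
    return [
        pattern.format(**dict(zip(keys, combo)))
        for combo in product(*(values[k] for k in keys))
    ]


jobs = expand_jobs(
    PATTERN,
    dataset=["i140703-001"],
    firing_rate_thresh=[0.5],
    isolation_thresh=[0.92, 0.95],  # 0.95 is an illustrative extra value
    noise_overlap_thresh=[0.2],
    peak_snr_thresh=[0.5],
)
for name in jobs:
    print(name)
```

Each resulting name corresponds to one independent cluster job, which is why many copies of the same processor can be starting at nearly the same time.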
The error appears at random: if I rerun the script with the same configuration (for instance, the same parameter set), it does not necessarily reappear. The conda environment that snakemake creates and uses has the following packages installed:
Any clue what might be happening? Anything else you need for debugging this?
Thanks!