flatironinstitute / mountainlab-js

MountainLab is data processing, sharing and visualization software for scientists. It is built around MountainSort, spike sorting software, but is designed to be more generally applicable.

ms4alg processors randomly fail to run on cluster with node-js error #82

Open shashwatsridhar opened 5 years ago

shashwatsridhar commented 5 years ago

Hello,

I am spike sorting data sets on our local cluster (which uses SLURM) with mountainlab-js, making use of the different processors ms4alg.sort, ms4alg.create_label_map, ms4alg.apply_label_map. I run them as a part of a snakemake pipeline. Snakemake is a workflow management system which allows me to run large parameter scans easily. Each rule in a snakemake workflow is submitted as an individual job to the queuing system on the cluster, and hence works independently.

Of late, I have been seeing these errors randomly when running the processors from the ms4alg package.

(node:7572) UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'original_checksum' of undefined
    at /mnt/beegfs/home/s.sridhar/scripts/pipelines/ml_pipeline/.snakemake/conda/1f2f2c8f/lib/node_modules/mountainlab/mlproc/prv_utils.js:202:24
    at /mnt/beegfs/home/s.sridhar/scripts/pipelines/ml_pipeline/.snakemake/conda/1f2f2c8f/lib/node_modules/mountainlab/mlproc/prv_utils.js:192:7
(node:7572) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:7572) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
(node:8167) UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'original_checksum' of undefined
    at /mnt/beegfs/home/s.sridhar/scripts/pipelines/ml_pipeline/.snakemake/conda/1f2f2c8f/lib/node_modules/mountainlab/mlproc/prv_utils.js:202:24
    at /mnt/beegfs/home/s.sridhar/scripts/pipelines/ml_pipeline/.snakemake/conda/1f2f2c8f/lib/node_modules/mountainlab/mlproc/prv_utils.js:192:7
(node:8167) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:8167) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

Since this is being run on the cluster, the corresponding output looks like this:

[ Getting processor spec... ]
[ Checking inputs and substituting prvs ... ]
[ Computing process signature ... ]
Process signature: cf7bd2ec46045c17c672c8bc4ddbf9075ea1d480
[ Checking outputs... ]
{"label_map_out":"/tmp/ml_create_label_map/i140703-001_50_6_35_0.5_0.92_0.2_0.5/ml_label_map_out.mda"}
Processing ouput - /tmp/ml_create_label_map/i140703-001_50_6_35_0.5_0.92_0.2_0.5/ml_label_map_out.mda
false
{"label_map_out":"/tmp/ml_create_label_map/i140703-001_50_6_35_0.5_0.92_0.2_0.5/ml_label_map_out.mda"}
[ Checking process cache ... ]
[ Creating temporary directory ... ]
[ Creating links to input files... ]
[ Preparing temporary outputs... ]
Processing ouput - /tmp/ml_create_label_map/i140703-001_50_6_35_0.5_0.92_0.2_0.5/ml_label_map_out.mda
false
[ Initializing process ... ]
[ Running ... ] /mnt/beegfs/home/s.sridhar/scripts/pipelines/ml_pipeline/.snakemake/conda/1f2f2c8f/bin/python3 /mnt/beegfs/home/s.sridhar/scripts/pipelines/ml_pipeline/.snakemake/conda/1f2f2c8f/etc/mountainlab/packages/ml_ms4alg/curation_spec.py.mp ms4alg.create_label_map --_tempdir=/tmp/mountainlab-tmp/tempdir_cf7bd2ec46_UfI18k --metrics=/tmp/mountainlab-tmp/tempdir_cf7bd2ec46_UfI18k/input_metrics_RMvnkPQS.json --label_map_out=/tmp/mountainlab-tmp/tempdir_cf7bd2ec46_UfI18k/output_label_map_out.mda --firing_rate_thresh=0.5 --isolation_thresh=0.92 --noise_overlap_thresh=0.2 --peak_snr_thresh=0.5
Elapsed time for processor ms4alg.create_label_map: 3.447 sec
Finalizing output label_map_out
[ Saving to process cache ... ]
[ Getting processor spec... ]
[ Checking inputs and substituting prvs ... ]
[ Computing process signature ... ]

The script that is originally run looks like this:

        TEMPDIR=
        if [[ -z ${{TMPDIR+x}} ]]
        then
            TEMPDIR=/tmp
        else
            TEMPDIR=$TMPDIR
        fi

        TEMPDIR="$TEMPDIR/ml_create_label_map/{wildcards.dataset}_{wildcards.clip_size}_{wildcards.thr}_{wildcards.intvl}_{wildcards.fr}_{wildcards.iso}_{wildcards.noi}_{wildcards.snr}"
        mkdir -p $TEMPDIR/

        exitfunction() {{
            trap - TERM
            rm -r ${{TEMPDIR}}/
        }}

        trap "exitfunction" TERM

        cp {input.metrics} $TEMPDIR/ml_label_map_metrics.json
        cp {input.firings} $TEMPDIR/ml_label_map_firings.mda

        ml-run-process ms4alg.create_label_map \
            --inputs metrics:$TEMPDIR/ml_label_map_metrics.json \
            --outputs label_map_out:$TEMPDIR/ml_label_map_out.mda \
            --parameters firing_rate_thresh:{wildcards.fr} isolation_thresh:{wildcards.iso} noise_overlap_thresh:{wildcards.noi} peak_snr_thresh:{wildcards.snr}

        ml-run-process ms4alg.apply_label_map \
            --inputs firings:$TEMPDIR/ml_label_map_firings.mda label_map:$TEMPDIR/ml_label_map_out.mda \
            --outputs firings_out:$TEMPDIR/ml_label_map_curated_firings.mda \
            --parameters

        python {input.script} $TEMPDIR/ml_label_map_curated_firings.npy

        cp $TEMPDIR/ml_label_map_curated_firings.npy {output.npy}
        cp $TEMPDIR/ml_label_map_curated_firings.mda {output.mda}

        exitfunction

        trap - TERM

The slightly strange formatting is due to snakemake's wildcard system: the doubled braces are literal braces escaped for snakemake, and the single-brace entries are wildcards that it fills in automatically for the values I request, running this script once per parameter set. This is an example of the create_label_map + apply_label_map step. A very similar script is deployed for the sort step, and it yields the same error.
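The DeprecationWarning in the log above says that unhandled rejections will terminate the process with a non-zero exit code only "in the future", so with this Node.js version a failed ml-run-process call may still return success and the script would carry on to the next step. As a stopgap while the cause is unknown, each call could be wrapped in a check along the following lines. This is only a sketch: `run_checked`, `MAX_TRIES` and `RETRY_DELAY` are hypothetical names, not part of mountainlab, and it assumes that a failed run either exits non-zero, prints the warning to stderr, or leaves the output file missing or empty.

```shell
# Hypothetical helper (not part of mountainlab): run a command up to
# MAX_TRIES times and accept it only if it exits with status 0, prints no
# UnhandledPromiseRejectionWarning on stderr, and leaves a non-empty
# output file behind.
run_checked() {
    local expected_output max_tries delay errlog attempt status
    expected_output=$1
    shift
    max_tries=${MAX_TRIES:-3}
    delay=${RETRY_DELAY:-5}
    errlog=$(mktemp) || return 1
    attempt=1

    while [ "$attempt" -le "$max_tries" ]; do
        rm -f "$expected_output"        # never accept a stale output file
        status=0
        "$@" 2> "$errlog" || status=$?
        cat "$errlog" >&2               # keep stderr visible in the job log

        if [ "$status" -eq 0 ] \
            && ! grep -q 'UnhandledPromiseRejectionWarning' "$errlog" \
            && [ -s "$expected_output" ]; then
            rm -f "$errlog"
            return 0
        fi

        echo "run_checked: attempt $attempt of $max_tries failed" >&2
        # Pause before the next attempt, but not after the last one.
        [ "$attempt" -lt "$max_tries" ] && sleep "$delay"
        attempt=$((attempt + 1))
    done

    rm -f "$errlog"
    return 1
}

# Example call in plain bash (parameter values taken from the log above):
# run_checked "$TEMPDIR/ml_label_map_out.mda" \
#     ml-run-process ms4alg.create_label_map \
#         --inputs "metrics:$TEMPDIR/ml_label_map_metrics.json" \
#         --outputs "label_map_out:$TEMPDIR/ml_label_map_out.mda" \
#         --parameters firing_rate_thresh:0.5 isolation_thresh:0.92 \
#             noise_overlap_thresh:0.2 peak_snr_thresh:0.5
```

Inside a snakemake shell block, the `${...}` expansions in this function would need their braces doubled, as in the script above. Retrying only hides the failure, so the attempt messages are written to stderr to keep the failure rate visible in the job logs.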

The error is intermittent: if I rerun the script with the same configuration (the same parameter set, for instance), it does not necessarily reappear. The conda environment that snakemake creates and uses has the following packages installed:

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
alabaster                 0.7.12                     py_0    conda-forge
asn1crypto                0.24.0                py36_1003    conda-forge
attrs                     19.1.0                     py_0    conda-forge
babel                     2.7.0                      py_0    conda-forge
backcall                  0.1.0                      py_0    conda-forge
bleach                    3.1.0                      py_0    conda-forge
blosc                     1.16.3               he1b5a44_1    conda-forge
bzip2                     1.0.6             h14c3975_1002    conda-forge
ca-certificates           2019.6.16            hecc5488_0    conda-forge
certifi                   2019.6.16                py36_0    conda-forge
cffi                      1.12.3           py36h8022711_0    conda-forge
chardet                   3.0.4                 py36_1003    conda-forge
cryptography              2.7              py36h72c5cf5_0    conda-forge
cycler                    0.10.0                     py_1    conda-forge
dbus                      1.13.6               he372182_0    conda-forge
decorator                 4.4.0                      py_0    conda-forge
deepdish                  0.3.4                    py36_1    flatiron
defusedxml                0.5.0                      py_1    conda-forge
docopt                    0.6.2                      py_1    conda-forge
docutils                  0.14                  py36_1001    conda-forge
entrypoints               0.3                   py36_1000    conda-forge
expat                     2.2.5             he1b5a44_1003    conda-forge
fftw                      3.3.8           nompi_h7f3a6c3_1106    conda-forge
fontconfig                2.13.1            he4413a7_1000    conda-forge
freetype                  2.10.0               he983fc9_0    conda-forge
gettext                   0.19.8.1          hc5be6a0_1002    conda-forge
glib                      2.58.3            h6f030ca_1002    conda-forge
gst-plugins-base          1.14.5               h0935bb2_0    conda-forge
gstreamer                 1.14.5               h36ae1b5_0    conda-forge
h5py                      2.9.0           nompi_py36hf008753_1102    conda-forge
hdf5                      1.10.4          nompi_h3c11f04_1106    conda-forge
icu                       58.2              hf484d3e_1000    conda-forge
idna                      2.8                   py36_1000    conda-forge
imagesize                 1.1.0                      py_0    conda-forge
ipykernel                 5.1.1            py36h24bf2e0_0    conda-forge
ipython                   7.6.1            py36h5ca1d4c_0    conda-forge
ipython_genutils          0.2.0                      py_1    conda-forge
ipywidgets                7.5.0                      py_0    conda-forge
isosplit5                 0.1.3            py36h6bb024c_7    flatiron
jedi                      0.14.1                   py36_0    conda-forge
jinja2                    2.10.1                     py_0    conda-forge
joblib                    0.13.2                     py_0    conda-forge
jp-proxy-widget           1.0.0                    pypi_0    pypi
jpeg                      9c                h14c3975_1001    conda-forge
jsonschema                3.0.1                    py36_0    conda-forge
jupyter_client            5.3.1                      py_0    conda-forge
jupyter_core              4.4.0                      py_0    conda-forge
kiwisolver                1.1.0            py36hc9558a2_0    conda-forge
libblas                   3.8.0               10_openblas    conda-forge
libcblas                  3.8.0               10_openblas    conda-forge
libffi                    3.2.1             he1b5a44_1006    conda-forge
libgcc-ng                 9.1.0                hdf63c60_0  
libgfortran-ng            7.3.0                hdf63c60_0  
libiconv                  1.15              h516909a_1005    conda-forge
liblapack                 3.8.0               10_openblas    conda-forge
libopenblas               0.3.6                h6e990d7_4    conda-forge
libpng                    1.6.37               hed695b0_0    conda-forge
libsodium                 1.0.17               h516909a_0    conda-forge
libstdcxx-ng              9.1.0                hdf63c60_0  
libuuid                   2.32.1            h14c3975_1000    conda-forge
libxcb                    1.13              h14c3975_1002    conda-forge
libxml2                   2.9.9                h13577e0_1    conda-forge
llvm-openmp               8.0.0                hc9558a2_0    conda-forge
lzo                       2.10              h14c3975_1000    conda-forge
markupsafe                1.1.1            py36h14c3975_0    conda-forge
matplotlib                3.1.1                    py36_0    conda-forge
matplotlib-base           3.1.1            py36hfd891ef_0    conda-forge
mistune                   0.8.4           py36h14c3975_1000    conda-forge
ml_ephys                  0.2.14                   py36_2    flatiron
ml_ms3                    0.2.4                h38c1b9e_1    flatiron
ml_ms4alg                 0.2.3            py36h6bb024c_0    flatiron
ml_pyms                   0.2.3            py36h6bb024c_1    flatiron
mock                      3.0.5                    py36_0    conda-forge
mountainlab               0.15.2                        0    flatiron
mountainlab-pytools       0.7.5                    pypi_0    pypi
mountainlab_pytools       0.7.5                    py36_2    flatiron
nbconvert                 5.5.0                      py_0    conda-forge
nbformat                  4.4.0                      py_1    conda-forge
ncurses                   6.1               hf484d3e_1002    conda-forge
neo                       0.7.1                    pypi_0    pypi
nixio                     1.5.0b2                  pypi_0    pypi
nodejs                    11.14.0              he1b5a44_1    conda-forge
notebook                  5.7.8                    py36_1    conda-forge
numexpr                   2.6.9           py36h637b7d7_1000    conda-forge
numpy                     1.16.4           py36h95a1406_0    conda-forge
numpydoc                  0.9.1                      py_0    conda-forge
openblas                  0.3.6                h6e990d7_4    conda-forge
openmp                    8.0.0                         0    conda-forge
openssl                   1.1.1c               h516909a_0    conda-forge
packaging                 19.0                       py_0    conda-forge
pandoc                    2.7.3                         0    conda-forge
pandocfilters             1.4.2                      py_1    conda-forge
parso                     0.5.0                      py_0    conda-forge
pbr                       5.4.0                    pypi_0    pypi
pcre                      8.41              hf484d3e_1003    conda-forge
pexpect                   4.7.0                    py36_0    conda-forge
pickleshare               0.7.5                 py36_1000    conda-forge
pip                       19.1.1                   py36_0    conda-forge
prometheus_client         0.7.1                      py_0    conda-forge
prompt_toolkit            2.0.9                      py_0    conda-forge
pthread-stubs             0.4               h14c3975_1001    conda-forge
ptyprocess                0.6.0                   py_1001    conda-forge
pycparser                 2.19                     py36_1    conda-forge
pygments                  2.4.2                      py_0    conda-forge
pyopenssl                 19.0.0                   py36_0    conda-forge
pyparsing                 2.4.0                      py_0    conda-forge
pyqt                      5.9.2            py36hcca6a23_0    conda-forge
pyrsistent                0.15.3           py36h516909a_0    conda-forge
pysocks                   1.7.0                    py36_0    conda-forge
pytables                  3.5.2            py36ha1aa75f_0    conda-forge
python                    3.6.7             h357f687_1005    conda-forge
python-dateutil           2.8.0                      py_0    conda-forge
pytz                      2019.1                     py_0    conda-forge
pyzmq                     18.0.2           py36hc4ba49a_1    conda-forge
qt                        5.9.7                h52cfd70_2    conda-forge
quantities                0.12.3                   pypi_0    pypi
readline                  8.0                  hf8c457e_0    conda-forge
requests                  2.22.0                   py36_1    conda-forge
scikit-learn              0.21.2           py36hcdab131_1    conda-forge
scipy                     1.3.0            py36h921218d_0    conda-forge
send2trash                1.5.0                      py_0    conda-forge
setuptools                41.0.1                   py36_0    conda-forge
sip                       4.19.8          py36hf484d3e_1000    conda-forge
six                       1.12.0                py36_1000    conda-forge
snowballstemmer           1.9.0                      py_0    conda-forge
sphinx                    2.1.2                      py_0    conda-forge
sphinxcontrib-applehelp   1.0.1                      py_0    conda-forge
sphinxcontrib-devhelp     1.0.1                      py_0    conda-forge
sphinxcontrib-htmlhelp    1.0.2                      py_0    conda-forge
sphinxcontrib-jsmath      1.0.1                      py_0    conda-forge
sphinxcontrib-qthelp      1.0.2                      py_0    conda-forge
sphinxcontrib-serializinghtml 1.1.1                      py_0    conda-forge
sqlite                    3.29.0               hcee41ef_0    conda-forge
stevedore                 1.30.1                   pypi_0    pypi
terminado                 0.8.2                    py36_0    conda-forge
testpath                  0.4.2                   py_1001    conda-forge
tk                        8.6.9             hed695b0_1002    conda-forge
tornado                   6.0.3            py36h516909a_0    conda-forge
traitlets                 4.3.2                 py36_1000    conda-forge
urllib3                   1.25.3                   py36_0    conda-forge
virtualenv                16.6.2                   pypi_0    pypi
virtualenv-clone          0.5.3                    pypi_0    pypi
virtualenvwrapper         4.8.4                    pypi_0    pypi
wcwidth                   0.1.7                      py_1    conda-forge
webencodings              0.5.1                      py_1    conda-forge
wheel                     0.33.4                   py36_0    conda-forge
widgetsnbextension        3.5.0                    py36_0    conda-forge
xorg-libxau               1.0.9                h14c3975_0    conda-forge
xorg-libxdmcp             1.1.3                h516909a_0    conda-forge
xz                        5.2.4             h14c3975_1001    conda-forge
zeromq                    4.3.2                he1b5a44_2    conda-forge
zlib                      1.2.11            h516909a_1005    conda-forge

Any clue what might be happening? Anything else you need for debugging this?
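One piece of context that might help narrow this down is where each failing job ran. A line like the one produced below could be logged at the start of every job and compared across failed and successful runs. This is a minimal sketch: `log_job_context` is a hypothetical name, and it assumes that SLURM exports `SLURM_JOB_ID` inside the job and that `node` is on the PATH of the conda environment.

```shell
# Hypothetical helper: print one line of job context to stderr so that
# failures can be correlated with the compute node, the SLURM job and
# the temporary directory in use.
log_job_context() {
    local node_version
    node_version=unknown
    if command -v node >/dev/null 2>&1; then
        node_version=$(node --version)
    fi
    printf 'host=%s slurm_job=%s node=%s tmpdir=%s\n' \
        "$(uname -n)" \
        "${SLURM_JOB_ID:-none}" \
        "$node_version" \
        "${TMPDIR:-/tmp}" >&2
}

log_job_context
```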

Thanks!