geodesymiami / rsmas_insar

RSMAS InSAR code
https://rsmas-insar.readthedocs.io/
GNU General Public License v3.0

MintPy dask error for too many workers #485

Open falkamelung opened 3 years ago

falkamelung commented 3 years ago

KokoxiliBigChunk30SenAT41

Failed using 40 workers but worked with 32 (need to add the ifgram_inversion.py commands with 32 and 40 workers to better demonstrate the failure).

ifgram_inversion.py /scratch/05861/tg851601/KokoxiliBigChunk30SenAT41/mintpy/inputs/ifgramStack.h5 -t /scratch/05861/tg851601/KokoxiliBigChunk30SenAT41/mintpy/smallbaselineApp.cfg --update
cat smallbaseline_wrapper_7762111.e
distributed.worker - WARNING -  Compute Failed
Function:  ifgram_inversion_patch
args:      ()
kwargs:    {'ifgram_file': '/scratch/05861/tg851601/KokoxiliBigChunk30SenAT41/mintpy/inputs/ifgramStack.h5', 'ref_phase': array([-5.19649506e+00, -3.25060034e+00,  2.15256667e+00, -2.20532656e+00,
       -4.36168528e+00,  1.11639941e+00, -3.24188375e+00, -1.10148215e+00,
        5.41384649e+00,  1.06606722e+00,  3.24310565e+00, -1.78112912e+00,
       -4.32403851e+00, -2.17794132e+00, -7.14503670e+00, -2.97046185e+00,
        2.22950864e+00, -2.76462269e+00,  1.34797001e+00, -3.47990417e+00,
       -5.00010490e+00, -7.38510132e-01,  4.16993916e-01, -4.29824018e+00,
        4.23945141e+00, -7.89008856e-01,  6.73024356e-01, -1.44387579e+00,
       -4.98539352e+00, -3.63944149e+00, -5.56924820e+00, -4.86125517e+00,
        1.52328527e+00, -7.09312201e-01,  2.10004941e-01, -1.40527773e+00,
       -2.05830598e+00,  5.10107708e+00,  3.46001244e+00,  1.58456004e+00,
        7.69222558e-01, -7.87580729e-01, -2.37778068e+00, -6.80386245e-01,
       -1.57424474e+00, -3.14160013e+00, -1.46134782e+00, -2.464
Exception: ValueError('negative dimensions are not allowed')

(The same "Compute Failed" warning for ifgram_inversion_patch, with identical kwargs and exception, is repeated three more times.)

Traceback (most recent call last):
  File "/tmp/rsmas_insar/sources/MintPy/mintpy/smallbaselineApp.py", line 1255, in <module>
    main(sys.argv[1:])
  File "/tmp/rsmas_insar/sources/MintPy/mintpy/smallbaselineApp.py", line 1237, in main
    app.run(steps=inps.runSteps)
  File "/tmp/rsmas_insar/sources/MintPy/mintpy/smallbaselineApp.py", line 1044, in run
    self.run_network_inversion(sname)
  File "/tmp/rsmas_insar/sources/MintPy/mintpy/smallbaselineApp.py", line 555, in run_network_inversion
    mintpy.ifgram_inversion.main(iargs)
  File "/tmp/rsmas_insar/sources/MintPy/mintpy/ifgram_inversion.py", line 1193, in main
    ifgram_inversion(inps)
  File "/tmp/rsmas_insar/sources/MintPy/mintpy/ifgram_inversion.py", line 1126, in ifgram_inversion
    ts, inv_quality, num_inv_ifg = cluster_obj.run(func=ifgram_inversion_patch,
  File "/tmp/rsmas_insar/sources/MintPy/mintpy/objects/cluster.py", line 198, in run
    return self.collect_result(futures, results, box, submission_time)
  File "/tmp/rsmas_insar/sources/MintPy/mintpy/objects/cluster.py", line 245, in collect_result
    for future, sub_results in as_completed(futures, with_results=True):
  File "/tmp/rsmas_insar/3rdparty/miniconda3/lib/python3.8/site-packages/distributed/client.py", line 4332, in __next__
    return self._get_and_raise()
  File "/tmp/rsmas_insar/3rdparty/miniconda3/lib/python3.8/site-packages/distributed/client.py", line 4323, in _get_and_raise
    raise exc.with_traceback(tb)
  File "/tmp/rsmas_insar/sources/MintPy/mintpy/ifgram_inversion.py", line 821, in ifgram_inversion_patch
    mask = np.ones(num_pixel, np.bool_)
  File "/tmp/rsmas_insar/3rdparty/miniconda3/lib/python3.8/site-packages/numpy/core/numeric.py", line 203, in ones
    a = empty(shape, dtype, order)
ValueError: negative dimensions are not allowed
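
The traceback ends in `np.ones(num_pixel, np.bool_)` with a negative `num_pixel`, which suggests that some patches handed to the workers had a negative size. The sketch below shows one way this can happen when a row range is split into more patches than a fixed step size can fill. It is not MintPy's actual splitting code: `split_rows`, `patch_sizes`, `safe_sizes`, and the 128-row length are hypothetical and chosen only so that 32 splits succeed and 40 splits fail, as in the report above.

```python
import math
import numpy as np


def split_rows(length, num_split):
    """Hypothetical splitter (assumption, not MintPy's code):
    fixed ceil-sized step, with the end of each patch clipped to `length`."""
    step = math.ceil(length / num_split)
    boxes = []
    for i in range(num_split):
        y0 = i * step
        y1 = min(y0 + step, length)  # clipping the end can leave y1 < y0
        boxes.append((y0, y1))
    return boxes


def patch_sizes(length, num_split):
    """Number of rows in each patch produced by split_rows."""
    return [y1 - y0 for y0, y1 in split_rows(length, num_split)]


def safe_sizes(length, num_split):
    """One possible guard: derive patch edges from integer boundaries,
    so every patch has a size >= 0 and the sizes sum to `length`."""
    bounds = [length * i // num_split for i in range(num_split + 1)]
    return [b1 - b0 for b0, b1 in zip(bounds[:-1], bounds[1:])]


if __name__ == "__main__":
    length = 128  # toy row count, not taken from the dataset in this issue
    for num_split in (32, 40):
        smallest = min(patch_sizes(length, num_split))
        try:
            np.ones(smallest, np.bool_)
            print(num_split, "splits: ok, smallest patch =", smallest)
        except ValueError as err:
            print(num_split, "splits: failed,", err)
    print("guarded sizes for 40 splits:", safe_sizes(length, 40))
```

If the real cause is similar, either capping the number of workers at the number of non-empty patches or skipping patches with a non-positive size before submitting them should avoid the error; confirming this needs the two `ifgram_inversion.py` commands mentioned above.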