AlexeyPechnikov / pygmtsar

PyGMTSAR (Python InSAR): Powerful and Accessible Satellite Interferometry
http://insar.dev/
BSD 3-Clause "New" or "Revised" License

ValueError: bytes object is too large when using .sync_cube to save a trend stack #134

Open georgeboldeanu opened 1 month ago

georgeboldeanu commented 1 month ago

Hello! After I compute the trend regression and try to save it with the .sync_cube function, I receive a strange error in the Dask client.

Traceback (most recent call last):
  File "/home/gebo/py_venv/cophil/lib/python3.10/site-packages/distributed/protocol/core.py", line 109, in dumps
    frames[0] = msgpack.dumps(msg, default=_encode_default, use_bin_type=True)
  File "/home/gebo/py_venv/cophil/lib/python3.10/site-packages/msgpack/__init__.py", line 36, in packb
    return Packer(**kwargs).pack(o)
  File "msgpack/_packer.pyx", line 294, in msgpack._cmsgpack.Packer.pack
  File "msgpack/_packer.pyx", line 300, in msgpack._cmsgpack.Packer.pack
  File "msgpack/_packer.pyx", line 297, in msgpack._cmsgpack.Packer.pack
  File "msgpack/_packer.pyx", line 264, in msgpack._cmsgpack.Packer._pack
  File "msgpack/_packer.pyx", line 231, in msgpack._cmsgpack.Packer._pack
  File "msgpack/_packer.pyx", line 264, in msgpack._cmsgpack.Packer._pack
  File "msgpack/_packer.pyx", line 202, in msgpack._cmsgpack.Packer._pack
ValueError: bytes object is too large
2024-05-24 10:44:56,253 - distributed.comm.utils - ERROR - bytes object is too large
Traceback (most recent call last):
  File "/home/gebo/py_venv/cophil/lib/python3.10/site-packages/distributed/comm/utils.py", line 34, in _to_frames
    return list(protocol.dumps(msg, **kwargs))
  File "/home/gebo/py_venv/cophil/lib/python3.10/site-packages/distributed/protocol/core.py", line 109, in dumps
    frames[0] = msgpack.dumps(msg, default=_encode_default, use_bin_type=True)
  File "/home/gebo/py_venv/cophil/lib/python3.10/site-packages/msgpack/__init__.py", line 36, in packb
    return Packer(**kwargs).pack(o)
  File "msgpack/_packer.pyx", line 294, in msgpack._cmsgpack.Packer.pack
  File "msgpack/_packer.pyx", line 300, in msgpack._cmsgpack.Packer.pack
  File "msgpack/_packer.pyx", line 297, in msgpack._cmsgpack.Packer.pack
  File "msgpack/_packer.pyx", line 264, in msgpack._cmsgpack.Packer._pack
  File "msgpack/_packer.pyx", line 231, in msgpack._cmsgpack.Packer._pack
  File "msgpack/_packer.pyx", line 264, in msgpack._cmsgpack.Packer._pack
  File "msgpack/_packer.pyx", line 202, in msgpack._cmsgpack.Packer._pack
ValueError: bytes object is too large
2024-05-24 10:44:56,255 - distributed.batched - ERROR - Error in batched write
Traceback (most recent call last):
  File "/home/gebo/py_venv/cophil/lib/python3.10/site-packages/distributed/batched.py", line 115, in _background_send
    nbytes = yield coro
  File "/home/gebo/py_venv/cophil/lib/python3.10/site-packages/tornado/gen.py", line 767, in run
    value = future.result()
  File "/home/gebo/py_venv/cophil/lib/python3.10/site-packages/distributed/comm/tcp.py", line 264, in write
    frames = await to_frames(
  File "/home/gebo/py_venv/cophil/lib/python3.10/site-packages/distributed/comm/utils.py", line 48, in to_frames
    return await offload(_to_frames)
  File "/home/gebo/py_venv/cophil/lib/python3.10/site-packages/distributed/utils.py", line 1540, in run_in_executor_with_context
    return await loop.run_in_executor(
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/gebo/py_venv/cophil/lib/python3.10/site-packages/distributed/utils.py", line 1541, in <lambda>
    executor, lambda: context.run(func, *args, **kwargs)
  File "/home/gebo/py_venv/cophil/lib/python3.10/site-packages/distributed/comm/utils.py", line 34, in _to_frames
    return list(protocol.dumps(msg, **kwargs))
  File "/home/gebo/py_venv/cophil/lib/python3.10/site-packages/distributed/protocol/core.py", line 109, in dumps
    frames[0] = msgpack.dumps(msg, default=_encode_default, use_bin_type=True)
  File "/home/gebo/py_venv/cophil/lib/python3.10/site-packages/msgpack/__init__.py", line 36, in packb
    return Packer(**kwargs).pack(o)
  File "msgpack/_packer.pyx", line 294, in msgpack._cmsgpack.Packer.pack
  File "msgpack/_packer.pyx", line 300, in msgpack._cmsgpack.Packer.pack
  File "msgpack/_packer.pyx", line 297, in msgpack._cmsgpack.Packer.pack
  File "msgpack/_packer.pyx", line 264, in msgpack._cmsgpack.Packer._pack
  File "msgpack/_packer.pyx", line 231, in msgpack._cmsgpack.Packer._pack
  File "msgpack/_packer.pyx", line 264, in msgpack._cmsgpack.Packer._pack
  File "msgpack/_packer.pyx", line 202, in msgpack._cmsgpack.Packer._pack
ValueError: bytes object is too large

From my basic understanding and some Google searches, it's something related to array dimensions. But this behavior persists even when I try to save just the first pair from the stack. The stack has shape 53x11138x7257, about 300 MB per pair and roughly 16 GB in total. So I don't think the issue is directly related to file sizes.

Is there some way I can tackle this?

System and software version:

Server specifications: 125 GB RAM, Intel(R) Xeon(R) Silver 4309Y CPU @ 2.80GHz
OS: Ubuntu 22.04.2 LTS (GNU/Linux 6.5.0-28-generic x86_64)
PyGMTSAR version: 2024.4.17.post2

AlexeyPechnikov commented 1 month ago

It seems you are trying to send a large non-Dask object to the calculation function. While we can mix lazy Dask and non-Dask arrays, it is ineffective to use a large non-Dask array in this way and is therefore prohibited.
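
As an aside, a minimal sketch (using a hypothetical in-memory grid, not the thread's data) of how an eager xarray array can be converted into a lazy Dask-backed one before mixing it with lazy grids:

import numpy as np
import xarray as xr

# hypothetical eager (in-memory) grid, roughly the size of one pair
eager = xr.DataArray(np.zeros((11138, 7257), dtype=np.float32), dims=('y', 'x'))

# convert it into a lazy Dask-backed array with 2048x2048 chunks
lazy = eager.chunk({'y': 2048, 'x': 2048})
print(type(lazy.data))  # <class 'dask.array.core.Array'>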

georgeboldeanu commented 1 month ago

@AlexeyPechnikov I am using this part from your GoldenValley example for an SBAS analysis:

decimator = sbas.decimator(resolution=15, grid=(1,1))
topo = decimator(sbas.get_topo())
inc = decimator(sbas.incidence_angle())
yy, xx = xr.broadcast(topo.y, topo.x)
trend_sbas = sbas.regression(unwrap_sbas.phase,
        [topo,    topo*yy,    topo*xx,    topo*yy*xx,
         topo**2, topo**2*yy, topo**2*xx, topo**2*yy*xx,
         topo**3, topo**3*yy, topo**3*xx, topo**3*yy*xx,
         inc,     inc**yy,    inc*xx,     inc*yy*xx,
         yy, xx,
         yy**2, xx**2, yy*xx,
         yy**3, xx**3, yy**2*xx, xx**2*yy], corr_sbas)

I don't do anything extra, just reindexing the topo and incidence grids because they are missing one row and one column. If I slice the first pair with some indexing along rows and columns, taking a very small subset (a 100x100 array), it works. Is there any way to tackle this?

AlexeyPechnikov commented 1 month ago

Are your unwrap_sbas.phase and corr_sbas lazy Dask grids?

georgeboldeanu commented 1 month ago

When I verify whether they are lazy Dask grids (from my understanding, dask.array.Array objects), I get False for both of them. From your answer I understand that they must be Dask arrays.

import dask.array as da

arrays = [corr_sbas, unwrap_sbas.phase]
for i, array in enumerate(arrays):
    is_dask = isinstance(array, da.Array)
    print(f"Array {i} is a Dask lazy array: {is_dask}")

Array 0 is a Dask lazy array: False
Array 1 is a Dask lazy array: False

The problem is that these are previously saved interferogram files, from which I load the correlation and unwrapped grids with open_stack and open_cube, like in your Colab notebooks. Is that the issue? When I verify the files, they look okay, chunked with the new grid chunks that I set at the beginning (also the chunks the files were originally processed with).


AlexeyPechnikov commented 1 month ago

The check should be

isinstance(array.data, dask.array.Array)

where array is an xarray object, array.data is a Dask lazy object, and array.values is a numpy object. Your unwrap_sbas.phase looks okay; how about corr_sbas?
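
A short sketch of that check, assuming the corr_sbas and unwrap_sbas objects from the snippets above:

import dask.array

# inspect the underlying .data attribute of each xarray object,
# not the xarray wrapper itself
for name, grid in {'corr_sbas': corr_sbas, 'unwrap_sbas.phase': unwrap_sbas.phase}.items():
    print(name, 'is Dask-backed:', isinstance(grid.data, dask.array.Array))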

georgeboldeanu commented 1 month ago

Now when I verify the phase and the correlation with isinstance(array.data, dask.array.Array), both checks return True. The corr_sbas looks like this: [screenshot: corr_sbas]

Also, some info about the topography and incidence angle, just in case: [screenshot: topo and incidence angle grids]

AlexeyPechnikov commented 1 month ago

Right now, I see your chunks are extremely small (512x512) instead of the default ones (2048x2048). It seems you made some wrong modifications in the processing and broke it.
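
A quick way to inspect the Dask chunking of the grids (names taken from the thread); reprocessing with the default chunk settings is the proper fix:

# the .chunks attribute of an xarray object shows the Dask chunk layout
print(unwrap_sbas.phase.chunks)
print(corr_sbas.chunks)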

georgeboldeanu commented 1 month ago

Hello @AlexeyPechnikov! I came back with an update on this issue. The problem still persists after processing the data with the original chunks, the 2048 ones. I am really stuck; I have tried clean installations and so on. My guess is that the chunks are distributed across a small number of graph layers in the unwrap_sbas1.phase cube, as you can see in the attached pictures. [screenshots: corr and phase]

AlexeyPechnikov commented 1 month ago

A data cube is stored as a single large NetCDF file, while a data stack consists of separate files. This difference accounts for the variation in graph layers. In your case, you can try filtering out noisy pixels to simplify the regression calculation, for example, using corr_sbas.where(corr_sbas > 0.5).
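
For illustration, a sketch of that filtering applied before the regression, reusing the names and call signature from the earlier snippet; the 0.5 threshold is just an example and may need tuning:

# keep only reasonably coherent pixels; everything else becomes NaN
corr_masked = corr_sbas.where(corr_sbas > 0.5)

# a reduced regression call with the masked correlation as the weight
trend_sbas = sbas.regression(unwrap_sbas.phase,
                             [topo, topo*yy, topo*xx, yy, xx, yy*xx],
                             corr_masked)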

AlexeyPechnikov commented 1 month ago

Running the example LakeSarez_Landslides_2017.ipynb with BUFFER = 0.7, we have a grid size similar to yours, and there are no problems calling the regression:

[screenshots: grid size and regression call]

georgeboldeanu commented 1 month ago

You are right, using the regression formula from the LakeSarez_Landslides_2017.ipynb example runs smoothly and without any problems. But now my question is about choosing the regression formula. Is there any logic behind the regression from the LakeSarez example versus the one from the GoldenValley example, which takes into account the incidence angle as well as additional cubic terms of topography and the spatial dimensions? As I stated, the more complex regression threw that error in my case, especially for this bigger cube.

AlexeyPechnikov commented 1 month ago

Potentially, we can better remove phase ramps and stratified components using high-degree regression variables (remember Taylor decomposition of any function). Non-linear function approximation may require low or high-degree polynomials; some papers even use 4th-degree polynomials, although I find it excessive for everyday usage. At the same time, more regression variables, especially high-degree ones, can cause overfitting. Performance reasons also discourage using too many variables. There’s a trade-off between the benefits of higher-degree variables and a shorter list of variables. Regarding incidence angle fitting, it’s tricky and often ignorable, but it can be done if you really need it.
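
For comparison, a sketch of a lower-degree variable set built from the same names as the earlier snippet; whether the cubic and incidence-angle terms are worth their cost is the trade-off described above:

# second-degree topography and spatial terms only; the cubic and
# incidence-angle terms from the earlier call are dropped
trend_simple = sbas.regression(unwrap_sbas.phase,
                               [topo, topo*yy, topo*xx, topo*yy*xx,
                                topo**2, topo**2*yy, topo**2*xx, topo**2*yy*xx,
                                yy, xx, yy*xx,
                                yy**2, xx**2],
                               corr_sbas)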

AlexeyPechnikov commented 1 month ago

A separate consideration about your raster: tiles processed independently can have inconsistencies between them. The best approach is to use a single tile, tuning the grid resolution for sufficient detail while maintaining processable sizes. Although PyGMTSAR can handle large grids, most practical cases don’t require it. Phase accuracy improves with lower resolution (multilooking) processing, and SNAPHU unwrapping performs better on lower resolution grids. Typically, SBAS analysis resolution is 60 meters due to strong physical reasons. For high-resolution results, use SBAS for the initial solution and apply phase corrections for precise PSI analysis with Sentinel-1 resolution, combining the strengths of both SBAS and PSI approaches.

georgeboldeanu commented 4 weeks ago

Hello again, Alexey! When trying to use PSI analysis for the same area, this time with interferograms computed like this, following the Otmanbozdagh example:

.compute_interferogram_singlelook(baseline_pairs, 'intf_slook', wavelength=60, weight=sbas.psfunction(), phase=trend_sbas + turbo_sbas + turbo2_sbas) 

I get the following error when I apply the 1D unwrapping, even when I apply it to a small slice in the temporal and spatial domain:

2024-06-03 16:43:52,294 - distributed.worker - WARNING - Compute Failed
Key:       ('rechunk-merge-rechunk-split-vectorize_unwrap_pairs-vectorize_unwrap_pairs_0-transpose-0351055b9376dfff369df2e3507f3--6de', 0, 7, 11)
Function:  execute_task
args:      ((subgraph_callable-5fb5c1bd203aa36396e14cbae36a56e2, (subgraph_callable-1168c7f8bfb4c1bbc32e80c78965f4ee, (<function concatenate_axes at 0x77c041ab1480>, [(subgraph_callable-5f5c4edafe595198229703d8bf6980de, (<function getitem at 0x77c04853ad40>, array([[[-1.28743770e-02,  3.25911194e-02,  1.04233131e-01, ...,
          3.26218009e-02,  1.36502340e-01,  1.83258280e-01],
        [-2.95294281e-02, -1.69400778e-02,  2.38598846e-02, ...,
          8.82457420e-02,  2.91690975e-01,  4.03565377e-01],
        [-1.39907554e-01, -1.45749554e-01, -1.08706944e-01, ...,
         -1.73812658e-01,  1.12226687e-01,  3.27572137e-01],
        ...,
        [ 8.60467553e-03, -5.24168946e-02, -1.16339408e-01, ...,
         -5.53988993e-01, -5.90708256e-01, -3.57140332e-01],
        [-7.14451447e-02, -1.44769415e-01, -2.59160727e-01, ...,
         -1.03608274e+00, -9.29100156e-01, -7.40349352e-01],
        [ 6.69665262e-02, -1.07811457e-02, -6.05199225e-02, ...,
         -1.59755516e+00, -1.21419084e+00, -
kwargs:    {}
Exception: "ValueError('could not broadcast input array from shape (2,) into shape (5,)')"

AlexeyPechnikov commented 3 weeks ago

Hmm, such reshaping is applied for geocoding only. It can be an error raised earlier.

georgeboldeanu commented 3 weeks ago

The only thing I did differently is that sbas.psfunction() is not filtered based on the correlation stack, while the phase component is filtered (each component is filtered on some threshold). Is this the cause of such behavior? The other thing that comes to mind that could affect the processing is the wavelength of the Gaussian filter. In the initial multilook interferogram processing I use a wavelength of 200 (I saw some indications of yours that the value should be around that), and in the singlelook processing I use 60 (a smaller Gaussian filter window for PS analysis). Could this last part have any bearing on such an error?

AlexeyPechnikov commented 3 weeks ago

Such an error usually arises when we use an incomplete DEM. If you are sure the problem is related to 1D unwrapping, try to use more pairs.