geodesymiami / insarmaps


Memory error from hdfeos5_2json_mbtiles.py for large files #111

Open falkamelung opened 1 month ago

falkamelung commented 1 month ago

As described below, I get memory errors while trying to ingest big data files. Is there any way to reduce the memory requirements? If not, it would be good to document the limitations: how are the memory requirements calculated and, for a system with 64 GB of RAM, what is the maximum file size that can be ingested?

The number of dates may be important. I believe I have previously ingested larger files successfully, but with fewer dates.

I tried to ingest a 21GB data set (205 dates) using

hdfeos5_2json_mbtiles.py miaplpy_201505_202409_0.5/network_delaunay_4/S1_IW12_120_1183_1185_20150505_20240926_N00600_N00890_W078090_W077800_filtDel4DS.he5 miaplpy_201505_202409_0.5/network_delaunay_4/JSON_filtDS2 --num-workers 8

but got a memory error:

cat insarmaps_1131392.e
Process ForkPoolWorker-3:
Traceback (most recent call last):
  File "/work2/05861/tg851601/stampede2/code/rsmas_insar/tools/miniforge3/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/work2/05861/tg851601/stampede2/code/rsmas_insar/tools/miniforge3/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/work2/05861/tg851601/stampede2/code/rsmas_insar/tools/miniforge3/lib/python3.10/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/work2/05861/tg851601/stampede2/code/rsmas_insar/tools/miniforge3/lib/python3.10/multiprocessing/queues.py", line 367, in get
    return _ForkingPickler.loads(res)
MemoryError
Process ForkPoolWorker-4:
Traceback (most recent call last):
  File "/work2/05861/tg851601/stampede2/code/rsmas_insar/tools/miniforge3/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/work2/05861/tg851601/stampede2/code/rsmas_insar/tools/miniforge3/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/work2/05861/tg851601/stampede2/code/rsmas_insar/tools/miniforge3/lib/python3.10/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/work2/05861/tg851601/stampede2/code/rsmas_insar/tools/miniforge3/lib/python3.10/multiprocessing/queues.py", line 367, in get
    return _ForkingPickler.loads(res)
MemoryError

I was on a really big machine (250 GB RAM). It got pretty far before the error occurred:

tail -5 insarmaps_1131392.o
converted chunk 889
converted chunk 894
converted chunk 899
converted chunk 905
converted chunk 91
               total        used        free      shared  buff/cache   available
Mem:           250Gi       207Gi        44Gi        20Gi        21Gi        43Gi
Swap:             0B          0B          0B

stackTom commented 1 month ago

Can we move this issue to https://github.com/geodesymiami/insarmaps_scripts/tree/master? In short, this error is thrown by Python when it runs out of memory. Are you sure your Python installation is 64-bit? That should eliminate all memory errors on a machine with so much RAM. If you are using 32-bit Python, it is limited to 4 GB of RAM no matter how much RAM your computer has.

If you are on 64-bit Python, please run free -g while the script is running, not after it crashes. Also, please try a smaller --num-workers value.

falkamelung commented 1 month ago

Yes, I am on 64-bit python:

Python 3.10.13 | packaged by conda-forge | (main, Oct 26 2023, 18:07:37) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import platform; print(f"You are using a {platform.architecture()[0]} version of Python.")
You are using a 64bit version of Python.

I ran free -h while the script was running. Memory use climbed slowly and reached 200 GB after 15 minutes or so. It does not rise continuously: every minute or so it drops a bit and then goes ~20 GB higher. I suppose I just missed the last reading before it hit 250 GB.
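Since the peak is easy to miss when running free by hand, a small sampler can log it instead. This is a minimal sketch, assuming a Linux machine with /proc/meminfo; the function names (parse_meminfo, used_gib, watch) are made up for illustration and are not part of insarmaps. It reports MemTotal minus MemAvailable, which is close to, but not identical with, the "used" column of free.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def used_gib(info):
    """Memory in use, in GiB, taken as MemTotal - MemAvailable."""
    return (info["MemTotal"] - info["MemAvailable"]) / 1024**2


def watch(interval_s=5.0, samples=None, path="/proc/meminfo"):
    """Print current and peak usage every interval_s seconds.

    Runs until interrupted when samples is None; returns the peak in GiB
    when a finite number of samples is requested.
    """
    peak = 0.0
    taken = 0
    while samples is None or taken < samples:
        with open(path) as f:
            used = used_gib(parse_meminfo(f.read()))
        peak = max(peak, used)
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} used={used:.1f} GiB peak={peak:.1f} GiB", flush=True)
        taken += 1
        if samples is None or taken < samples:
            time.sleep(interval_s)
    return peak
```

Calling watch() in a second terminal (or as a background job with its output redirected to a file) while hdfeos5_2json_mbtiles.py runs would leave the last line before the crash showing the highest value seen.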

stackTom commented 1 month ago

Try running with a smaller --num-workers value. I don't have much fine-grained control over the memory usage, as we use GDAL to upload the data to the server.

falkamelung commented 1 month ago

That works indeed. I used --num-workers 4. Great! Only it is slower. Do you have any idea how to roughly calculate the memory requirement for a file with a given number of pixels and dates, depending on the number of workers? We could speed things up by making optimal use of the available memory.

stackTom commented 1 month ago

I am not sure. I would have to perform some tests to see.
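One way such tests could be turned into a rule of thumb is sketched below. It assumes, without confirmation from the script itself, that peak memory grows roughly linearly with the number of workers (a fixed base plus a per-worker cost) for a given file. The function names and all numbers are hypothetical placeholders, not measurements from this thread.

```python
def fit_linear(n1, peak1_gib, n2, peak2_gib):
    """Fit peak = base + per_worker * n through two measured runs.

    n1, n2 are worker counts; peak1_gib, peak2_gib are the peak memory
    values observed for those runs on the same input file.
    """
    per_worker = (peak2_gib - peak1_gib) / (n2 - n1)
    base = peak1_gib - per_worker * n1
    return base, per_worker


def max_workers(base_gib, per_worker_gib, ram_gib, headroom=0.8):
    """Largest worker count whose predicted peak stays under headroom * ram_gib."""
    if per_worker_gib <= 0:
        raise ValueError("per-worker memory must be positive")
    usable = ram_gib * headroom - base_gib
    return max(1, int(usable // per_worker_gib))


# Hypothetical measurements: 70 GiB peak with 2 workers, 130 GiB with 4.
base, per_worker = fit_linear(2, 70.0, 4, 130.0)
print(base, per_worker)                      # 10.0 30.0
print(max_workers(base, per_worker, 250.0))  # 6
```

The headroom factor leaves a margin for the sawtooth pattern described above, where usage drops and then jumps by about 20 GB. Whether the linear assumption holds, and how the per-worker cost scales with pixels and dates, would still need to be checked on real runs.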