ROBelgium / MSNoise

A Python Package for Monitoring Seismic Velocity Changes using Ambient Seismic Noise | http://www.msnoise.org
European Union Public License 1.1

question regarding Reference Stack in MSNoise Master #378

[Open] seismolab-uct opened this issue 1 day ago

seismolab-uct commented 1 day ago

Hi, I am trying to generate a reference stack with msnoise -t 8 cc stack -r, and the error I get is the one reported in issue #339, which is caused by the setting "keep_all = N". I was not aware that "keep_all = Y" is compulsory in the dev version; I had originally chosen "keep_all = N" to save some disk space.

I think my only option at the moment is to start the cross-correlations over: reset all stack and cross-correlation jobs and set "keep_all = Y". Could someone confirm whether that is indeed my only option? My dataset is relatively large (3 TB) and my STACKS folder is already 23 GB. I expect the CROSS_CORRELATIONS directory to be much larger than 23 GB, which worries me given the disk space I have (3.7 TB).

I would appreciate some guidance, thanks.

seismolab-uct commented 11 hours ago

As I suspected, after I reset all my stack and cross-correlation jobs and changed the setting to "keep_all = Y", msnoise -t 8 cc compute_cc exited with an error:

2024-10-30 23:16:15.591004 msnoise [pid 393605][INFO]: Finished preprocessing
2024-10-30 23:16:22.451574 msnoise [pid 330386][INFO]: Received preprocessed traces
Process Process-2:
Traceback (most recent call last):
  File "/home/seismolab/Software/anaconda3/envs/msnoise/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/seismolab/Software/anaconda3/envs/msnoise/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/seismolab/Software/anaconda3/envs/msnoise/lib/python3.12/site-packages/msnoise/s03compute_no_rotation.py", line 620, in main
    export_allcorr2(db, ccfid, allcorr[ccfid])
  File "/home/seismolab/Software/anaconda3/envs/msnoise/lib/python3.12/site-packages/msnoise/api.py", line 1151, in export_allcorr2
    df.to_hdf(os.path.join(path, date+'.h5'), key='data')
  File "/home/seismolab/Software/anaconda3/envs/msnoise/lib/python3.12/site-packages/pandas/util/_decorators.py", line 333, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/seismolab/Software/anaconda3/envs/msnoise/lib/python3.12/site-packages/pandas/core/generic.py", line 2855, in to_hdf
    pytables.to_hdf(
  File "/home/seismolab/Software/anaconda3/envs/msnoise/lib/python3.12/site-packages/pandas/io/pytables.py", line 311, in to_hdf
    f(store)
  File "/home/seismolab/Software/anaconda3/envs/msnoise/lib/python3.12/site-packages/pandas/io/pytables.py", line 293, in <lambda>
    f = lambda store: store.put(
                      ^^^^^^^^^^
  File "/home/seismolab/Software/anaconda3/envs/msnoise/lib/python3.12/site-packages/pandas/io/pytables.py", line 1160, in put
    self._write_to_group(
  File "/home/seismolab/Software/anaconda3/envs/msnoise/lib/python3.12/site-packages/pandas/io/pytables.py", line 1858, in _write_to_group
    s.write(
  File "/home/seismolab/Software/anaconda3/envs/msnoise/lib/python3.12/site-packages/pandas/io/pytables.py", line 3333, in write
    self.write_array(f"block{i}_values", blk.values, items=blk_items)
  File "/home/seismolab/Software/anaconda3/envs/msnoise/lib/python3.12/site-packages/pandas/io/pytables.py", line 3198, in write_array
    self._handle.create_array(self.group, key, value)
  File "/home/seismolab/Software/anaconda3/envs/msnoise/lib/python3.12/site-packages/tables/file.py", line 1142, in create_array
    return Array(parentnode, name,
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/seismolab/Software/anaconda3/envs/msnoise/lib/python3.12/site-packages/tables/array.py", line 186, in __init__
    super().__init__(parentnode, name, new, Filters(), byteorder, _log,
  File "/home/seismolab/Software/anaconda3/envs/msnoise/lib/python3.12/site-packages/tables/leaf.py", line 350, in __init__
    super().__init__(parentnode, name, _log)
  File "/home/seismolab/Software/anaconda3/envs/msnoise/lib/python3.12/site-packages/tables/node.py", line 256, in __init__
    self._v_objectid = self._g_create()
                       ^^^^^^^^^^^^^^^^
  File "/home/seismolab/Software/anaconda3/envs/msnoise/lib/python3.12/site-packages/tables/array.py", line 218, in _g_create
    (self._v_objectid, self.shape, self.atom) = self._create_array(
                                                ^^^^^^^^^^^^^^^^^^^
  File "tables/hdf5extension.pyx", line 1416, in tables.hdf5extension.Array._create_array
tables.exceptions.HDF5ExtError: Problems creating the Array.

I ran out of disk space: the cross-correlation directory (named Xcorrs in my case) is over 740 GB, and a good number of cross-correlation jobs still remain (541669 CC jobs in the database: 23841 todo, 126443 in progress and 391385 done).
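A rough way to size the remaining output from these numbers is linear extrapolation from the finished jobs. This is only a sketch under a loud assumption: every job writes about the same volume. In practice per-day file sizes vary, and the in-progress jobs may already have written partial files, so treat the result as an order of magnitude. The helper names are hypothetical, and `shutil.disk_usage` only reports free space on the filesystem that holds the given path.

```python
import shutil


def estimate_total_gb(used_gb: float, done_jobs: int, total_jobs: int) -> float:
    """Extrapolate total output size, ASSUMING each job writes a similar volume."""
    if done_jobs <= 0:
        raise ValueError("need at least one finished job to extrapolate")
    return used_gb * total_jobs / done_jobs


def remaining_gb(used_gb: float, done_jobs: int, total_jobs: int) -> float:
    """Estimated space still needed on top of what is already used."""
    return estimate_total_gb(used_gb, done_jobs, total_jobs) - used_gb


def free_gb(path: str = ".") -> float:
    """Free space (in 1e9-byte GB) on the filesystem containing `path`."""
    return shutil.disk_usage(path).free / 1e9


if __name__ == "__main__":
    # Numbers from the comment above: 740 GB written, 391385 of 541669 jobs done.
    used, done, total = 740.0, 391385, 541669
    print(f"estimated total: {estimate_total_gb(used, done, total):.0f} GB")
    print(f"still needed:    {remaining_gb(used, done, total):.0f} GB")
    print(f"free here:       {free_gb('.'):.0f} GB")
```

With the figures in this thread the extrapolation gives about 1024 GB in total, i.e. roughly 284 GB more than what is already written.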

Is it possible to keep the data folder on another disk, or does the database expect it to be inside the project folder where the db.ini file is?

ThomasLecocq commented 7 hours ago

Hi, if you don't plan to use subdaily stacks, it'd be possible to use what @LaureBrenot prepared here: https://github.com/ROBelgium/MSNoise/pull/363

That PR does not require keep_all=Y, and it builds the ref and mov stacks from the DAY stacks.
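Conceptually, building ref and mov stacks from DAY stacks amounts to averaging daily CCFs. The sketch below illustrates that idea under stated assumptions and is not the code from PR #363: each daily CCF is a list of samples of equal length, the reference is a plain linear mean over all days, and a moving stack is the mean over a window of consecutive days, with no gap handling or weighting. The function names are hypothetical, and reading the DAY stacks from disk is not shown.

```python
from statistics import fmean


def ref_stack(daily: list[list[float]]) -> list[float]:
    """Linear reference stack: sample-wise mean over all daily CCFs."""
    if not daily:
        raise ValueError("no daily stacks")
    n = len(daily[0])
    if any(len(d) != n for d in daily):
        raise ValueError("all daily CCFs must have the same length")
    # zip(*daily) yields one tuple per lag sample, across all days
    return [fmean(col) for col in zip(*daily)]


def mov_stacks(daily: list[list[float]], window: int) -> list[list[float]]:
    """Moving stacks: mean of each run of `window` consecutive days.

    The i-th output ends on day i + window - 1; fewer days than `window`
    yields an empty list.
    """
    if window < 1:
        raise ValueError("window must be at least 1 day")
    return [
        ref_stack(daily[i - window + 1 : i + 1])
        for i in range(window - 1, len(daily))
    ]
```

Because the daily CCFs are already stored as DAY stacks, both products can be recomputed from them alone, which is why this route avoids keeping every individual cross-correlation on disk. The trade-off, as noted above, is losing access to subdaily stacks.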

ThomasLecocq commented 7 hours ago

Re: moving the data: it's possible to move the whole project to another disk at once (or, since you're on Linux, move the CROSS_CORRELATIONS directory elsewhere and make a symbolic link to it).
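The move-and-link option can be scripted as below. This is a sketch: `relocate_with_symlink` and the example paths are hypothetical. Stop all running msnoise processes first so no job writes into the directory while it is being moved. `shutil.move` renames when source and destination are on the same filesystem; across disks it copies everything and then deletes the source, so the target disk needs room for the full directory.

```python
import os
import shutil
from pathlib import Path


def relocate_with_symlink(src: str, dest_parent: str) -> Path:
    """Move directory `src` under `dest_parent`, leaving a symlink at `src`."""
    src_path = Path(src)
    if src_path.is_symlink():
        raise ValueError(f"{src_path} is already a symlink")
    if not src_path.is_dir():
        raise NotADirectoryError(str(src_path))

    Path(dest_parent).mkdir(parents=True, exist_ok=True)
    dest_path = Path(dest_parent) / src_path.name
    if dest_path.exists():
        raise FileExistsError(str(dest_path))

    # Same filesystem: a cheap rename. Different disk: copy, then delete source.
    shutil.move(str(src_path), str(dest_path))

    # Absolute target so the link resolves regardless of the working directory.
    os.symlink(dest_path.resolve(), src_path, target_is_directory=True)
    return dest_path
```

For example, `relocate_with_symlink("/home/seismolab/project/Xcorrs", "/mnt/bigdisk/msnoise")` (hypothetical paths) leaves a link named Xcorrs in the project folder that points at the new location, so msnoise keeps finding the files at the same path. The shell equivalent is `mv` followed by `ln -s`.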

seismolab-uct commented 6 hours ago

OK, thanks Thomas. I'll try moving it first and creating a symbolic link.