OpenSenseAction / OPENSENSE_sandbox

Collection of runnable examples with software packages for processing opportunistic rainfall sensors
BSD 3-Clause "New" or "Revised" License

transform_andersson_2022_OpenMRG fails with "OSError: [Errno 28] No space left on device" on binder #64

Open cchwala opened 1 year ago

cchwala commented 1 year ago

The resources of the binder pods are limited. Running the data exploration notebook with the current version from #62, I get the following error message:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Cell In[15], line 1
----> 1 ds = oddt.transform_andersson_2022_OpenMRG(
      2     fn='data/andersson_2022_OpenMRG/OpenMRG.zip', 
      3     path_to_extract_to='data/andersson_2022_OpenMRG/',
      4     time_start_end = ('2015-08-27T00', '2015-08-28T00'), # default (None, None) -> no timeslicing. ie. ('2015-08-31T00', None),
      5     restructure_data=True,
      6 )

File ~/OPENSENSE_sandbox/notebooks/opensense_data_downloader_and_transformer.py:272, in transform_andersson_2022_OpenMRG(fn, path_to_extract_to, time_start_end, restructure_data)
    268 # For this ZIP file we cannot extract only the CML data since
    269 # the NetCDF with the CML data is quite large. This seems to
    270 # lead to crashes when reding directly from the ZIP file via Python.
    271 with zipfile.ZipFile(fn) as zfile:
--> 272     zfile.extractall(path_to_extract_to)
    274 # Read metadata and data
    275 df_metadata = pd.read_csv(os.path.join(path_to_extract_to,
    276                                        'cml/cml_metadata.csv'), 
    277                                         index_col=0)

File /srv/conda/envs/notebook/lib/python3.10/zipfile.py:1645, in ZipFile.extractall(self, path, members, pwd)
   1642     path = os.fspath(path)
   1644 for zipinfo in members:
-> 1645     self._extract_member(zipinfo, path, pwd)

File /srv/conda/envs/notebook/lib/python3.10/zipfile.py:1700, in ZipFile._extract_member(self, member, targetpath, pwd)
   1696     return targetpath
   1698 with self.open(member, pwd=pwd) as source, \
   1699      open(targetpath, "wb") as target:
-> 1700     shutil.copyfileobj(source, target)
   1702 return targetpath

File /srv/conda/envs/notebook/lib/python3.10/shutil.py:198, in copyfileobj(fsrc, fdst, length)
    196 if not buf:
    197     break
--> 198 fdst_write(buf)

OSError: [Errno 28] No space left on device

I am not sure whether this is due to changes in the function transform_andersson_2022_OpenMRG, maybe introduced in #62. But since the error comes from zfile.extractall(path_to_extract_to), it seems to simply stem from exceeding a disk quota set on the binder pod. I do not know how to find out how large that quota is.
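One way to narrow this down is to compare the uncompressed size of the archive with the free space reported for the target directory before extracting. The following is a minimal sketch, not part of the sandbox code; the function name `check_space_for_zip` is hypothetical. It assumes the target directory already exists, and the free space reported by the filesystem may not reflect a quota enforced on the pod.

```python
import shutil
import zipfile


def check_space_for_zip(zip_path, target_dir):
    """Compare the uncompressed size of a ZIP archive with the free space
    reported for target_dir (hypothetical helper, not part of the sandbox).

    Returns (bytes_needed, bytes_free, fits).
    """
    with zipfile.ZipFile(zip_path) as zfile:
        # file_size is the uncompressed size of each archive member
        needed = sum(info.file_size for info in zfile.infolist())
    # Free space as seen by the filesystem; a pod quota may be stricter
    free = shutil.disk_usage(target_dir).free
    return needed, free, needed <= free
```

Calling it with the paths from the traceback, e.g. `check_space_for_zip('data/andersson_2022_OpenMRG/OpenMRG.zip', 'data/andersson_2022_OpenMRG/')`, would show how much space the extraction needs compared to what the pod reports as free.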

eoydvin commented 1 year ago

This is a bit strange; the new transformer code should not use that much more memory. Worst case, if we do not find a way to solve this, we could use the old transformation code, which is now working, to restructure the data. As mentioned in #62, it could be just the binder environment. One thing to check is whether both transform_andersson_2022_OpenMRG and transform_andersson_2022_OpenMRG_linkbylink crash in the same environment.

cchwala commented 1 year ago

Since it comes from the extraction process, it looks to me like a disk space issue, which in the case of a binder pod is most likely a disk quota issue.
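If disk space is the limit, one possible mitigation is to extract only the archive members that are actually needed instead of the whole archive. The sketch below uses the `members` argument of `ZipFile.extractall`; the helper name `extract_members_with_prefix` is hypothetical, and whether a prefix such as `'cml/'` covers everything the transformer reads is an assumption based only on the `cml/cml_metadata.csv` path in the traceback.

```python
import zipfile


def extract_members_with_prefix(zip_path, target_dir, prefix):
    """Extract only the archive members whose name starts with prefix
    (hypothetical helper, not part of the sandbox).

    Returns the list of extracted member names.
    """
    with zipfile.ZipFile(zip_path) as zfile:
        # Select members by name; everything else stays compressed in the ZIP
        members = [name for name in zfile.namelist() if name.startswith(prefix)]
        zfile.extractall(target_dir, members=members)
    return members
```

This only saves space if the archive contains sizeable files outside the selected prefix.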

cchwala commented 1 year ago

I just ran it again after your latest commit in #62 and it worked without this error... About 1 hour ago I was on a small binder pod from OVH with 2 GB of RAM. Maybe the available disk space depends on how many other pods run on the same server, or on how much disk space the other users have already used up...

Maybe this problem will not occur often enough to work on a solution. If so, we could just add an info text somewhere.
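Such an info text could also be attached to the error itself. The following is a minimal sketch, assuming the extraction step is wrapped in a small helper; the name `extract_with_hint` and the wording of the message are hypothetical and not part of the sandbox code.

```python
import errno
import zipfile


def extract_with_hint(zip_path, target_dir):
    """Extract a ZIP archive and add a hint if the disk is full
    (hypothetical helper, not part of the sandbox)."""
    try:
        with zipfile.ZipFile(zip_path) as zfile:
            zfile.extractall(target_dir)
    except OSError as err:
        if err.errno == errno.ENOSPC:
            # Errno 28: re-raise with an explanation for Binder users
            raise OSError(
                errno.ENOSPC,
                "No space left on device while extracting the archive. "
                "On Binder this is probably a disk limit of the pod; "
                "starting a new session may help.",
            ) from err
        # Any other OSError is passed on unchanged
        raise
```

The original exception stays attached via `from err`, so the full traceback remains available for debugging.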