Open dwest77a opened 2 months ago
Hi Dan, short answer (because I'm going home!), try:
>>> f = cf.read(files, chunks=None)
>>> cf.write(f, 'cfa.nc', cfa=True)
I did this on your data on JASMIN and it worked OK.
Long answer and explanations to follow ...
Tried this exactly as you've stated but I still get the runtime error with netcdf. FYI I'm using netcdf4==1.7.1.post1. I can add my whole conda package list here if needed. I'm off as well now!
Interesting. I was using netCDF4==1.6.5 when it worked fine, but I got a seg fault with 1.7.1.post1
>>> cf.environment(paths=False)
Platform: Linux-3.10.0-1160.114.2.el7.x86_64-x86_64-with-glibc2.17
HDF5 library: 1.12.2
netcdf library: 4.9.3-development
udunits2 library: ~/miniconda3/lib/libudunits2.so.0
esmpy/ESMF: not available
Python: 3.12.2
dask: 2024.7.0
netCDF4: 1.6.5
psutil: 5.9.8
packaging: 23.1
numpy: 1.26.4
scipy: 1.12.0
matplotlib: not available
cftime: 1.6.3
cfunits: 3.3.7
cfplot: not available
cfdm: 1.11.1.0
cf: 3.16.2
>>>
netCDF4==1.7.0 works for me, too, but I notice that 1.7.0 and 1.7.1 have both been yanked (https://pypi.org/project/netCDF4/#history), for some reason. Could this be related to https://github.com/Unidata/netcdf4-python/issues/1343?
A couple of questions about the above: is the HDF5 library just installed with h5py, or does it require a non-Python library to be installed? Otherwise I'll just pin the h5py and netCDF4 versions in my environment and make a note of it. It looks like the versions fall out of sync just because of a lack of coordination.
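As a rough sketch of "make a note of it", a script could check the installed netCDF4 version against what has been reported so far in this thread before attempting a CFA write. The function names and the two version sets below are illustrative only; they record the reports above and are not an authoritative compatibility list.

```python
from importlib import metadata

# Versions as reported in this thread only (assumption: not a complete list).
# Note that 1.7.0 worked for one commenter but has been yanked from PyPI.
REPORTED_OK = {"1.6.5", "1.7.0"}
REPORTED_FAILING = {"1.7.1.post1"}


def classify_netcdf4_version(version):
    """Return 'ok', 'failing' or 'unknown' based on the reports in this thread."""
    if version in REPORTED_FAILING:
        return "failing"
    if version in REPORTED_OK:
        return "ok"
    return "unknown"


def installed_netcdf4_version():
    """Return the installed netCDF4 version string, or None if it is not installed."""
    try:
        return metadata.version("netCDF4")
    except metadata.PackageNotFoundError:
        return None


version = installed_netcdf4_version()
status = "not installed" if version is None else classify_netcdf4_version(version)
print(f"netCDF4 {version}: {status}")
```

An "unknown" result only means nobody in this thread has tried that version.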
I've backdated netCDF4 to 1.6.5 and also adjusted my scipy and numpy versions to match yours. It looked like I was making progress because a file of about 6MB appeared, but after 4-5 minutes the process exited with the same error as before (Can't write aggregated variable...) and the file disappeared.
Note: Immediately rerunning this process only took 10 seconds to reach the same error so I think those 4-5 minutes were fetching the data (if that's even supposed to happen here?)
Hi Dan, I just defer to netCDF4 to install the correct and consistent netCDF-C and HDF5 libraries, and that has, for many years, just worked ....
Strange about your results - the write took ~1 minute for me. Are you using the C libraries installed by the python packages?
I haven't done any extra steps to install alternative C libraries so I would assume yes, although I wouldn't know how to check.
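One possible way to check, sketched below: both netCDF4 and h5py expose the versions of the C libraries they were built against, and the package's install path hints at where its bundled libraries live. The helper name `c_library_report` is made up for this example; the attributes it reads (`netCDF4.__netcdf4libversion__`, `netCDF4.__hdf5libversion__`, `h5py.version.hdf5_version`) are provided by those packages.

```python
def c_library_report():
    """Collect the netCDF-C and HDF5 versions reported by netCDF4 and h5py."""
    report = {}

    try:
        import netCDF4

        report["netCDF4 (Python)"] = netCDF4.__version__
        report["netCDF-C used by netCDF4"] = netCDF4.__netcdf4libversion__
        report["HDF5 used by netCDF4"] = netCDF4.__hdf5libversion__
        # The install location shows which environment the package comes from.
        report["netCDF4 installed at"] = netCDF4.__file__
    except ImportError:
        report["netCDF4 (Python)"] = "not installed"

    try:
        import h5py

        report["h5py"] = h5py.__version__
        report["HDF5 used by h5py"] = h5py.version.hdf5_version
    except ImportError:
        report["h5py"] = "not installed"

    return report


for name, value in c_library_report().items():
    print(f"{name}: {value}")
```

If the two HDF5 versions differ, netCDF4 and h5py are linked against different HDF5 builds, which may be worth mentioning alongside the `cf.environment` output.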
My current environment setup for reference
asciitree==0.3.3
binpacking==1.5.2
ceda-elasticsearch-client==0.0.1
certifi==2024.7.4
cftime==1.6.4
cfunits==3.3.7
click==8.1.7
cloudpickle==3.0.0
dask==2024.7.0
elastic-transport==8.13.1
elasticsearch==8.14.0
fasteners==0.19
h5py==3.11.0
kerchunk==0.2.5
locket==1.0.0
mypy-extensions==1.0.0
netcdf-flattener==1.2.0
netCDF4==1.6.5
numcodecs==0.12.1
numpy==1.26.4
pandas==2.2.2
partd==1.4.2
python-dateutil==2.9.0.post0
pytz==2024.1
PyYAML==6.0.1
rechunker==0.5.2
scipy==1.12.0
tabulate==0.9.0
toolz==0.12.1
tzdata==2024.1
ujson==5.10.0
zarr==2.18.2
-e git+ssh://git@github.com/NCAS-CMS/cf-python.git@ca69ad166109e1eba4d4fb816af41b8058fcaa10#egg=cf_python
-e git+ssh://git@github.com/NCAS-CMS/cfdm.git@4106b448adf87ccef7c5285ac8624daf60f9956b#egg=cfdm
-e git+ssh://git@github.com/fsspec/filesystem_spec.git@262f664574e091228251b467ac92b2a6c327034b#egg=fsspec
-e git+ssh://git@github.com/cedadev/padocc.git@72e8e3538bd8ffe335c900a4f718e998a8ec9a7a#egg=pipeline
-e git+ssh://git@github.com/dwest77a/xarray.git@bef04067dd87f9f0c1a3ae7840299e0bbdd595a8#egg=xarray
Example CMIP6 data (JASMIN)
Attempted to aggregate the first two example files (successful)
Normal cf.write functions properly here, creating a combined netCDF file from both files, but using cf.write with cfa=True results in one of the following, depending on whether I take the whole of both Fields (116880 time steps):
RuntimeError: NetCDF: HDF error
or a subselection of the last 10 from file 1 and the first 10 from file 2:
g = cf.aggregate([ f[0][-10:], f[1][:10] ])
cf-python 3.16.2 (latest) cfdm 1.11.1.0 (latest)
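For anyone trying to reproduce the subselection case, the steps above can be put together roughly as follows. This is only a sketch: the function name, its parameters and the output file name are placeholders, it assumes each input file holds a single Field with time as the first axis, and it uses `chunks=None` as suggested in the first reply.

```python
def write_cfa_subset(paths, out_path="cfa.nc", n=10):
    """Aggregate the last n steps of file 1 with the first n of file 2 and write as CFA."""
    # Imported here so the function can be defined without cf-python installed.
    import cf

    # Assumption: each file contains exactly one Field.
    first = cf.read(paths[0], chunks=None)[0]
    second = cf.read(paths[1], chunks=None)[0]

    # Same subselection as in the issue: last n of file 1, first n of file 2.
    g = cf.aggregate([first[-n:], second[:n]])

    # This is the call that fails in the reports above.
    cf.write(g, out_path, cfa=True)
    return g
```

Calling it with the paths of the first two example files should exercise the same `cf.write(..., cfa=True)` path as the original report.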