Closed: DEHewitt closed this issue 1 year ago
Hi @DEHewitt, thanks for reporting this. This looks like an issue with the filesystem on your Katana HPC. Not something we can fix on our end.
One workaround might be to use the feature that @willirath implemented in #1303 (now part of v2.4.1): instead of writing directly to a file, you can write to a zarr.storage.Store object, which you can then write out to a file after the execute has finished.
Let us know if this works!
Hi @DEHewitt, I noticed that in
try:
os.remove(output_nc_dist)
except OSError:
pass
you try to remove the Zarr store. But unlike netCDF files, a Zarr store is a whole directory, so os.remove(<store>)
won't work and the store won't be removed. This means that if a previous unsuccessful experiment wrote to the same store, there may already be inconsistent data in the store.
I'm not sure this is the root of the problem you see, but it might be worth checking.
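Since a Zarr store is a directory tree, the cleanup step above would need shutil.rmtree instead of os.remove; a small sketch (the store path here is hypothetical):

```python
import shutil

# Hypothetical store path; a Zarr store is a directory tree, so it must
# be removed with shutil.rmtree rather than os.remove.
output_zarr = "Output/experiment.zarr"

# ignore_errors=True mirrors the original "except OSError: pass":
# nothing happens if the store does not exist yet.
shutil.rmtree(output_zarr, ignore_errors=True)
```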
Another test which might help pin down the problem, without the Parcels framework on top of it, would be to just write a Zarr store from a Python process along the lines of (untested code):
import zarr
from pathlib import Path
localPath = Path("/srv/scratch/z5278054/particle-tracking-sandra/Output/")
test_store = localPath / "test_001.zarr/"
z = zarr.zeros((10000, 10000), chunks=(1000, 1000), dtype='i4')
zarr.save(test_store, z)
This will create a Zarr store which is a little simpler than those created by Parcels. So let's go for a more complicated multi-variable structure as well.
import xarray as xr
from dask import array as darr
from pathlib import Path

dataset = xr.Dataset(
    {
        "x": xr.DataArray(darr.random.uniform(size=(1_000, 1_000), chunks=(100, 100)), dims=("i", "j")),
        "y": xr.DataArray(darr.random.uniform(size=(1_000, 1_000), chunks=(100, 100)), dims=("i", "j")),
    },
)
localPath = Path("/srv/scratch/z5278054/particle-tracking-sandra/Output/")
test_store = localPath / "test_002.zarr"
dataset.to_zarr(test_store)
Thanks so much for your help @erikvansebille and @willirath! The solution in #1303 seems to have done the trick. Also, both tests posted by @willirath worked. Thanks again :)
Hi,
cc: @sandra-neubert
I am running some simulations on a HPC (Katana at UNSW) and getting multiple, but I think related, errors when Parcels tries to write the output as a .zarr file. I have contacted IT support at our university, but there was a recent upgrade to the HPC, so they are a little slow replying to tickets at the moment.
I haven't run any simulations since the latest release where .zarr was implemented. I have copied the errors and script below, but will provide a brief summary of my approach: we are aiming to release 1 particle per 1x1 degree square in a near-global model (OFES). This covers a period of ~70 years, so to keep computation times low the simulations are run as an array of jobs, where we set the runtime differently for each job from all possible combinations of the two vectors:
years = np.repeat(years, len(months))
months = np.repeat(months, 70)
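As an aside, the snippet above uses np.repeat for both vectors; a common way to get every (year, month) pair is to repeat one vector and tile the other. A small sketch with assumed ranges (the actual years and months in the script may differ):

```python
import numpy as np

# Assumed ranges for illustration: 70 years, 12 months.
years = np.arange(1950, 2020)
months = np.arange(1, 13)

# Every (year, month) pair: repeat each year once per month,
# and tile the month vector once per year.
job_years = np.repeat(years, len(months))
job_months = np.tile(months, len(years))

assert len(job_years) == len(job_months) == len(years) * len(months)
```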
Full script
Error 1
When I navigate to the directory
'/srv/scratch/z5278054/particle-tracking-sandra/Output/1950-1NearGlobalParticleTrackingOFES.zarr/time/.zarray'
I can see that the file does exist.
Error 2
Error 3
Error 4
Error 5
I am a little confused that with some of these errors the progress bar is printed after the error message too; does this imply that the job continued running?
Any help you can offer would be greatly appreciated!
Kind regards,