Closed tedhabermann closed 2 years ago
It would be reasonable to warn and continue. Whether to warn or raise could be optional. I would not put special values into the reference output, though, since there's nothing for that in the spec, and it's not clear to me what ReferenceFileSystem should do when loading these. I guess you might end up replacing all values with None?
If nothing is added to the output (which is fine), the reference map becomes a map of the data from the file that can be read with Zarr. As there is nothing that says that will be all the data in the file, that is not necessarily a bad thing. it is what it is!
I sort of like writing it into the json file so it is a permanent reminder of what is being missed...
Perhaps some kind of a value in the JSON that the translation failed for that HDF5 dataset. And even a message describing why it failed (for the record).
I know what you mean, but we have a strict spec for the reference file that we're trying to adhere to (see this repo's readme). Our only use case so far is for loading data with zarr, but this may well not true forever. What I could imagine, is having a specific type of exception that gets raised in ReferenceFileSystem. Any library that doesn't expect it will just raise.
I am excited about file maker and am testing it with two large repositories of seismic waveforms and GPS data (IRIS and UNAVC)... We have a wide variety of HDF5 files we are moving to the cloud... My team is getting a little depressed because every time they try to run the code it crashes... Warnings, on the other hand, help us understand the limitations!
Yes, I agree with warnings, so please feel free to edit #38 . Omit the relevant variable for now. It will be good to report back here what the most common problems are so we can deal with them.
Option to warn or ignore now in
In https://github.com/intake/fsspec-reference-maker/pull/38 I added error messages that were more specific so that we could more easily identify problems with various flavors of HDF5.
What I would really like is making these RuntimeErrors into warnings so we could finish processing files instead of exiting. IMHO, the same thing should happen with HDF5 data types that can not be processed, i.e. variable-length strings...
One thing that might work is writing out a -1 for a size of the dataset so that someone could take a look at the json to find errors. Of course, there could be better approaches...