fsspec / kerchunk

Cloud-friendly access to archival data
https://fsspec.github.io/kerchunk/
MIT License
312 stars 81 forks source link

Warnings rather than exits #39

Closed tedhabermann closed 2 years ago

tedhabermann commented 3 years ago

In https://github.com/intake/fsspec-reference-maker/pull/38 I added error messages that were more specific so that we could more easily identify problems with various flavors of HDF5.

What I would really like is making these RuntimeErrors into warnings so we could finish processing files instead of exiting. IMHO, the same thing should happen with HDF5 data types that can not be processed, i.e. variable-length strings...

One thing that might work is writing out a -1 for a size of the dataset so that someone could take a look at the json to find errors. Of course, there could be better approaches...

martindurant commented 3 years ago

It would be reasonable to warn and continue. Whether to warn or raise could be optional. I would not put special values into the reference output, though, since there's nothing for that in the spec, and it's not clear to me what ReferenceFileSystem should do when loading these. I guess you might end up replacing all values with None?

tedhabermann commented 3 years ago

If nothing is added to the output (which is fine), the reference map becomes a map of the data from the file that can be read with Zarr. As there is nothing that says that will be all the data in the file, that is not necessarily a bad thing. it is what it is!

I sort of like writing it into the json file so it is a permanent reminder of what is being missed...

ajelenak commented 3 years ago

Perhaps some kind of a value in the JSON that the translation failed for that HDF5 dataset. And even a message describing why it failed (for the record).

martindurant commented 3 years ago

I know what you mean, but we have a strict spec for the reference file that we're trying to adhere to (see this repo's readme). Our only use case so far is for loading data with zarr, but this may well not true forever. What I could imagine, is having a specific type of exception that gets raised in ReferenceFileSystem. Any library that doesn't expect it will just raise.

tedhabermann commented 3 years ago

I am excited about file maker and am testing it with two large repositories of seismic waveforms and GPS data (IRIS and UNAVC)... We have a wide variety of HDF5 files we are moving to the cloud... My team is getting a little depressed because every time they try to run the code it crashes... Warnings, on the other hand, help us understand the limitations!

martindurant commented 3 years ago

Yes, I agree with warnings, so please feel free to edit #38 . Omit the relevant variable for now. It will be good to report back here what the most common problems are so we can deal with them.

martindurant commented 2 years ago

Option to warn or ignore now in