georust / netcdf

High-level netCDF bindings for Rust
Apache License 2.0

Cannot open era5-lan nc file. #124

Open lll9p opened 7 months ago

lll9p commented 7 months ago

The error is Error: Netcdf(2). I tried opening with options but had no luck, and I can read the file from Python.

Below is the nc file I want to read (zipped so it can be uploaded to GitHub): 2009.01-04.nc.zip

magnusuMET commented 7 months ago

I am afraid I can't replicate this; I can read the file using the ncdump example in this repo. Can you read other netCDF files using the crate? Could you share details of the OS and the hdf5/netcdf versions you have installed?

lll9p commented 7 months ago

Thanks, I use the static feature (netcdf = { version = "0.8", features = ["static"] }).

The crate cannot read these ERA5 nc files, but it works on other nc files that are not from ERA5.

I tested on Windows (with msys2) and it fails, but on my Linux machine, with the same Cargo.toml, it can read the file.

I'll test more.

lll9p commented 7 months ago

@magnusuMET

It is a path string encoding problem, I guess.

I tested under Rust 1.74.1 with msys2 installed. Update: I also tested under Rust 1.74.1 with MSVC, still no luck.

stable-x86_64-pc-windows-gnu (default)
rustc 1.74.1 (a28077b28 2023-12-04)

main.rs

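// Minimal reproduction: the file name contains a non-ASCII character (U+2103).
fn main() {
    let file = "℃.nc";
    // The result is discarded; on Windows this open fails.
    let _ = netcdf::open(file);
}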

Then I modified the source of netcdf (file.rs) to print the output of get_ffi_from_path.

In Windows:

println!("{:?}", f.as_bytes_with_nul());

In Linux:

println!("{:?}", f);

When file = "℃.nc", the output on Windows (fails to read) is [226, 132, 131, 46, 110, 99], and the output on Linux (reads successfully) is [226, 132, 131, 46, 110, 99, 0].

When file = "1.nc", the output on Windows (reads successfully) is [49, 46, 110, 99], and the output on Linux (reads successfully) is [49, 46, 110, 99, 0].

I searched Unidata's repo and found: Error opening netcdf in path with "special" characters. But I still don't know how to fix it.
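
For reference, the byte sequences above can be reproduced with the standard library alone. This is a minimal sketch: `path_to_c_bytes` is a hypothetical helper written for illustration, not the crate's `get_ffi_from_path`, and it assumes the file name is valid UTF-8 with no interior NUL byte.

```rust
use std::ffi::CString;

/// Hypothetical helper (not the crate's `get_ffi_from_path`): turns a UTF-8
/// file name into the NUL-terminated byte string a C API expects.
fn path_to_c_bytes(name: &str) -> Vec<u8> {
    // `CString::new` fails only if `name` contains an interior NUL byte.
    let c = CString::new(name).expect("file name must not contain NUL");
    c.as_bytes_with_nul().to_vec()
}

fn main() {
    for name in ["℃.nc", "1.nc"] {
        let bytes = path_to_c_bytes(name);
        // `is_ascii` separates the name that fails on Windows from the one that works.
        println!("{:?}: ascii = {}, bytes = {:?}", name, name.is_ascii(), bytes);
    }
}
```

The UTF-8 bytes are the same on every platform, and `as_bytes_with_nul` always appends the trailing 0. Whether the C library on Windows interprets those bytes as UTF-8 is a separate question that this sketch does not answer.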

magnusuMET commented 7 months ago

Some special handling of "weird" filenames is included, but only for Linux, since that is what I have access to. I see Python has some encoding handling in [0]; maybe we need to add something similar?

[0] https://github.com/Unidata/netcdf4-python/blob/7ead17a38f0c7ae690c90389d46ef2784064e915/src/netCDF4/_netCDF4.pyx#L2369

lll9p commented 7 months ago

I don't think so. Following netcdf4-python's code, it just encodes the string to UTF-8, which gives the same result as get_ffi_from_path.

for i in "℃.nc".encode("utf-8"):
    print(i,",",end="")
#226 ,132 ,131 ,46 ,110 ,99 ,

By the way, if I could read the nc file into bytes and open it in memory, there would be no need to deal with the path encoding problem. But I tested with the static feature and found that open_mem does not work with it.
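
The first half of that idea, reading the file into bytes through Rust's own path handling, might look like the sketch below. It uses only the standard library; `read_nc_bytes` and `looks_like_netcdf` are hypothetical names, the magic-byte check is an added sanity test rather than anything the crate does, and the stand-in file written here is not a valid netCDF file. Handing the bytes to the crate's in-memory open is left out, since that is the part reported not to work with the static feature.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Reads the whole file into memory using Rust's own path handling.
fn read_nc_bytes(path: &Path) -> io::Result<Vec<u8>> {
    fs::read(path)
}

/// Rough sanity check on the magic bytes: classic netCDF files start with
/// "CDF"; netCDF-4 files are HDF5 containers and start with the HDF5 signature.
fn looks_like_netcdf(bytes: &[u8]) -> bool {
    bytes.starts_with(b"CDF") || bytes.starts_with(b"\x89HDF\r\n\x1a\n")
}

fn main() -> io::Result<()> {
    // Write a tiny stand-in file with a non-ASCII name into the temp directory.
    let path = std::env::temp_dir().join("℃.nc");
    fs::write(&path, b"CDF\x01placeholder")?;

    let bytes = read_nc_bytes(&path)?;
    println!(
        "read {} bytes, netCDF-like: {}",
        bytes.len(),
        looks_like_netcdf(&bytes)
    );

    // Clean up the stand-in file.
    fs::remove_file(&path)?;
    Ok(())
}
```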

lnicola commented 7 months ago

But what does sys.getfilesystemencoding() return on Windows? ascii? utf-16?

lll9p commented 7 months ago

The output of sys.getfilesystemencoding() on Windows is 'utf-8'.
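
To make the UTF-8 versus UTF-16 question concrete, here is a small sketch of the same file name in both encodings. `to_wide` is a hypothetical helper; it produces the NUL-terminated UTF-16 form that Windows wide-character APIs take, using only `str::encode_utf16` so that it runs on any platform. Which encoding netCDF-C actually expects for paths on Windows is not established in this thread.

```rust
/// Hypothetical helper: encodes a file name as NUL-terminated UTF-16,
/// the form taken by Windows wide-character APIs.
fn to_wide(name: &str) -> Vec<u16> {
    name.encode_utf16().chain(std::iter::once(0)).collect()
}

fn main() {
    let name = "℃.nc";
    // Three bytes for U+2103 in UTF-8, one 16-bit unit in UTF-16.
    println!("UTF-8 : {:?}", name.as_bytes());
    println!("UTF-16: {:?}", to_wide(name));
}
```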