JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Reading NetCDF files #994

Closed (dronir closed this issue 11 years ago)

dronir commented 12 years ago

"NetCDF (Network Common Data Form) is a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data."

This would probably be useful to some users. I am personally working with some software that produces its output in NetCDF form, and I would like to use those files in some simulations I write in Julia. Currently I work around it with a Python program that reads the files using the PyNetCDF module and saves the relevant arrays as CSV, which is feasible only because my data files are not enormous.

There are C/Fortran libraries for it but implementing them in Julia is currently probably beyond my skills (I'm not entirely sure if it's even possible with our C interface).

On the other hand, since NetCDF-4 the files are also valid HDF5 files, so this parallels issue #805. The two older formats require libnetcdf.

http://en.wikipedia.org/wiki/NetCDF
http://www.unidata.ucar.edu/software/netcdf/
http://www.unidata.ucar.edu/software/netcdf/docs/netcdf-c/

StefanKarpinski commented 12 years ago

If the libraries are Fortran, then it's guaranteed to be doable (gotta love Fortran). This would be nice to have (although, as you say, somewhat obviated by HDF5 support).

gmaze commented 11 years ago

If you want most of the geophysical fluid dynamic community to use Julia, netcdf is a MUST have ...

staticfloat commented 11 years ago

@gmaze: Do you use NetCDF files that are older than NetCDF-4? Because if not, NetCDF-4 is a subset of HDF5, and can likely be read via the HDF5 interface that @timholy and co have been cooking up.

Full-fledged libnetcdf support for the older formats is certainly possible; it just needs someone interested enough to spend the time writing a Julia wrapper for the C interface.

gmaze commented 11 years ago

Most new netcdf files are netcdf-4, but there is still a lot of data out there using older versions. So most of the IT staff in labs will ask: "Can this new Julia language read all of our data?" The answer must be yes; otherwise they'll keep buying Matlab licences for researchers. As with any new language, someone has to take on writing the library. But it seems feasible ...

timholy commented 11 years ago

We'd definitely welcome this as a contribution---being able to read & write standard data formats is a big win.

If you want to tackle this, it's possible that some of the code in hdf5.jl might serve as a useful example to you. I don't know the NetCDF library at all, but the HDF5 library consists of about 300 functions. Because I was not looking forward to doing error handling, etc., individually for each one of those, I developed a little bit of infrastructure that lets me wrap each new C function with a single line of code (including error handling). Basically, all you have to do is assemble a big table of instructions (what are the inputs, what return types, what error message you want to show on failure), and then Julia programmatically generates each wrapper function for you. For a library like HDF5 with a consistent pattern for reporting failure, this makes life much more pleasant.
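To make the table-driven idea concrete, here is a minimal sketch of the pattern. It is written in Python for brevity and is not the code in hdf5.jl; the two `_c_*` functions are hypothetical stand-ins for C library calls, and the only assumption carried over from HDF5 is that a negative return value signals failure.

```python
class LibraryError(RuntimeError):
    """Raised when a wrapped call reports failure."""


# Hypothetical stand-ins for raw C functions. Like HDF5 calls, they
# signal failure by returning a negative value.
def _c_open(name):
    return 3 if name else -1


def _c_close(handle):
    return 0 if handle >= 0 else -1


# One row per wrapped function:
# (wrapper name, raw function, error message shown on failure)
WRAPPER_TABLE = [
    ("open_file", _c_open, "could not open file {0!r}"),
    ("close_file", _c_close, "could not close handle {0}"),
]


def make_wrapper(raw, message):
    """Build a wrapper that calls `raw` and turns a negative status into an exception."""
    def wrapper(*args):
        status = raw(*args)
        if status < 0:
            raise LibraryError(message.format(*args))
        return status
    return wrapper


# Generate every wrapper from the table in one pass.
wrappers = {name: make_wrapper(raw, msg) for name, raw, msg in WRAPPER_TABLE}
```

Adding another function then costs one more row in the table, and the error check is written exactly once. In Julia the same effect would come from generating the wrapper definitions programmatically, as described above.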

meggart commented 11 years ago

I think I could contribute. I also got interested in Julia for analysing geodata, and started something like a netcdf package, which is functionally very similar to the R netcdf package. I concentrated on implementing classic netcdf functions only, omitting all the netcdf4 stuff. So far it can only read netcdf files, but for that purpose it works well. I was planning to add write support, too.

I am quite inexperienced in contributing code to bigger projects, so I don't know if I can take main responsibility for this, but I am definitely willing to share the code that I have and help with further development.

Tim, I will have a look at your C function handling wrapper. As you say, it could make the netcdf code much shorter and easier to read.

StefanKarpinski commented 11 years ago

> I am quite inexperienced in contributing code to bigger projects, so I don't know if I can take main responsibility for this, but I am definitely willing to share the code that I have and help with further development.

That's ok – there's only one way to learn :-)

timholy commented 11 years ago

@meggart, glad to have you aboard. It will be a very nice contribution.

I bet you've seen it already, but if not, the CONTRIBUTING.md file has some useful information in it.

meggart commented 11 years ago

Thanks for pointing me to the contributing guidelines; yes, they help a lot.

Ok, just a short update on netcdf/hdf. I tried the existing hdf5 package with several netcdf files, just to see if netcdf support could be added as an extension of the existing hdf5 package, without having to start a new one. However, it looks like most of the climate and weather forecast models still produce data in the classic netcdf format, which cannot be read with the hdf5 library. Even version 4 (the newest) of the netcdf library still uses the classic netcdf format by default when creating a file, which is probably why I hardly found any hdf-based netcdf files. In addition, the conventions for formatting metadata seem to be quite different between hdf5 and netcdf. So my conclusion is that a separate netcdf package probably makes sense, and I will continue down that path.
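For anyone who wants to check which kind of file they have before picking a reader, a rough sketch (in Python, using only the standard library) is to look at the first bytes of the file. Classic netcdf files start with `CDF` followed by a version byte (1 for classic, 2 for 64-bit offset), while HDF5 files start with an 8-byte HDF5 signature. The function name is made up for illustration, and the sketch assumes the signature sits at offset 0; HDF5 files with a user block place it later, and an HDF5 file is not necessarily a netcdf-4 file.

```python
# 8-byte signature that opens an HDF5 file (when there is no user block).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def sniff_netcdf_format(path):
    """Guess the on-disk format of a file from its first bytes.

    Hypothetical helper for illustration; checks offset 0 only.
    """
    with open(path, "rb") as f:
        header = f.read(8)
    if header.startswith(b"CDF\x01"):
        return "classic"
    if header.startswith(b"CDF\x02"):
        return "64-bit offset"
    if header.startswith(HDF5_SIGNATURE):
        return "hdf5 (possibly netcdf-4)"
    return "unknown"
```

Files reported as `classic` or `64-bit offset` are the ones that need libnetcdf rather than the hdf5 library.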

timholy commented 11 years ago

Thanks for the update. I look forward to seeing how it works out.

gmaze commented 11 years ago

I won't be able to contribute to this. But I could help in beta testing and in providing some standard netcdf files. If needed.

meggart commented 11 years ago

Yes, I am actually working on the NetCDF code again. I am still busy debugging, but I think I will have something shareable in about one or two weeks, and beta-testing would be useful at that point. Of course, you can already have a look at the current status of my code (https://github.com/meggart/julia_netcdf) and try it out, but it will be easier once I have added some examples of how to use the package.

timholy commented 11 years ago

Cool!

One tip: you might want to rename that repository NetCDF.jl or something now, before everyone starts using it. I had to do that with my HDF5 package.

ViralBShah commented 11 years ago

I do not know much about the details, but is using something like GDAL (http://gdal.org) a good way to proceed? GDAL seems to support NetCDF (http://www.gdal.org/frmt_netcdf.html) and supporting GDAL would also provide support for a bunch of other stuff.

meggart commented 11 years ago

Hello, just a short update on the netcdf code. I think my code should be stable enough that anyone who wants to test it with their own netcdf files can do so. So gmaze and everyone else, if you could try to read your files, that would be great.

pao commented 11 years ago

Pinging @gmaze on behalf of @meggart (pings only happen if you use the @-notation).

ViralBShah commented 11 years ago

https://github.com/meggart/netcdf.jl