NOAA-EMC / NCEPLIBS-bufr

The NCEPLIBS-bufr library contains routines and utilities for working with the WMO BUFR format.

Opening GDAS BUFR files with the python API #577

Open annavaughan opened 3 months ago

annavaughan commented 3 months ago

Is it possible to open GDAS files using the python API?

We are trying to open GDAS AMSU-A files (e.g., https://data.rda.ucar.edu/ds735.0/1bamua/2005/1bamua.20050109.tar.gz) and extract data.

We have tried:

git clone https://github.com/EXWEXs/py-ncepbufr.git
cd py-ncepbufr
python setup.py build
python setup.py install

And then running (from the notebook found at https://github.com/NOAA-EMC/NCEPLIBS-bufr/blob/develop/python/test/Python_tutorial_bufr.ipynb):

import ncepbufr
bufr = ncepbufr.open("gdas.1bamua.t12z.20210107.bufr")
bufr.advance()
bufr.load_subset()

Which fails on bufr.load_subset() with exit code -1. It looks like py-ncepbufr is unmaintained, is there another way to do this within NCEPLIBS-bufr? Thanks!

rmclaren commented 3 months ago

The python wrapper you are using is now included in NCEPLIBS-bufr itself. You should probably use that version, as the code you are using is very old.

If you are feeling very patient you could try this one: https://github.com/NOAA-EMC/bufr-query

annavaughan commented 3 months ago

Thank you very much for the quick response Ron!

I've tried both NCEPLIBS-bufr and the new bufr-query repo.

For NCEPLIBS-bufr I tried downloading the release from https://github.com/NOAA-EMC/NCEPLIBS-bufr/releases and then running:

mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=path1 -DENABLE_PYTHON=ON
make -j4
ctest
make install

This completes; however, when I try import ncepbufr in a python session I still get ModuleNotFoundError. Is there any documentation available for how to add this to the python path?

For bufr-query, is there any documentation available for installation?

stratisMarkou commented 3 months ago

Hi @rmclaren, and thanks for the quick reply! Also copying in @jbathegit who discussed a similar issue (#530) and may know more about this.

Just to add to @annavaughan's issue: it seems that (as discussed in issue #530, see this comment and following comments) when running ctest following the installation, some of the tests fail with ModuleNotFoundError. Following the suggestions by @jbathegit in that issue, I've tried cloning the develop branch and following the installation instructions, but am still having a few issues. Here are the steps I'm using and the issues I'm running into.

First, clone the develop branch:

git clone -b develop git@github.com:NOAA-EMC/NCEPLIBS-bufr.git
cd NCEPLIBS-bufr

With my conda virtual environment enabled, I then run

cmake -DCMAKE_INSTALL_PREFIX=~/ -DENABLE_PYTHON=ON

At this point, one of the messages from cmake reports "Could not download bufr test files, not building tests":

CMake Warning:
  No source or binary directory provided.  Both will be assumed to be the
  same as the current working directory, but note that this warning will
  become a fatal error in future CMake releases.

-- The C compiler identification is GNU 4.8.5
-- The Fortran compiler identification is GNU 4.8.5
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc - works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working Fortran compiler: /usr/bin/gfortran
-- Check for working Fortran compiler: /usr/bin/gfortran - works
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Checking whether /usr/bin/gfortran supports Fortran 90
-- Checking whether /usr/bin/gfortran supports Fortran 90 - yes
-- Finding test data files in directory ..
-- Setting build type to 'Release' as none was specified.
-- Found Python3: /home/em626/miniconda3/envs/npw/bin/python3.8 (found version "3.8.18") found components: Interpreter 
-- Check if the system is big endian
-- Searching 16 bit integer
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of unsigned short
-- Check size of unsigned short - done
-- Searching 16 bit integer - Using unsigned short
-- Check if the system is big endian - little endian
-- Downloading bufr test files...
-- Could not download bufr test files, not building tests
-- Configuring done
-- Generating done
-- Build files have been written to: /home/em626/rds/hpc-work/NCEPLIBS-bufr

On my local machine, the download works fine, and I can see the files under test/testfiles, but on our cluster (where I'm running the above commands) the download seems to fail without giving an explanation in the error. In any case, running

make -j4
make install

seems to work fine except make gives the warning WARN: Could not locate executable armflang, which I'm guessing might not be important. Lastly, I download and unzip a BUFR file from GDAS data:

wget https://data.rda.ucar.edu/ds735.0/1bamua/2005/1bamua.20050109.tar.gz
tar -xvf 1bamua.20050109.tar.gz

And then try to load it in python:

import ncepbufr  # import works fine
bufr = ncepbufr.open("20050109.1bamua/gdas.1bamua.t00z.20050109.bufr")  # also works fine
bufr.advance()  # returns 0
bufr.load_subset()  # returns -1

So load_subset does not seem to work as expected. I've also tried this on one of the test bufr files (which I downloaded on my local machine and copied over) but that does not work either. @rmclaren @jbathegit, do you have any thoughts on what might be going wrong? We'd really appreciate your help!
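For reference, the snippet above checks only the first message. A minimal sketch of walking the whole file is given here, assuming (as the return values quoted above suggest) that advance() and load_subset() return 0 on success and non-zero otherwise. The helper name count_subsets is hypothetical and not part of the library, and whether the first message in this particular file simply holds no subsets is not confirmed anywhere in this thread.

```python
def count_subsets(bufr):
    """Walk every message in an open BUFR file and count loadable subsets.

    `bufr` is assumed to behave like the object returned by ncepbufr.open():
    advance() and load_subset() return 0 on success, non-zero otherwise.
    This helper is illustrative only and is not part of NCEPLIBS-bufr.
    """
    n_messages = 0
    n_subsets = 0
    while bufr.advance() == 0:          # move to the next message
        n_messages += 1
        while bufr.load_subset() == 0:  # load each subset in this message
            n_subsets += 1
    return n_messages, n_subsets


# Intended usage (requires a working ncepbufr install, so left commented out):
# import ncepbufr
# bufr = ncepbufr.open("20050109.1bamua/gdas.1bamua.t00z.20050109.bufr")
# print(count_subsets(bufr))
# bufr.close()
```

If this reports zero subsets across all messages, the problem is not limited to the first message.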

rmclaren commented 3 months ago

@annavaughan The python path should be extended with the site-packages directory that appears in the NCEPLIBS-bufr install directory. So for example /lib/python3.12/site-packages (path will vary depending on the version of python you are using). You can either extend it on the command line (export PYTHONPATH=<your path>:$PYTHONPATH, with no spaces around the =) or you can extend the sys path in your python script (import sys; sys.path.insert(0, "<your path>")).
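A rough sketch of the sys.path approach, for anyone who would rather not hard-code the python version in the path. The helper names find_site_packages and add_to_sys_path are hypothetical, and the lib*/python*/site-packages layout under the install prefix is an assumption based on the example path above; check what your own install actually produced.

```python
import sys
from pathlib import Path


def find_site_packages(install_prefix):
    """Return site-packages directories found under an install prefix."""
    prefix = Path(install_prefix).expanduser()
    # Assumed layout: <prefix>/lib*/python*/site-packages (covers lib and lib64).
    return sorted(
        p for p in prefix.glob("lib*/python*/site-packages") if p.is_dir()
    )


def add_to_sys_path(install_prefix):
    """Prepend each discovered site-packages directory to sys.path."""
    found = [str(p) for p in find_site_packages(install_prefix)]
    for path in reversed(found):
        if path not in sys.path:
            sys.path.insert(0, path)
    return found


# Intended usage, with the prefix that was passed to -DCMAKE_INSTALL_PREFIX:
# add_to_sys_path("~/")
# import ncepbufr
```

An empty return list means nothing matching that layout was installed under the prefix, which would point to the build not having installed the python module at all.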

The bufr-query thing has a lot of dependencies which might be hard for you to build (JCSDA oops, ioda, etc...) which I'm working to remove (uuuugggg)... So might want to hold off for now.

rmclaren commented 3 months ago

@stratisMarkou I'm going to let the NCEPLIBS-bufr maintainers answer your question.

jbathegit commented 3 months ago

Sorry, but for my part I'm afraid I don't have much to contribute. The python binding and associated modules are the part of the library I'm least knowledgeable about personally (I'm more of a Fortran and C person myself w.r.t. code development, in addition to being the overall library manager :-). But what I can do is bring a few other folks into this discussion (@AlexanderRichert-NOAA, @jswhit, @climbfuji, @edwardhartnett, @aerorahul) who might be able to help.

That said, please note that #530 was specifically related to the new Intel oneAPI compilers, and Alex in particular has been working further on that in #538, but it's still not resolved. Otherwise, the only way I can see that either of those might be related to this issue is in the aspect of being able to download the test tarfile, which was really just a side part of the discussion in #530. So if you're still having that problem(?), could you please specify which particular release version of the code you were trying to download from https://github.com/NOAA-EMC/NCEPLIBS-bufr/releases? If it was 11.5.0 or older, then that might well explain that issue, and I would respectfully suggest that you try downloading a newer release as noted in #530.