ReallyNiceGuy / eccodes_local_tables

Local ODIM and MeteoFrance BUFR tables formatted for ECCodes library
MIT License

Missing documentation to make it work #1

Open imhaage opened 3 months ago

imhaage commented 3 months ago

Hi,

I'm just discovering the complex BUFR file format that meteofrance uses for its public radar data, and I'm far outside my comfort zone (web development & GIS "standard" formats). For a few days I've been trying to find tools that will let me convert the data to friendlier formats.

Sadly, the local tables supplied by meteofrance only work with the OPERA tools, and so far it hasn't been a very good experience. I'm trying to work with tools based on eccodes, in particular some Python libraries that look interesting (pybufr, eccodes-python). So I tried to find a way to convert the CSV files to eccodes definitions, but I discovered that this is a big problem for a lot of people and projects around the world. Then I found your repository, and it was the best basis I could find for completing this task. Thanks for sharing your work.

I was able to understand how a few scripts work (convert_tableb_to_element.py, convert_tabled_to_sequence.py), but I can't go any further with my humble knowledge at the moment. I'm missing some documentation.

I had to add some error handling for an error caused by wrong data in one CSV file. 😔 Instead of valid data the meteofrance CSV contained '[GC threshold exceeded with 12,014,472 bytes in use. Commencing GC.]', '[GC completed with 277,360 bytes retained and 11,737,112 bytes freed.]' and '[GC will next occur when at least 12,277,360 bytes are in use.]'.
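
For illustration, a minimal sketch of the kind of guard described above. It assumes the stray lines always start with `[GC ` and that the CSV is semicolon-separated; both are assumptions and the real files may differ. The function name `clean_rows` is hypothetical and is not part of the repository's scripts.

```python
import csv


def clean_rows(lines, delimiter=";"):
    """Yield parsed CSV rows, skipping garbage-collector log lines.

    Assumption: every stray log line starts with "[GC " (after optional
    leading whitespace), as in the three examples quoted in this comment.
    """
    kept = (line for line in lines if not line.lstrip().startswith("[GC "))
    yield from csv.reader(kept, delimiter=delimiter)
```

It can be fed an open file object directly, for example `rows = list(clean_rows(open("localtabb_85_14.csv", encoding="utf-8")))`, where the encoding is also an assumption.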

With some guidance, I'll be glad to work on this interesting project and add some documentation too. From what I've read in the last few days, this repo should be better known because it should be of interest to a lot of people. I was surprised to see so many people struggling with these BUFR files from meteofrance.

ReallyNiceGuy commented 3 months ago

Hi,

I feel your pain. This repository was set up just to keep my code somewhere safe. It is neither pretty nor documented because it was basically a one-time thing. After I created the tables, I manually edited the items I actually use to give them nice names, as the default is to create a name like entry004194. I didn't upload the tables themselves because the copyright of the original tables is not clear and work still has to be done to rename every single item without clashes. I am happy to accept your help in documenting the code, and if we can legally post the resulting tables, I am happy to do that too.

imhaage commented 3 months ago

Thanks for your quick reply and for your positive response.

I had deduced that it was a repo for personal use, and the code didn't seem so ugly. 😉 It was quite clear how the code worked; my main problem is working out which arguments and paths to use. As I'm just starting out with the BUFR format, it's more difficult to answer questions of this kind. With your help we should be able to explain everything that's needed to get the eccodes definitions. 🤞

I'll try to find more information about the licences, I haven't looked into it.

I have no precise idea of the best way to move forward with the project, but I can propose that I list what I've deduced and the points where I'm at a loss. Or I let you list the steps needed to convert all CSV files, then I'll create the README content and add comments in the code. I'll let you choose what suits you best.

ReallyNiceGuy commented 3 months ago

I don't remember the steps anymore. The list would be best, so I can see exactly what you are doing and where you hit a snag.

imhaage commented 3 months ago

Okay, I'll do my best to get you back on track. 😉

Folder structure and filenames

First, comparing with the definitions in the repo, I assumed the CSV files had been updated since you generated them. That made it harder to work out how the contents of the repo's definitions/ folder were derived from the source files. Here's a link to the current tables supplied by meteofrance: tables_bufr_361.

bufrtabb_11.csv
bufrtabb_13.csv
bufrtabb_16.csv
bufrtabd_11.csv
bufrtabd_13.csv
bufrtabd_16.csv
localtabb_247_8.csv
localtabb_85_12.csv
localtabb_85_14.csv
localtabb_85_20.csv
localtabd_247_8.csv
localtabd_85_12.csv
localtabd_85_14.csv
localtabd_85_20.csv

From the eccodes documentation about local configuration, I was able to get a rough understanding of the folder structure:

definitions/bufr/tables/[masterTableNumber]/local/[localTablesVersionNumber]/[bufrHeaderCentre]/[bufrHeaderSubCentre]
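
To make the pattern concrete, here is a small sketch that assembles such a folder path from the four header values. The helper name `local_table_dir` and the default `root` are hypothetical; only the path layout comes from the line above.

```python
from pathlib import PurePosixPath


def local_table_dir(master_table, local_version, centre, sub_centre,
                    root="definitions"):
    """Build the local-table folder for one set of BUFR header values."""
    return PurePosixPath(
        root, "bufr", "tables",
        str(master_table), "local", str(local_version),
        str(centre), str(sub_centre),
    )


print(local_table_dir(0, 11, 85, 0))
# definitions/bufr/tables/0/local/11/85/0
```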

Hypotheses

(may be wrong)

First questions

(more to come I suppose)

ReallyNiceGuy commented 3 months ago

> * what is wrong in my hypotheses ?

You got everything correct.

> * can you specify the `sys.argv[]` arguments in the different scripts ? I couldn't figure out how to define these variables correctly : `refcsv`, `basecsv`, `basepath`, `outpath`, etc.

These are command-line parameters for the script itself, e.g. `copy_codetables.py reference basepath/ outpath/`. Maybe I didn't understand your question.

> * how to generate the `codetables/` files ? I tried to execute `convert_codetable_to_eccodes.py` with different CSV as source file, then I tried a generated `element.table`, trying to find a match with the `get_fields()` function. But nothing worked.
>
> * the files in the repo definitions `codetables/` subfolders contain content I can't find in the current CSV files. Perhaps the source files used to generate these tables contained this data, but I doubt it. For example, the file `definitions/bufr/tables/0/local/11/85/0/codetables/1192.table` contains `Composite complete`, `Composite francaise`, `Composite Royaume-Uni Irlande`, `Composite suisse`, etc. Where does this data come from?

The codetables are made using other files. They are btc085.010, btc085.011 and btc085.012. I don't have a link to them. I need to clarify the copyright before I upload them.

The copy_codetables.py script just copies the necessary elements based on an element.table and a reference codetable directory. This is to fill in any missing codetable item from a known codetable. Example: `copy_codetables.py ./0/local/14/85/0/element.table ./0/local/12/85/0/codetables ./0/local/14/85/0/codetables`
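
As a rough illustration of that idea (not the repository's actual implementation), the sketch below copies only the code tables that an element.table refers to. It assumes the usual ecCodes layout, where element.table is pipe-separated with the descriptor in the first column and the type (`table` or `flag`) in the third, and where code tables are named `<descriptor without leading zeros>.table`. The function name `copy_needed_codetables` is hypothetical.

```python
import shutil
from pathlib import Path


def copy_needed_codetables(element_table, reference_dir, out_dir):
    """Copy missing code tables referenced by element_table into out_dir.

    Assumptions: pipe-separated element.table, descriptor in column 0,
    type in column 2, code tables named e.g. "1192.table".
    Returns the list of file names that were copied.
    """
    reference_dir, out_dir = Path(reference_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for line in Path(element_table).read_text(encoding="utf-8").splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header comment
        fields = line.split("|")
        if len(fields) < 3 or fields[2] not in ("table", "flag"):
            continue  # only code and flag tables have a codetable file
        name = f"{int(fields[0])}.table"
        source, target = reference_dir / name, out_dir / name
        if source.exists() and not target.exists():
            shutil.copy(source, target)
            copied.append(name)
    return copied
```

Called with the three paths from the example command above, it would leave existing files in the output folder untouched and only add the ones that are missing.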

imhaage commented 3 months ago

> You got everything correct.

That's good news, thank you. When I work on documentation I'll try to explain all that clearly.

> The codetables are made using other files. They are btc085.010, btc085.011 and btc085.012. I don't have a link to them. I need to clarify the copyright before I upload them.

Everything's clear now, some files used by the scripts are not in the repo ! I hadn't considered this possibility when I was trying to understand how it works. After reading your reply I looked everywhere on the web for them, but they were impossible to find. They do not seem to be made available by meteofrance. If these files are not shared by meteofrance, it seems difficult to go further because I suppose these files are updated on a regular basis, like the tables of descriptors ?

> Maybe I didn't understand your question.

Sorry, that wasn't clear. I couldn't figure out which file the reference was, or which basepath I should use (the folder containing the OPERA CSV files? one of the generated definitions folders? a folder from the default eccodes definitions?). Thanks to your answer I understood that some files were missing, so it makes sense that I couldn't figure out what the reference file was.

I started looking into the licences for the tables of descriptors. On the public data page where the public radar APIs are listed and where the tables of descriptors can be downloaded, there is a note about the public data licence: _Royalty-free under Etalab open license. The source to indicate is "Météo-France". Some suggestions: "Source: Météo-France" or "Information created from Météo-France data"._ The licence states:

You are free to reuse the "Information":

  • Reproduce, copy, publish and transmit "the Information";
  • Broadcast and redistribute "the Information";
  • Adapt, modify, extract and transform the "Information", in particular to create "Derived Information";
  • Exploit the "Information" commercially, for example by combining it with other "Information", or by including it in your own product or application.

You need to :

  • Mention the authorship of the "Information": its source (at least the name of the "Producer") and the date of its last update. The "Re-user" may fulfill this condition in particular by indicating one or more hypertext link(s) (URL) to the "Information", ensuring effective acknowledgement of its authorship. This mention of authorship must neither confer an official character on the re-use of the "Information", nor suggest any recognition or endorsement by the "Producer", or by any other public entity, of the "Re-user" or of its re-use.

ReallyNiceGuy commented 3 months ago

Here is a link to the missing files. I am sure that they fall under the same licence, as they are an integral part of the definitions above.

https://drive.google.com/file/d/1eWJtv_8-AgoPkDmVgBiwuIxJWFw2FXst/view?usp=sharing

imhaage commented 3 months ago

Thank you, I'll see if I can make it work now. Do you remember how you found those unfindable files ? It would have been great to be able to put a link in the documentation. 🤔

I just realized that the data from these reference tables must be present somewhere in the OPERA software suite to allow correct decoding of meteofrance data, if the tables supplied by meteofrance don't contain this data. 🤔 I'll do some research.

ReallyNiceGuy commented 3 months ago

> Thank you, I'll see if I can make it work now. Do you remember how you found those unfindable files ? It would have been great to be able to put a link in the documentation. 🤔

I got them directly from Météo-France by email.

> I just realized that the data from these reference tables must be present somewhere in the OPERA software suite to allow correct decoding of meteofrance data, if the tables supplied by meteofrance don't contain this data. 🤔 I'll do some research.

They must exist, it would make no sense not to have them available somewhere, as they are integral to encoding and decoding the BUFR files.

imhaage commented 3 months ago

Hi,

In the last few days I had to work on other tasks, so I'm only getting back to this now. I've still taken the time to read some of the OPERA and eccodes documentation, and I'm beginning to understand some of the BUFR elements, but I still have a lot to learn. A clever but complex format... I also sent an email to meteofrance to ask whether they have a solution for reading their BUFR format with eccodes; maybe things have changed since the data was made publicly available (January 2024).

Can you confirm I'm heading in the right direction ? This is what I plan to do :

  1. Generate element.table + sequence.def files from *.csv
    • bufrtabb_11.csv -> tables/0/local/11/85/0/element.table
    • bufrtabb_13.csv -> tables/0/local/13/85/0/element.table
    • bufrtabb_16.csv -> tables/0/local/16/85/0/element.table
    • bufrtabd_11.csv -> tables/0/local/11/85/0/sequence.def
    • bufrtabd_13.csv -> tables/0/local/13/85/0/sequence.def
    • bufrtabd_16.csv -> tables/0/local/16/85/0/sequence.def
    • localtabb_247_8.csv -> tables/0/local/8/247/0/element.table
    • localtabb_85_12.csv -> tables/0/local/12/85/0/element.table
    • localtabb_85_14.csv -> tables/0/local/14/85/0/element.table
    • localtabb_85_20.csv -> tables/0/local/20/85/0/element.table
    • localtabd_247_8.csv -> tables/0/local/8/247/0/sequence.def
    • localtabd_85_12.csv -> tables/0/local/12/85/0/sequence.def
    • localtabd_85_14.csv -> tables/0/local/14/85/0/sequence.def
    • localtabd_85_20.csv -> tables/0/local/20/85/0/sequence.def
  2. Generate the code tables, then copy them into the matching folder based on the element.table in that folder.
  3. Update entry names

I'll try to generate all folders and files automatically in one shot.
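
A sketch of how the mapping in step 1 could be derived from the file names alone. It mirrors the plan exactly as listed above, including placing the bufrtab* files under centre 85; the regular expression, the function name `target_for` and the default values are assumptions for illustration, not something taken from the repository's scripts.

```python
import re
from pathlib import PurePosixPath

# bufrtab[b|d]_<version>.csv or localtab[b|d]_<centre>_<version>.csv
_PATTERN = re.compile(r"^(bufr|local)tab([bd])_(?:(\d+)_)?(\d+)\.csv$")


def target_for(csv_name, default_centre=85, sub_centre=0, master_table=0):
    """Return the output path planned for one CSV file name."""
    match = _PATTERN.match(csv_name)
    if match is None:
        raise ValueError(f"unexpected file name: {csv_name}")
    _, kind, centre, version = match.groups()
    centre = centre or default_centre  # bufrtab* names carry no centre
    out_name = "element.table" if kind == "b" else "sequence.def"
    return PurePosixPath(
        "tables", str(master_table), "local", str(int(version)),
        str(int(centre)), str(sub_centre), out_name,
    )


print(target_for("bufrtabb_11.csv"))
# tables/0/local/11/85/0/element.table
print(target_for("localtabd_247_8.csv"))
# tables/0/local/8/247/0/sequence.def
```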


I encountered one warning during codetable generation; I suppose you encountered it too?

python convert_codetable_to_eccodes.py .local/btc085.012
8202.table: Mismatched entries
{'filename': '8202.table', 'count': 14, 'items': {'0': 'Brute', '1': 'Adaptation statistique', '2': 'Prevision reactualisee', '3': 'Prevision validee par un previ', '4': 'Interpolation spatiale filtree', '5': 'Adaptation statistique filtree', '9': 'Prevision "AS BEST"', '16': 'Safran analogue', '17': 'Safran/Aladin', '20': 'Sympo horaire', '21': 'Sympo quotidienne', '31': 'VALEUR MANQUANTE'}}

I don't fully understand the numbers in the listing, but thanks to your logs I can see that table 8202 declares 14 entries and the file does contain 14 entry lines, but 3 of them start with 16 (the code value?). So the '16' key in the dictionary is overwritten and only the last value ('Safran analogue') is kept, which leaves 12 items instead of 14. I assume this is an update rather than an error; do you have any information about that?

# file btc085.012, line 167

 008202  14  0 [C5 ] Type de prevision
          0  1 Brute
          1  1 Adaptation statistique
          2  1 Prevision reactualisee
          3  1 Prevision validee par un previ
          4  1 Interpolation spatiale filtree
          5  1 Adaptation statistique filtree
          9  1 Prevision "AS BEST"
         16  1 Safran/Arpege
         17  1 Safran/Aladin
         16  1 Safran
         16  1 Safran analogue
         20  1 Sympo horaire
         21  1 Sympo quotidienne
         31  1 VALEUR MANQUANTE 
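
To show where the mismatch comes from, here is a small sketch that parses the block above and keeps every label per code instead of overwriting. The layout is an assumption read off this one excerpt: the header line starts with the descriptor and the declared entry count, and each entry line is `code, one extra number, label` (the meaning of the middle number is unknown here and it is ignored). The function name `parse_table_block` is hypothetical.

```python
from collections import defaultdict

SAMPLE = """\
 008202  14  0 [C5 ] Type de prevision
          0  1 Brute
          1  1 Adaptation statistique
          2  1 Prevision reactualisee
          3  1 Prevision validee par un previ
          4  1 Interpolation spatiale filtree
          5  1 Adaptation statistique filtree
          9  1 Prevision "AS BEST"
         16  1 Safran/Arpege
         17  1 Safran/Aladin
         16  1 Safran
         16  1 Safran analogue
         20  1 Sympo horaire
         21  1 Sympo quotidienne
         31  1 VALEUR MANQUANTE
"""


def parse_table_block(text):
    """Return (descriptor, declared_count, entries, duplicates)."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    descriptor, declared = header[0], int(header[1])
    entries = defaultdict(list)
    for line in lines[1:]:
        code, _, label = line.split(None, 2)
        entries[int(code)].append(label.strip())
    duplicates = {c: labels for c, labels in entries.items() if len(labels) > 1}
    return descriptor, declared, dict(entries), duplicates


descriptor, declared, entries, duplicates = parse_table_block(SAMPLE)
print(declared, len(entries), duplicates)
# 14 12 {16: ['Safran/Arpege', 'Safran', 'Safran analogue']}
```

Keeping a list per code makes the duplicate visible instead of silently losing two labels, which is what a plain dictionary assignment does.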

One last point: I looked into the btc085.01x files. They seem to contain the same data, which I confirmed with the diff command. I suppose the files were generated from a database at different times, but the data didn't change between releases.
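
The same check can be scripted, which may be handy if new releases of these files appear. This is only a sketch using the standard library; the helper name `identical_pairs` is hypothetical.

```python
import filecmp
from itertools import combinations


def identical_pairs(paths):
    """Compare every pair of files byte by byte.

    Returns {(path_a, path_b): True/False}; shallow=False forces a
    content comparison instead of relying on file size and mtime.
    """
    return {
        (a, b): filecmp.cmp(a, b, shallow=False)
        for a, b in combinations(paths, 2)
    }
```

For example, `identical_pairs(["btc085.010", "btc085.011", "btc085.012"])` returns three pairs, each mapped to `True` when the two files have identical content.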

imhaage commented 3 months ago

I now understand that the bufrtab*.csv files supplied by meteofrance are in fact "official" WMO tables, and that their definitions are already present in the ecCodes definitions. These tables must correspond to the masterTablesVersionNumber in the BUFR headers, and only the meteofrance local tables (85) correspond to the localTablesVersionNumber.

I'll try to generate ecCodes definitions for these bufrtab*.csv files with your script, then I'll do a diff against the files already present in the ecCodes definitions folder: definitions/bufr/tables/0/wmo.

If this is confirmed, there will be 6 fewer files to generate and my understanding of the subject will have increased a little more. 🤞

ReallyNiceGuy commented 3 months ago

I am traveling and it is a bit hard to work on this right now, but you are correct on all your assumptions. The warning is due to the duplicate value. If the table is from WMO, either it was already broken or MF modified it. I used what MF sent me to assemble the required files, but if you found the originals, even better! I am not 100% sure that we need to duplicate these tables. If eccodes looks up the standard tables to decode the meaning of an item, then no duplication is better. I used what they sent me because I assumed that they had extended these tables, and in that case the local definition takes precedence over the global ones.

imhaage commented 3 months ago

Once again thank you for your help, your validation is all I needed. I don't expect you to be working on it at the moment. 😉 Have a nice trip, I'll keep digging.

ReallyNiceGuy commented 2 months ago

Thank you. I am available to help again.

imhaage commented 2 months ago

Hi, I'm sorry, after a rather dynamic start I can't find time to work on the project right now. I am finishing renovating my house so I can move in soon. I hope to pick up where I left off very soon.