NOAA-EMC / bufr-query

Apache License 2.0

DataContainer Functions #10

Open emilyhcliu opened 1 week ago

emilyhcliu commented 1 week ago

@rmclaren I need your suggestion for the following data conversion case:

For IASI data, there are three BUFR sources, each with its mapping file, but they all map to the same IODA variable fields.

container1  mtiasi bufr - dimensions: [Location1, Channel1] --- Channel1 = 616 (This is the main data set)
container2  esiasi bufr - dimensions: [Location2, Channel2] --- Channel2 = 500
container3  iasidb bufr - dimensions: [Location3, Channel3] --- Channel3 = 500

The three containers need to be combined into one container (container = container1 + container2 + container3). The target container dimension for 2D variables is [Location1+Location2+Location3, 616].

So, the variables in container2 and container3 that have channels in their dimensions need to be reorganized (from 500 to 616 channels). The variables that need to be modified are variables/sensorChannelNumber and variables/spectralRadiance.

For container2, I have modified these two variables so that their dimensions changed from [Location2, 500] to [Location2, 616]. For container3, I have modified them the same way, from [Location3, 500] to [Location3, 616].
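For anyone following along, the channel reorganization described above can be sketched with plain NumPy. This is only an illustration under assumptions: the 500 channels are taken to be a subset of the 616 target channels, the channel list is passed as a 1D array, and `expand_channels` and the fill value of -999.0 are hypothetical names and values, not part of bufr-query.

```python
import numpy as np

def expand_channels(values, src_channels, target_channels, fill_value):
    """Scatter [nloc, n_src] data into a [nloc, n_target] array.

    Columns for target channels that are absent from src_channels are
    filled with fill_value (hypothetical missing-value marker).
    """
    values = np.asarray(values)
    src_channels = np.asarray(src_channels)
    target_channels = np.asarray(target_channels)

    # Find where each source channel sits in the target channel list.
    order = np.argsort(target_channels)
    idx = np.searchsorted(target_channels, src_channels, sorter=order)
    idx = np.clip(idx, 0, target_channels.size - 1)
    pos = order[idx]
    if not np.array_equal(target_channels[pos], src_channels):
        raise ValueError("source channels are not a subset of the target channels")

    out = np.full((values.shape[0], target_channels.size), fill_value,
                  dtype=values.dtype)
    out[:, pos] = values
    return out

# Toy stand-ins for the 616- and 500-channel cases.
target = np.array([1, 2, 3, 4, 5])
main = np.ones((2, 5))                      # like container1: [Location1, 5]
subset = expand_channels(np.array([[10.0, 20.0]]), [2, 4], target,
                         fill_value=-999.0)  # like container2: [Location2, 2] -> [Location2, 5]
merged = np.concatenate([main, subset], axis=0)  # [Location1+Location2, 5]
```

Once every source shares the same channel dimension, stacking along the location axis gives the combined [Location1+Location2+Location3, 616] layout.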

I tried DataContainer.replace:

       container2.replace('variables/sensorChannelNumber', channum_new, [category])
       container2.replace('variables/radiance', radiance_new, [category])

I got the following error message:

Exception:  Reason: Python error: RuntimeError: Bad parameter: ERROR: Dimension mismatch.

At:
  bufr2ioda_iasi.py(78): create_obs_group

    source_column:  0
    source_filename:    /scratch1/NCEPDEV/da/Emily.Liu/JEDI-ioda/ioda-bundle/ioda/src/engines/ioda/src/ioda/Engines/Script/Script.cpp
    source_function:    ioda::ObsGroup ioda::Engines::Script::openFile(const ioda::Engines::Script::Script_Parameters &, ioda::Group)
    source_line:    278

The error message was expected since I was trying to add data with dimension [Location2, 616] to a data path with dimension [Location2, 500].
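A small guard can surface this kind of mismatch in the Python script itself, before the call into the container. The sketch below is plain NumPy; `shape_mismatch` is a hypothetical helper, and how the existing array is obtained from the container is left to the caller.

```python
import numpy as np

def shape_mismatch(existing, new):
    """Return a message describing the shape difference, or None if shapes match."""
    old_shape, new_shape = np.shape(existing), np.shape(new)
    if old_shape == new_shape:
        return None
    return f"existing shape {old_shape} != new shape {new_shape}"

# Toy arrays standing in for the [Location2, 500] and [Location2, 616] cases.
message = shape_mismatch(np.zeros((3, 500)), np.zeros((3, 616)))
```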

@rmclaren Do you have any suggestions?

rmclaren commented 6 days ago

@emilyhcliu This is a thornier issue than I thought originally. I'm trying to think of a good way to merge data like this in order to keep things consistent. We can talk about the details later.

emilyhcliu commented 4 days ago

@rmclaren

One question about all_sub_categories()

yaml_path = './bufr2ioda_mtiasi_mapping.yaml'
input_path = './gdas.t00z.esiasi.tm00.bufr_d'
container = bufr.Parser(input_path, yaml_path).parse()
categories = container.all_sub_categories()
print(categories)

There are three categories for IASI: metop-a, metop-b, and metop-c.

I expected the following output from print(categories):

['metop-a', 'metop-b', 'metop-c']

But, I got the following:

[['metop-a'], ['metop-b'], ['metop-c']]

Why do we get lists inside of a list?

rmclaren commented 3 days ago

This is because you can categorize (split) on several parameters. For example, you could define two splits, satellite ID and hour, in which case you would get a list like [['metop-a', '2'], ['metop-a', '3'], ['metop-a', '4'], ['metop-b', '2'], ['metop-b', '3'], ['metop-b', '4'], ...]. So 'metop-a' and '2' are subcategories, and a set of subcategories such as ['metop-a', '2'] makes a category.
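To make the nesting concrete, here is a rough illustration of the shape of the result using only the standard library. It assumes every combination of subcategories shows up, as in the example above; it is not a statement of how bufr-query builds the list internally.

```python
from itertools import product

satellite_ids = ['metop-a', 'metop-b']
hours = ['2', '3', '4']

# Two splits: each category is a list with one subcategory per split.
two_splits = [list(combo) for combo in product(satellite_ids, hours)]

# One split: each category is still a list, just with a single entry.
one_split = [list(combo) for combo in product(['metop-a', 'metop-b', 'metop-c'])]

# With a single split, the flat names can be recovered like this.
flat_names = [category[0] for category in one_split]
```

With one split defined, `one_split` has the same nested form as the output reported earlier in the thread, and `flat_names` is the flat list that was expected.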

emilyhcliu commented 1 day ago

@rmclaren Can we add functionality so that users can request a sub-container from a container with categories? For example, my mapping has categories (e.g. goes-16, goes-17, goes-18), so container.all_sub_categories() returns [['goes-16'], ['goes-17'], ['goes-18']].

Can we have some method to break down the container like the following:

container1 = container('goes-16')
container2 = container('goes-17')
container3 = container('goes-18')
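Until something like that exists, a user-space approximation might look like the sketch below. It is only a sketch under assumptions: `split_by_category` is a hypothetical helper, and it assumes the container exposes a `get(path, category)` accessor, which is not shown anywhere in this thread. Only `all_sub_categories()` is taken from the examples above.

```python
def split_by_category(container, paths):
    """Collect the data for each category into a plain dictionary.

    Returns {tuple(category): {path: data}}. Assumes a hypothetical
    container.get(path, category) accessor.
    """
    result = {}
    for category in container.all_sub_categories():
        result[tuple(category)] = {
            path: container.get(path, category) for path in paths
        }
    return result
```

The result is a dictionary of plain data keyed by category, not a real sub-container, so it would not carry the container's other methods.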

emilyhcliu commented 13 hours ago

Good news. I tested creating multiple obs spaces from one BUFR file for satwind using the script backend and the data cache. It worked great!!

For the satwind case, the BUFR contains g16 and g17. But, my mapping file defines three categories (g16, g17, g18). So, the output g18 should be empty with headers only. This also worked!!
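The behavior can be pictured with a toy NumPy example: grouping by the categories defined in the mapping, not by the ones present in the data, naturally yields a zero-length array for g18. `group_by_category` and the sample values are hypothetical and only illustrate the idea; this is not how bufr-query or ioda implement it.

```python
import numpy as np

def group_by_category(sat_ids, values, defined_categories):
    """Return {category: values for that category}, including empty groups."""
    sat_ids = np.asarray(sat_ids)
    values = np.asarray(values)
    return {cat: values[sat_ids == cat] for cat in defined_categories}

# Data contains only g16 and g17, but the mapping defines g18 as well.
groups = group_by_category(['g16', 'g17', 'g16'], [1.0, 2.0, 3.0],
                           ['g16', 'g17', 'g18'])
```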

So, I am closing issue #6, about creating an empty data file with headers only for categories not in the BUFR but defined in the mapping file. I now realize that creating an empty data file with headers only is a good thing under the circumstances described in that issue.