Closed YoulongXia-NOAA closed 9 months ago
On Hera, I am working with the new snow data to create a YAML file. I put an input file, gdas.t06z.snocvr.tm00.bufr_d, in /scratch1/NCEPDEV/da/Youlong.Xia/test/test_fromRon/testinput, and also used subset.x to reduce it to a smaller file, sizeReduced_gdas.t06z.snocvr.tm00.bufr_d. My YAML file produces the NetCDF files gdas.t06z.snocvr.tm00.nc and sizeReduced_gdas.t06z.snocvr.tm00.nc in /scratch1/NCEPDEV/da/Youlong.Xia/test/test_fromRon/testrun. Using ncdump, I found that stationIdentification contains garbled characters in some places. I checked with debufr and found the message below: all IDs are missing, including the WGOSLID that Iliana suggested I use. I think bufr2ioda.x cannot handle MISSING characters, or there is some other reason. ncdump shows this:
stationIdentification = "570282", "▒▒▒▒▒▒▒▒", "▒▒▒▒▒▒▒▒", "▒▒▒▒▒▒▒▒", "▒▒▒▒▒▒▒▒", "▒▒▒▒▒▒▒▒", "613180", "273749", "06631", "▒▒▒▒▒▒▒▒",
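As a rough illustration of why ncdump shows garbage (this is my own sketch, not the converter's code): BUFR encodes a missing character field with all bits set, so every byte is 0xFF, which renders as "▒". A simple check on the raw bytes:

```python
# Hypothetical helper: flag station IDs whose bytes are all 0xFF,
# i.e. the BUFR "missing" encoding for character fields.
def is_missing_station_id(raw: bytes) -> bool:
    """True if every byte carries the BUFR 'missing' value (0xFF)."""
    return len(raw) > 0 and all(b == 0xFF for b in raw)

ids = [b"570282", b"\xff" * 8, b"613180"]
flags = [is_missing_station_id(s) for s in ids]
# flags -> [False, True, False]
```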
The 10th record is below:
BUFR message # 10 of type NC000015 and date 2023050103 contains 1 subsets:
MESSAGE TYPE NC000015
001125  WGOSIDS   MISSING   NUMERIC          WIGOS IDENTIFIER SERIES
001126  WGOSISID  MISSING   NUMERIC          WIGOS ISSUER OF IDENTIFIER
001127  WGOSISNM  MISSING   NUMERIC          WIGOS ISSUE NUMBER
001128  WGOSLID   MISSING   ( 16)CCITT IA5   WIGOS LOCAL IDENTIFIER (CHARACTER)
001101  STID      424.0     CODE TABLE       State identifier (424 = United States of America (RA IV))
001102  NSID      MISSING   NUMERIC          National station number
001019  LSTN      BBSA3     ( 32)CCITT IA5   Long station or site name
@barlage suggested using LSTN after he looked at the MADIS web viewer: https://madis-data.ncep.noaa.gov/MadisSurface/.
I tested his LSTN suggestion using the bufr2ioda.x developed by Ron; the results are shown below:
stationIdentification = "Leukerba", "RNHW4", "MCKN5", "BBSW4", "LIZC2", "TPEM8", "Bosco Gu", "Ofenpass", "BERN / Z", "BBSA3", "COSW4", "MANM8", "TWNW4", "Malbun /", "Nendaz /", "PLAFFEIE", "Julier /", "SURU1", "DIAO3", "MULM8", "YOUW4", "Dotra /", "Malbun /", "Chaussy", "Grammont", "PRVT2", "KNXT2", "GLSI1", "RESC2", "HDLI1", "Puzzetta", "MWDO3", "ISAP4", "JVCI1", "BLTU1", "LMDO3", "TRGW1", "Arolla /", "Vals / P", "Stockhor", "Hinterrh", "Unterwas", "Planacha", "Flumserb", "Saanenmo", "DAVOS", "Gantrisc", "Ftan", "Plaun La", "Goschene", "Binn", "Trubsee", "Arolla", "Lauchere", "Oberwald", "TKVA2", "VNCM6", "GANA2", "MANH1", "Stockhor", "MHWO3", "RAWC2", "BKSI1", "LSMU1", "SUCW4", "Bedretto", "Nendaz /", "Ybrig /", "SGMC2", "CCKU1", "LWDI1", "TCMO3", "Arbaz, V", "Mesocco", "GRIMSEL", "La Fouly", "MTBU1", "CULC2", "MOQI1", "WCTA3", "Campolun", "Jaun / C", "Wassen /", "Airolo /", "PTNK2", "CTSM7", "FISW1", "PRSN2", "JNPC2", "Ofenpass", "SKZC2", "PERM6", "WSMU1", "BRBI1", "LMCN2", "TIPO3", "Jaun / C", "Simplon", "Davos /", "Fluela /", "Corvatsc", "Planacha", "Murren", "Saanenmo", "Bourg-St", "Gantrisc", "Ftan", "Plaun La", "Goschene", "Innergla", "Trubsee", "Ulrichen", "Elm / Mi", "Nara / M", "KTSA2", "FTRO2", "BSBM7", "WPLH1" ;
Some station names contain "/". I ran debufr on the 9th BUFR message: 001019 LSTN BERN / ZOLLIKOFEN ( 32)CCITT IA5. The full station name is "BERN / ZOLLIKOFEN", so it contains "/", and bufr2ioda.x truncates it to "BERN / Z". If @ilianagenkova agrees that I use LSTN, @rmclaren would need to modify bufr2ioda.x to select only the first part when "/" appears. All, if you have any further comments or suggestions, please let me know.
@YoulongXia-NOAA I would use the python API instead of the YAML file. This way you'd have total freedom to handle these strings in any way you want. Otherwise we'd be modifying core BUFR reading code every time a specific file has anything funny in it...
@BenjaminRuston @ilianagenkova, do you think I can use LSTN as stationIdentification? If yes, @rmclaren suggests I use the Python API rather than a YAML file. @BenjaminRuston, what do you think, given that the YAML file and bufr2ioda.x cannot handle "/"?
I'm fine with using the python API to handle these unique station names for this dataset
@rmclaren @BenjaminRuston, Okay, I will work on python API for this dataset using LSTN as the stationIdentification. Thank you.
Here's an example:
import bufr
import ioda_obs_space as ioda_ospace

DATA_PATH = './testinput/gdas.t00z.1bhrs4.tm00.bufr_d'
OUTPUT_PATH = './testrun/bufr_query_python_to_ioda_test.nc'

def test_bufr_to_ioda():
    # Make the QuerySet for all the data we want
    q = bufr.QuerySet()
    q.add('year', '*/YEAR')
    q.add('month', '*/MNTH')
    q.add('day', '*/DAYS')
    q.add('hour', '*/HOUR')
    q.add('minute', '*/MINU')
    q.add('second', '*/SECO')
    q.add('latitude', '*/CLAT')
    q.add('longitude', '*/CLON')
    q.add('radiance', '*/BRIT/TMBR')

    # Open the BUFR file and execute the QuerySet
    with bufr.File(DATA_PATH) as f:
        r = f.execute(q)

    # Use the ResultSet returned to get numpy arrays of the data
    lat = r.get('latitude')
    lon = r.get('longitude')
    rad = r.get('radiance')
    datetime = r.get_datetime('year', 'month', 'day', 'hour', 'minute', 'second')

    # Create the dimensions
    dims = {'Location': rad.shape[0],
            'Channel': rad.shape[1]}

    # Create the IODA ObsSpace
    obsspace = ioda_ospace.ObsSpace(OUTPUT_PATH, mode='w', dim_dict=dims)

    # Create the global attributes
    obsspace.write_attr('MyGlobal_str', 'My Global String Data')
    obsspace.write_attr('MyGlobal_int', [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

    # Create the variables
    obsspace.create_var('MetaData/Timestamp', dtype=datetime.dtype, fillval=datetime.fill_value) \
        .write_attr('units', 'seconds_since_epoch') \
        .write_attr('long_name', 'Timestamp') \
        .write_data(datetime)

    obsspace.create_var('MetaData/Latitude', dtype=lat.dtype, fillval=lat.fill_value) \
        .write_attr('units', 'degrees_north') \
        .write_attr('long_name', 'Latitude') \
        .write_attr('valid_range', [-90.0, 90.0]) \
        .write_data(lat)

    obsspace.create_var('MetaData/Longitude', dtype=lon.dtype, fillval=lon.fill_value) \
        .write_attr('units', 'degrees_east') \
        .write_attr('long_name', 'Longitude') \
        .write_attr('valid_range', [-180.0, 180.0]) \
        .write_data(lon)

    obsspace.create_var('ObsValue/brightnessTemperature', dtype=rad.dtype,
                        dim_list=['Location', 'Channel'], fillval=rad.fill_value) \
        .write_attr('units', 'K') \
        .write_attr('long_name', 'ATMS Observed (Uncorrected) Brightness Temperature') \
        .write_attr('valid_range', [100.0, 500.0]) \
        .write_data(rad)
Thank you, @rmclaren. I was referring to Nick's example, but it is for a PrepBUFR file. Your example may be a more appropriate reference for me.
Please use import ioda_obs_space as ioda_ospace instead of ioda directly, as this will vastly simplify your code. I'm updating the example in ioda-converters (bufr_query_python_to_ioda_test.py) to use this as well.
@rmclaren, my BUFR file has no second variable; it just includes year, month, day, hour, and minute. Can I omit second, or do I need to set second = 00 or something else? Thank you.
just leave off the seconds. should be optional
@rmclaren, on Hera, what python environment do you use? I used the python environment below:
setenv IODA_BUILD_DIR /scratch2/NCEPDEV/stmp3/Youlong.Xia/snowIodaBundle/ioda-bundle/build
setenv IODA_CONV_DIR /scratch2/NCEPDEV/stmp3/Youlong.Xia/snowIodaBundle/ioda-bundle/iodaconv/src
setenv python /scratch1/NCEPDEV/da/python/opt/core/miniconda3/4.6.14/envs/gdasapp/bin/python3.7
setenv IODAPY $IODA_BUILD_DIR/lib/python3.7/pyioda
setenv BUFR $IODA_BUILD_DIR/lib/pyiodaconv
setenv PYTHONPATH ${PYTHONPATH}:${IODA_CONV_DIR}:${IODAPY}:${BUFR}
But it does not work. Thank you.
You might want to change the datetime section to be like the following:
obsspace.create_var('MetaData/Timestamp', dtype=np.int64, fillval=datetime.fill_value.astype('int64')) \
    .write_attr('units', 'seconds_since_epoch') \
    .write_attr('long_name', 'Timestamp') \
    .write_data(datetime.astype('int64'))
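To see what that cast does, here is a tiny stand-alone illustration with made-up values (not from the real file): both the epoch-seconds data and the fill value are converted to int64 before writing.

```python
import numpy as np

# Epoch seconds as a masked array (values are made up); cast the data
# and the fill value to int64 before handing them to the writer.
datetime = np.ma.array([1682899200.0, 0.0], mask=[False, True],
                       fill_value=1.0e10)
seconds = datetime.astype('int64')
fill = datetime.fill_value.astype('int64')
# seconds.dtype -> int64
```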
Do you see any error messages after you run the updated code?
Location = 118 ;

5c5
< int64 Location(Location) ;
---
> int Location(Location) ;
11,12d10
< string :MyGlobal_str = "My Global String Data" ;
< :MyGlobal_int = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ;
24,25c22
< dateTime:_FillValue = -9223372036854775806LL ;
< string dateTime:units = "seconds since 1970-01-01T00:00:00Z" ;
---
> dateTime:_FillValue = 9223372036854775807LL ;
26a24
> string dateTime:units = "seconds since 1970-01-01T00:00:00Z" ;
29d26
< string latitude:units = "degrees_north" ;
30a28
> string latitude:units = "degrees_north" ;
34d31
< string longitude:units = "degrees_east" ;
35a33
> string longitude:units = "degrees_east" ;
39d36
< string stationElevation:units = "m" ;
40a38,39
> string stationElevation:units = "m" ;
> string stationElevation:coordinates = "longitude latitude" ;
43c42
< string stationIdentification:long_name = "Long Station or Site Name" ;
---
> string stationIdentification:long_name = "WIGOS Local Identifier" ;
142d140
< string totalSnowDepth:units = "m" ;
143a142,143
> string totalSnowDepth:units = "m" ;
> string totalSnowDepth:coordinates = "longitude latitude" ;
But the data are the same. This means "BERN / Z" still exists in the Python API test.
So now that you have those strings in a python numpy array you can manipulate them any way you want..
For example you could do this:
sid = np.array([id.split('/')[0] for id in sid]) # get the part before the /
Before this line, I print:
print(sid.dtype)
print(sid.fill_value)
It shows:
object
After this line, the same print shows:
<U8
followed by:
Traceback (most recent call last):
File "bufr_ncep_snocvr_api.py", line 92, in
I also printed sid after this line; "/" is indeed removed, but the variable's characteristics seem to change.
@rmclaren, it looks like adding your line leads to the errors above, although it does indeed remove "/".
oops, ya that is not a masked array. So actually you might want to do this.
At the top:
import numpy as np
Then do this instead:
sid = np.ma.array([id.split('/')[0] for id in sid], mask=sid.mask, fill_value=sid.fill_value)
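At the numpy level, a self-contained sketch with made-up values shows the mask and fill_value surviving the rebuild (this only illustrates the numpy behavior, not the ioda write):

```python
import numpy as np

# Rebuild the masked array after the string manipulation so the mask and
# fill_value are preserved (sample values are made up for illustration).
sid = np.ma.array(["BERN / Z", "BBSA3", "XXXXXXXX"],
                  mask=[False, False, True])
sid = np.ma.array([id.split('/')[0] for id in sid.data],
                  mask=sid.mask, fill_value=sid.fill_value)
# sid[0] -> 'BERN ', and the third element is still masked
```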
@rmclaren, after I do that, it leads to the error below:
Traceback (most recent call last):
File "bufr_ncep_snocvr_api.py", line 93, in
Invoked with: <ioda.Has_Variables: [ Location ]>, 'MetaData/stationIdentification', None; kwargs: scales=[<ioda._ioda_python.Variable object at 0x7fd52dce1db0>], params=<ioda._ioda_python.VariableCreationParameters object at 0x7fd52dce1e70>
@rmclaren, when I use it directly, it works. I checked print(sid.dtype) and it is object. After I add that line, sid.dtype becomes <U8, which is an 8-character string. Does the code not allow that type in Python?
I'm working on some deficiencies in recognizing strings properly in the ioda code (the current implementation is not very generic). Try this:
sid = sid.astype('S20')
I'm hoping this will result in a type of string that will be recognized properly..
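On a small made-up array, that cast looks like this; the fixed-width byte-string dtype ('S20') is the form the ioda layer is expected to recognize:

```python
import numpy as np

sid = np.ma.array(["BERN", "BBSA3"])   # unicode dtype such as <U5
sid = sid.astype('S20')                # 20-byte fixed-width byte strings
# sid.dtype -> dtype('S20'); elements become bytes, e.g. b'BERN'
```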
@rmclaren, YES, IT WORKS. Thank you. This issue has been solved. Following the suggestions from the obsproc team scientists, Jeff and Iliana, I will read all four WIGOS components and make a WSI such as 0-20000-0-82403 => WGOSIDS-WGOSISID-WGOSISNM-WGOSLID, although I cannot read the last one. Mitch told us that it is a 16-byte character string. I am wondering if the Python API can only read 8-byte character strings. I will start to look at how to combine these four strings into a single string and then output it.
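Combining the four components is plain string formatting; a hedged sketch (the function name and argument handling are my assumptions, not the final converter code):

```python
# Join the four WIGOS components into a single WSI string,
# e.g. 0-20000-0-82403 (WGOSIDS-WGOSISID-WGOSISNM-WGOSLID).
def make_wsi(wgosids, wgosisid, wgosisnm, wgoslid):
    return f"{wgosids}-{wgosisid}-{wgosisnm}-{wgoslid}"

wsi = make_wsi(0, 20000, 0, "82403")
# -> "0-20000-0-82403"
```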
I'll have to look into it... I don't believe I support reading strings that long right now; limited to 8-byte strings, I believe. Not sure how they encode 16-byte strings in BUFR.
Write an issue please
Will do soon and let you know. Thank you @BenjaminRuston
Oops, Thank you @rmclaren :)
@YoulongXia-NOAA, do you think this work is underway or complete? These are files on the GTS but not NOMADS, correct? It would be good to get a few full-volume samples as well. And these started in Nov 2022 or somewhere recent, correct?
@BenjaminRuston, this is underway. The Python API cannot handle the long string WGOSLID well (Ron is working on that), and while I wanted to use LSTN to represent stationIdentification, the users want to use the four WIGOS components as station identification instead. We still need to consider the name definitions for the four WIGOS components. I will keep you updated.
@BenjaminRuston, I used 1 May 2023 data produced by the EMC obsproc team, as it has only been produced since April 2023.
@BenjaminRuston, full-volume samples are in /scratch2/NCEPDEV/marineda/Youlong.Xia/save/bufr_snowCover_GTS/gdas.20230501 and /scratch2/NCEPDEV/marineda/Youlong.Xia/save/bufr_snowCover_GTS/gdas.20230502 on Hera. Each day has 4 cycles, and each cycle has one file. They were uploaded from the wcoss2 obsproc directory to Hera.
@gibbsp did YoulongXia get removed?
@YoulongXia-NOAA there you are... these files aren't available from NOMADS, correct, and they began around Nov 2022, is that right?
Yes, we should try a few files and get this moved forward. The idea is that the data should be in the ADPSFC file soon, correct?
@BenjaminRuston, from Nov 2022, it includes only several European countries. From the end of April 2023, it includes both MADIS and those European countries. MADIS includes US and Canada stations. Some BUFR data variable names seem to change. I have put many example data files in /scratch2/NCEPDEV/marineda/Youlong.Xia/save/bufr_snowCover_GTS/ on Hera, covering the various statuses, for checking when the data are requested. It includes several example data files for May 2023. I can get the ops data from WCOSS2. See https://nomads.ncep.noaa.gov/pub/data/nccf/com/obsproc/prod/gdas.20231001/, e.g., gdas.t00z.snocvr.tm00.bufr_d. It is different from gdas.t00z.1bmhs.tm00.bufr_d, which contains GTS snow depth. I am not sure if the obsproc team will combine them into one file or if our ioda-converter needs to combine the two files into a single one. For now they will be processed separately.
Great, thanks for the information. Following, except I think you mean gdas.t00z.adpsfc.tm00.bufr_d.nr, and then the other is the *.snocvr.* file.
I thought I heard from someone, maybe @ClaraDraper-NOAA, that these would be merged into a single file and into the adpsfc.
most likely at some point
@BenjaminRuston These files are all EMC-related, so it wasn't me that told you that.
@BenjaminRuston @ClaraDraper-NOAA, since @ilianagenkova is on this list and is an obsproc team lead, she can give us an update on whether these two snow-depth-related BUFR files will be combined or kept separate. As snow depth is one of many variables in the adpsfc BUFR file, I am not sure they can easily be merged. For now, I process them as two separate files. If you want to combine them into one IODA file in the future, the Python API can do that.
Yeah, I would agree. We do have some ability to detect duplicates across observation spaces; this allows us to leave them separate and maybe prioritize one over the other.
@YoulongXia-NOAA and @ClaraDraper-NOAA, should we add a ctest for this data type? Has someone already started this? If not, it looks straightforward; there's already a YAML to test. Was there a PR for this?
@YoulongXia-NOAA I'm not sure which data set this is.
I thought @jiaruidong2017 had this in. It's a new data feed of station snow depth that comes in a separate BUFR file from the ones contained in the ADPSFC. An example is this:
https://ftpprd.ncep.noaa.gov/data/nccf/com/obsproc/prod/gdas.20231010/gdas.t00z.snocvr.tm00.bufr_d
It's supposedly on the GTS, but it's not being archived in the NCAR RDA, which is inconvenient for me, and it only started in Nov 2022 or maybe spring 2023 (I always forget), something like that.
@BenjaminRuston @ClaraDraper-NOAA, it seems not ready for a test. The reason is that this BUFR data is missing RPID for US data and uses LSTN instead. When the YAML file uses LSTN, the station ID will contain some "/" or ",". Certainly I can use the Python API to cut the string and keep the first part before "/" or ",", but I am not sure if that is what you want. As I did not get any guidance for such a case, I did not create a PR or test. Do you have any suggestions?
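As a rough sketch of that cut (my own illustration, not converter code), splitting on either separator and keeping the first token:

```python
import re

# Keep the part before the first '/' or ',' and drop surrounding spaces
# (sample names are from the station list earlier in this thread).
names = ["BERN / Z", "Arbaz, V", "BBSA3"]
short = [re.split(r"[/,]", n)[0].strip() for n in names]
# -> ['BERN', 'Arbaz', 'BBSA3']
```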
@BenjaminRuston, additionally, @jiaruidong2017 and I are discussing this dataset with the EMC obsproc team to see if they can add station IDs, as the original NetCDF data has station IDs. But I am still not sure how to move forward to test this dataset.
@ClaraDraper-NOAA, as you may know, the adpsfc BUFR data (GTS) has very sparse station snow depth observations in the US and Canada. The snocvr file includes snow depth observations from several European countries that are not included in the adpsfc BUFR data, plus several thousand stations in the US and Canada (from MADIS, including SNOTEL and other US station observations). This dataset is being produced by the EMC obsproc team as an operational dataset. It is an addition for operational DA work, in particular for North America and several northern European countries. This is what I know.
Thanks, @YoulongXia-NOAA, for providing the detailed information on the new GTS-based snow depth data. Yes, it is not ready yet to add a ctest for this data because of the missing station IDs in the US data.
If you want to put in the Python API solution for LSTN, I will not stop you. But you say the original station ID, once added, will not have this problem, so why not just wait and update then?
@BenjaminRuston, this is the message from the EMC obsproc team: "Some additional information regarding USA LSTNs. As you stated for US snow data, the LSTN values look more like a code for a station, not like a station name. The advantage of these LSTNs is that they are short and can be shoe-horned, if necessary, into WGOSLID." I will wait for their further update on this dataset.
Can you provide what you have, particularly for the Python API? I would think it's worth the effort to start looking at the data.
@BenjaminRuston, bufr_ncep_snocvr_api.py in /scratch1/NCEPDEV/da/Youlong.Xia/test/pythonAPI on Hera. The output is sizeReduced_gdas.t06z.snocvr.tm00.bufr_d.nc in /scratch1/NCEPDEV/da/Youlong.Xia/test/pythonAPI/testrun.
A python API converter is working within GDASApp - https://github.com/NOAA-EMC/GDASApp/pull/848
Description
The NCEP/EMC obsproc team produced an operational snow product named gdas.tXXz.snocvr.tmXX.bufr_d (XX is the 00, 06, 12, or 18 Z cycle). The station-based dataset includes some European countries plus US and Canada stations, mainly from MADIS. These BUFR snow data support the community land snow DA work. A YAML file needs to be created so that bufr2ioda.x can be used to create an IODA v3 NetCDF4 file. The created data need to be checked against the original BUFR data to ensure the conversion is correct. The ctest needs to pass, and the YAML file needs to be merged into the ioda-converters develop branch.
Requirements
The yaml file needs to be created and tested.
Acceptance Criteria (Definition of Done)
The ctest will pass within ioda-bundle and the YAML file will be merged into the ioda-converters develop branch.
Dependencies
No