JCSDA-internal / ioda-converters

Various converters for getting obs data in and out of IODA
9 stars 4 forks source link

Update current imsfv3 converter to handle time dimension #1510

Closed YoulongXia-NOAA closed 4 months ago

YoulongXia-NOAA commented 5 months ago

Description

This PR mainly to solve two problems: (1) update current imsfv3 converter snow depth units from m into mm, utc time from 18Z into 00z and error is set as 40. mm and (2) add imsfv3grid converter to fix the issue (https://github.com/JCSDA-internal/ioda-converters/issues/1497) to read time directly from NetCDF file.

Issue(s) addressed

Resolves #1497

Dependencies

List the other PRs that this PR is dependent on: NO

Impact

Expected impact on downstream repositories: NO

Checklist

YoulongXia-NOAA commented 5 months ago

I created a feature branch, on Hera, I tested the two converters and they passed. When website checks is done, please take a look. Thank you.

YoulongXia-NOAA commented 5 months ago

Okay, website check has passed. Please let me know your comments when your time is available. Thank you.

BenjaminRuston commented 5 months ago

@YoulongXia-NOAA great, I've wanted to get back to this. And we should finish updating the units in the other converters as well.

We need to merge these two converters imsfv3_scf2ioda.py and imsfv3grid_scf2ioda.py I do not see anything that would prevent this and adding nearly duplicate routines makes for difficult maintenance

(venv) JCSDA-L-17733: /land >diff -dwib imsfv3_scf2ioda.py imsfv3grid_scf2ioda.py
18a19
> from pyiodaconv.def_jedi_utils import iso8601_string
83a85
>         time = ncd.variables['time'][:].ravel()
89a92
>         time = time.astype('int64')
98,104c101
<         times = np.empty_like(sncv, dtype=object)
< 
<         # get datetime from filename
<         str_date = re.search(r'\d{8}', self.filename).group()
<         my_date = datetime.strptime(str_date, "%Y%m%d")
<         start_datetime = my_date.strftime('%Y-%m-%d')
<         base_datetime = start_datetime + 'T00:00:00Z'
---
>         times = np.empty_like(sncv, dtype='int64')
107c104
<             times[i] = base_datetime
---
>             times[i] = time
110a108
>         self.varAttrs[('dateTime', 'MetaData')]['units'] = iso8601_string
YoulongXia-NOAA commented 5 months ago

@BenjaminRuston, originally I want to keep the last one. However, this will need to wait for land-ims-proc code PR (https://github.com/NOAA-PSL/land-IMS_proc/pull/5) is merged. Otherwise, the second converter will not work as no time is added into NetCDF file. Therefore, during transition period, I suggested to keep the two converters to avoid some broken from applications. When land-ims-proc is updated and merged into land DA workflow, we can easily remove the first one and the corresponding tests and just keep one. This the reason why I want to keep two in this time.

BenjaminRuston commented 5 months ago

@CoryMartin-NOAA and @YoulongXia-NOAA OK,, think I'm finally finally done, did a bit more cleanup

please test imsfv3_scf2ioda.py and we will move to remove the new file imsfv3grid_scf2ioda.py that was largely a duplicate

@YoulongXia-NOAA also notice this is using height and not stationElevation should we make this change now?

BenjaminRuston commented 5 months ago

and looks like I'll need to update this file in testoutput/imsfv3_scf.nc

and we'll want to confirm it is how we want it to look regarding the values themselves... we want to leave the missing values as -999:

156: DIFFER : VARIABLE : totalSnowDepth : POSITION : [2084] : VALUES : -0.999 <> -999 : PERCENT : 99.9
156: DIFFER : VARIABLE : totalSnowDepth : POSITION : [2085] : VALUES : -0.999 <> -999 : PERCENT : 99.9

and make sure the totalSnowDepth/ObsError is what we want as well

YoulongXia-NOAA commented 5 months ago

totalSnowDepth/ObsError =40.0 mm now @ClaraDraper-NOAA used for IMS snow depth. See if se has new input.

BenjaminRuston commented 5 months ago

looks like the testoutput was updated for the old converter as well the only difference I get now is due to my initialization:

DIFFER : VARIABLE : dateTime : ATTRIBUTE : _FillValue : VALUES : 7258118400 <> -9223372036854775806
BenjaminRuston commented 5 months ago

ok,, I've got the CI testing reproducing what I"m seeing locally. I will push up the updated file that now sets the dateTime fillValue attribute to the value that is being used

YoulongXia-NOAA commented 5 months ago

@BenjaminRuston, please wait for a bit time for my land-ims-proc PR review to avoid some back-forth works. Based the reviewer's comments, using a time calculation subroutine from ufs-land-driver may not be optimal as that code is not from the public time library but written by ufs-land-driver developer. He suggested me to output date into netcdf file, and then let python to convert dateTime into seconds. I modified land-ims-proc code again to add a global attribute as valia_date (removed all previous modifications), which is a minor modification for the code. It works and I am waiting for the reviewers' approval. When that PR was approved, I will put the input example date into Orion and then you can test this new data. Ioda converter also needs minor modification as

str_date =ncd.getncattr('valid_date')

I tested offline converter to produce the output data, and use nccmp to compare with the output with ioda output using previously modified ioda converter and input data produced from land-ims-proc by adding time, they has no difference. Therefore, your output data is OKAY. Only needs to slightly modify converter and use new input data.

YoulongXia-NOAA commented 5 months ago

The input data format is the same as old input data but only add:

// global attributes: :valid_date = "20191201" ;

YoulongXia-NOAA commented 5 months ago

Nevertheless, I copy my new input data and tested ioda converter to oirion: in /work/noaa/da/youlong/Hera_dataFile. They are imsfv3_scf2ioda_globalAttribute.py and IMSscf.20191201.C96.nc. Thank you.

BenjaminRuston commented 5 months ago

last thing I'd like to address here is this is a lot of file for a ctest... total is close to 1Mb think will see if can sample it down by a factor of 10 or so, if it's simple will do this and likely rename files to reflect they are additional sample.

note that I will try to execute via ncks so the reduction will be reproducible, and documentable ;0)

BenjaminRuston commented 5 months ago

@YoulongXia-NOAA we can move foward with this as the get_observation_time is easily changed later. I have no idea why the ufs-land-driver reference has any relevance. If there is a date in this attribute:

str_date =ncd.getncattr('valid_date')

we can add it as an option in the get_observation_time easily. Will be good to discuss with Clara

YoulongXia-NOAA commented 5 months ago

@YoulongXia-NOAA we can move foward with this as the get_observation_time is easily changed later. I have no idea why the ufs-land-driver reference has any relevance. If there is a date in this attribute:

str_date =ncd.getncattr('valid_date')

we can add it as an option in the get_observation_time easily. Will be good to discuss with Clara

Yes, go ahead. Thank you.

BenjaminRuston commented 4 months ago

here we are waiting for new input files that have valid_date as an attribute as well as a reduction to c48

will adapt decoder to use the attribute for date rather than a variable, update the test reference and should be done

BenjaminRuston commented 4 months ago

@ClaraDraper-NOAA any chance to get to creation of new test file input

ClaraDraper-NOAA commented 4 months ago

@BenjaminRuston - I just tried to create this, but realized that we don't quite have the right time variable in the file. @YoulongXia-NOAA is going to update the processing routine to fix this in the land-IMSproc repo. The new time variable will be in seconds since a reference date (1 Jan 1970), so we'll also need to change this converter to handle the different format.

YoulongXia-NOAA commented 4 months ago

@BenjaminRuston, @ClaraDraper-NOAA is asking me to put time when reading from nc file and keep the current format when reading from ascii file. This may has different outputs when reading nc file (add time dimension) and ascii file (keep current 1D only locations) in land-ims-proc. Your ioda converter may need to read time first, and global attribute second, and then cute from filename.

ClaraDraper-NOAA commented 4 months ago

@YoulongXia-NOAA It hadn't occurred to me that this would create different time formats for the output from land-IMSproc - good catch. @BenjaminRuston - let's talk about this quickly when we meet on Wednesday.

ClaraDraper-NOAA commented 4 months ago

@YoulongXia-NOAA @BenjaminRuston

On second thought: I just looked at Ben's routine to extract the time from the input file. It assumes either:

i) time is a variable (not an attribute) that is an integer in seconds since 1 Jan 1970, or ii) there is no time variable in the file, in which case the time is taken from the filename.

For netcdf IMS input: Currently land-IMSproc, which outputs the file read in by this converter, writes out time as an attribute and as a YYYYMMDDHH string. However, I've already asked Youlong to change this to an integer (seconds since 1 Jan 1970) to match the input netcdf IMS files. In addition to changing the format, we then also need to decide whether to make time an attribute or a variable, and then change either this code, or land-IMSproc, depending on which we chose. Ben - let's discuss quickly on Wednesday.

For the ascii input files, Youlong - can you please change land-IMSproc to not add the time variable/attribute to the output file. Ben's code will then simply pull the time from the filename, and we can avoid the issue of having time variables in two different formats.

YoulongXia-NOAA commented 4 months ago

@YoulongXia-NOAA @BenjaminRuston

On second thought: I just looked at Ben's routine to extract the time from the input file. It assumes either:

i) time is a variable (not an attribute) that is an integer in seconds since 1 Jan 1970, or ii) there is no time variable in the file, in which case the time is taken from the filename.

For netcdf IMS input: Currently land-IMSproc, which outputs the file read in by this converter, writes out time as an attribute and as a YYYYMMDDHH string. However, I've already asked Youlong to change this to an integer (seconds since 1 Jan 1970) to match the input netcdf IMS files. In addition to changing the format, we then also need to decide whether to make time an attribute or a variable, and then change either this code, or land-IMSproc, depending on which we chose. Ben - let's discuss quickly on Wednesday.

For the ascii input files, Youlong - can you please change land-IMSproc to not add the time variable/attribute to the output file. Ben's code will then simply pull the time from the filename, and we can avoid the issue of having time variables in two different formats.

@ClaraDraper-NOAA, I have to put time global attribute in nc file to solve the issue @CoryMartin-NOAA raised (see https://github.com/JCSDA-internal/ioda-converters/issues/1497). This is a reason why I worked land-ims-proc code to put time into netcdf file.

ClaraDraper-NOAA commented 4 months ago

@BenjaminRuston Please see above. It looks like your solution of looking in the filename if we don't have a time variable present doesn't really solve the original issue that Cory raised (which was a desire not to use the filename). It's a bit clunky, but my suggestion is that we have 2 possible time variables/attributes in the output of the land-IMSproc code: 1) time as an integer, if the original input file was a netcdf. This can be called "time" 2) time as a string, if the original input file was an ascii. This can be called "time_str"

Then this code looks for whichever is present. @CoryMartin-NOAA Will we be assimilating ascii operationally? If so 2) should be the default that the converter looks for first.

CoryMartin-NOAA commented 4 months ago

My understanding is the file that is already available in our obs processing system on WCOSS is the ASCII IMS file, but it's possible we could switch to the netCDF version before implementation if necessary.

YoulongXia-NOAA commented 4 months ago

@ClaraDraper-NOAA, @BenjaminRuston, @CoryMartin-NOAA, I suggested to output a global attribute and a time/time_str variable into netcdf file. The ioda-converter can read time_str for reading ascii file and time for reading nc file. The global attribute is used as a backup. The ioda-conveerter can easily use to get time or time_str to handle the file. If we output two time variables in netcdf file, for reading nc file case, if use time_str as a default, time will not be used.

YoulongXia-NOAA commented 4 months ago

@BenjaminRuston Please see above. It looks like your solution of looking in the filename if we don't have a time variable present doesn't really solve the original issue that Cory raised (which was a desire not to use the filename). It's a bit clunky, but my suggestion is that we have 2 possible time variables/attributes in the output of the land-IMSproc code:

  1. time as an integer, if the original input file was a netcdf. This can be called "time"
  2. time as a string, if the original input file was an ascii. This can be called "time_str"

Then this code looks for whichever is present. @CoryMartin-NOAA Will we be assimilating ascii operationally? If so 2) should be the default that the converter looks for first.

For time_str, you suggested to put YYYYMMMDDHH when ascii file is read. Do you have any suggestion for time_str attributes such as units, comment or something else?

ClaraDraper-NOAA commented 4 months ago

@BenjaminRuston Are you OK with Youlong's suggestion?

I've just learned that if you click on a function in github, it shows you the original version in the PR, not the updated version (grrr...) - so I see now that you are already reading time from an attribute, not a variable.

Following Youlong's suggestion, we'll need to update this converter to: 1) Try to read in a string attribute with the date-time. 2) If that fails, try to read in an integer attribute (the option you have now).

ClaraDraper-NOAA commented 4 months ago

@BenjaminRuston Please see above. It looks like your solution of looking in the filename if we don't have a time variable present doesn't really solve the original issue that Cory raised (which was a desire not to use the filename). It's a bit clunky, but my suggestion is that we have 2 possible time variables/attributes in the output of the land-IMSproc code:

  1. time as an integer, if the original input file was a netcdf. This can be called "time"
  2. time as a string, if the original input file was an ascii. This can be called "time_str"

Then this code looks for whichever is present. @CoryMartin-NOAA Will we be assimilating ascii operationally? If so 2) should be the default that the converter looks for first.

For time_str, you suggested to put YYYYMMMDDHH when ascii file is read. Do you have any suggestion for time_str attributes such as units, comment or something else?

@YoulongXia-NOAA Please just make sure that there is enough information in the long name, etc so that people can interpret the date. Y

BenjaminRuston commented 4 months ago

@ClaraDraper-NOAA and @YoulongXia-NOAA sure we can add, but I'm having trouble following on github so glad we are meeting.

we want to read an attribute .. if integer genrate time, if string generate time no need to variable that holds time (that is awkward, but harmless to leave)

and you want to remove the fall fall back to use anything from the filename, realize this is not optimal and can remove easily but believe we have a file in a ctest that would need to be replaced

YoulongXia-NOAA commented 4 months ago

I am still struggling land-ims-proc code to output NCWCP file. I am totally lost and do not know how to move forward.

================================================================ Youlong Xia, Ph.D SAIC at EMC/NCEP 5830 University Research Court College Park, MD 20740 Phone: 301-683-3696

On Tue, May 21, 2024 at 9:07 PM Benjamin Ruston @.***> wrote:

@ClaraDraper-NOAA https://github.com/ClaraDraper-NOAA and @YoulongXia-NOAA https://github.com/YoulongXia-NOAA sure we can add, but I'm having trouble following on github so glad we are meeting.

we want to read an attribute .. if integer genrate time, if string generate time no need to variable that holds time (that is awkward, but harmless to leave)

and you want to remove the fall fall back to use anything from the filename, realize this is not optimal and can remove easily but believe we have a file in a ctest that would need to be replaced

— Reply to this email directly, view it on GitHub https://github.com/JCSDA-internal/ioda-converters/pull/1510#issuecomment-2123691071, or unsubscribe https://github.com/notifications/unsubscribe-auth/AMKGNUFDRT64LYZFACCWV6DZDPVUFAVCNFSM6AAAAABHCGXJ6WVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMRTGY4TCMBXGE . You are receiving this because you were mentioned.Message ID: @.***>

YoulongXia-NOAA commented 4 months ago

@BenjaminRuston, after I got more explanations for land-ims-proc, I move that PR forward with using global attributes as I worked that code in the first time. But reviewers suggested to use valid_time_str = "2019120100" for ascii file input and valid_epoch_time = 1575244800 for netcdf file input. The reviewers are looking at this change and see if they have further comments. Thank you for your patience.

BenjaminRuston commented 4 months ago

excellent got the files @ClaraDraper-NOAA

will add these to the ctests and have capability to do time from string attribute:

// global attributes:
        :valid_time_str = "2023050112" ;

or integer epoch attribute:

// global attributes:
        :valid_epoch_time = 1682899200 ;

this is straightforward, and will remove anything from filename as if it's netCDF expect to have this attribute, if not present will through an error (exception) and exit

BenjaminRuston commented 4 months ago

@ClaraDraper-NOAA I've finished adding the new files, modifying the converter and changing the CMakeLists.txt to point to the new files.

Note these two input files have different valid_times.

The testinput/ascii_input_IMSscf.20230501.oro_C48.nc file that uses the valid_time_str is 12Z: ncd.valid_time_str="2023050112"

file: imsfv3_scf.nc
range(/MetaData/dateTime): [2023-05-01 12:00:00, 2023-05-01 12:00:00]

The testinput/nc_input_IMSscf.20230501.oro_C48.nc has a ncd.valid_epoch_time=1682899200 the corresponds to 00Z:

file: imsfv3grid_scf.nc
range(/MetaData/dateTime): [2023-05-01 00:00:00, 2023-05-01 00:00:00]
ClaraDraper-NOAA commented 4 months ago

@BenjaminRuston They should both be at 00. Heading out the door now, but I'll sort it out first thing tomorrow.

BenjaminRuston commented 4 months ago

thanks @ClaraDraper-NOAA think that cleans this one up, @YoulongXia-NOAA let me know if you or Clara see anything,,, again think we can finally close this one out

YoulongXia-NOAA commented 4 months ago

@BenjaminRuston, I look at your changes. You seem to use two ctests - imsfv3_scf and imsfv3grid_scf to identify ascii_input and nc_input. The input has a clear indication for ascii and nc but output is no identifier. This fine but the README.md needs to be updated to explain what they mean. Another option is if an identifier is used and let the users to understand what you did here, e.g., asciiInpu, ncInput to ctest names and output. You can consider which option is better.

BenjaminRuston commented 4 months ago

I'm not sure of the ask here @YoulongXia-NOAA can wait until we can discuss

but these two input files do produce the same output actually, and not sure why that's wrong or a problem?

a nccmp of the testoutput files says they are the same

nccmp imsfv3_scf.nc imsfv3grid_scf.nc  -d -m -g -f -S -T 5.0e-4
YoulongXia-NOAA commented 4 months ago

@BenjaminRuston, in this way, you can use one name for testout? I will wait for ....

BenjaminRuston commented 4 months ago

@BenjaminRuston, in this way, you can use one name for testout? I will wait for ....

you could, but would not advise using the same name for the files from the two different inputs. a reason why it's important to keep these minimally sized

PatNichols commented 4 months ago

I'm not sure of the ask here @YoulongXia-NOAA can wait until we can discuss

but these two input files do produce the same output actually, and not sure why that's wrong or a problem?

a nccmp of the testoutput files says they are the same

nccmp imsfv3_scf.nc imsfv3grid_scf.nc  -d -m -g -f -S -T 5.0e-4

@BenjaminRuston Even if they are identical, if the ctest are run in parallel they might be writing to the same file at the same time. If this is the case, then you would get trash in the output file.

BenjaminRuston commented 4 months ago

@gthompsnJCSDA and @PatNichols this would be creating testRun output files, but would both be comparing these to the same reference file in testoutput

BenjaminRuston commented 4 months ago

first looking into the two ctests (that put output into testrun) being compared against the same testoutput file.

This would require a change to the iodaconv_comp.sh and call to nccmp think this is out of scope and prefer to leave this distinct for now

https://github.com/JCSDA-internal/ioda-converters/blob/3c1ab0aa2bb2393608666a906bdfe509f1602c83/tools/iodaconv_comp.sh#L25

BenjaminRuston commented 4 months ago

@YoulongXia-NOAA the other item was a potential change to the README.md : https://github.com/JCSDA-internal/ioda-converters/blob/develop/src/land/README.md

I'm not sure of the change you wanted to make there. @ClaraDraper-NOAA do you have further instruction. This is very close if not ready just want to finish this up please pass along what you think may still be needed.

ClaraDraper-NOAA commented 4 months ago

Yes, that README does need to be updated. It covers only the old SNODAS-related converters. If you want to take a first go at it @BenjaminRuston I can fill in any missing details about the usage/relevance of the specific converters.

personally think it would be better to put the README changes in a separate issue.

the work is done for this particular issue ... and we should open a new issue to update the README from specific to SNODAS converters to what is being used?

BenjaminRuston commented 4 months ago

whoops,, I didn't realize it let's me edit your comments @ClaraDraper-NOAA !!!!

anyhoo, please see above should we close this and open a new issue to update documentation and the README?

ClaraDraper-NOAA commented 4 months ago

whoops,, I didn't realize it let's me edit your comments @ClaraDraper-NOAA !!!!

anyhoo, please see above should we close this and open a new issue to update documentation and the README?

I like the idea of closing this, then updating the README in a separate issue.

BenjaminRuston commented 4 months ago

great,, I'm approving @ClaraDraper-NOAA we need one more approval and can move this forward