Closed YoulongXia-NOAA closed 4 months ago
I created a feature branch, on Hera, I tested the two converters and they passed. When website checks is done, please take a look. Thank you.
Okay, website check has passed. Please let me know your comments when your time is available. Thank you.
@YoulongXia-NOAA great, I've wanted to get back to this. And we should finish updating the units in the other converters as well.
We need to merge these two converters imsfv3_scf2ioda.py
and imsfv3grid_scf2ioda.py
I do not see anything that would prevent this and adding nearly duplicate routines makes for difficult maintenance
(venv) JCSDA-L-17733: /land >diff -dwib imsfv3_scf2ioda.py imsfv3grid_scf2ioda.py
18a19
> from pyiodaconv.def_jedi_utils import iso8601_string
83a85
> time = ncd.variables['time'][:].ravel()
89a92
> time = time.astype('int64')
98,104c101
< times = np.empty_like(sncv, dtype=object)
<
< # get datetime from filename
< str_date = re.search(r'\d{8}', self.filename).group()
< my_date = datetime.strptime(str_date, "%Y%m%d")
< start_datetime = my_date.strftime('%Y-%m-%d')
< base_datetime = start_datetime + 'T00:00:00Z'
---
> times = np.empty_like(sncv, dtype='int64')
107c104
< times[i] = base_datetime
---
> times[i] = time
110a108
> self.varAttrs[('dateTime', 'MetaData')]['units'] = iso8601_string
@BenjaminRuston, originally I want to keep the last one. However, this will need to wait for land-ims-proc code PR (https://github.com/NOAA-PSL/land-IMS_proc/pull/5) is merged. Otherwise, the second converter will not work as no time is added into NetCDF file. Therefore, during transition period, I suggested to keep the two converters to avoid some broken from applications. When land-ims-proc is updated and merged into land DA workflow, we can easily remove the first one and the corresponding tests and just keep one. This the reason why I want to keep two in this time.
@CoryMartin-NOAA and @YoulongXia-NOAA OK,, think I'm finally finally done, did a bit more cleanup
please test imsfv3_scf2ioda.py
and we will move to remove the new file imsfv3grid_scf2ioda.py
that was largely a duplicate
@YoulongXia-NOAA also notice this is using height
and not stationElevation
should we make this change now?
and looks like I'll need to update this file in testoutput/imsfv3_scf.nc
and we'll want to confirm it is how we want it to look regarding the values themselves... we want to leave the missing values as -999
:
156: DIFFER : VARIABLE : totalSnowDepth : POSITION : [2084] : VALUES : -0.999 <> -999 : PERCENT : 99.9
156: DIFFER : VARIABLE : totalSnowDepth : POSITION : [2085] : VALUES : -0.999 <> -999 : PERCENT : 99.9
and make sure the totalSnowDepth/ObsError
is what we want as well
totalSnowDepth/ObsError =40.0 mm now @ClaraDraper-NOAA used for IMS snow depth. See if se has new input.
looks like the testoutput
was updated for the old converter as well the only difference I get now is due to my initialization:
DIFFER : VARIABLE : dateTime : ATTRIBUTE : _FillValue : VALUES : 7258118400 <> -9223372036854775806
ok,, I've got the CI testing reproducing what I"m seeing locally. I will push up the updated file that now sets the dateTime fillValue attribute to the value that is being used
@BenjaminRuston, please wait for a bit time for my land-ims-proc PR review to avoid some back-forth works. Based the reviewer's comments, using a time calculation subroutine from ufs-land-driver may not be optimal as that code is not from the public time library but written by ufs-land-driver developer. He suggested me to output date into netcdf file, and then let python to convert dateTime into seconds. I modified land-ims-proc code again to add a global attribute as valia_date (removed all previous modifications), which is a minor modification for the code. It works and I am waiting for the reviewers' approval. When that PR was approved, I will put the input example date into Orion and then you can test this new data. Ioda converter also needs minor modification as
str_date =ncd.getncattr('valid_date')
I tested offline converter to produce the output data, and use nccmp to compare with the output with ioda output using previously modified ioda converter and input data produced from land-ims-proc by adding time, they has no difference. Therefore, your output data is OKAY. Only needs to slightly modify converter and use new input data.
The input data format is the same as old input data but only add:
// global attributes: :valid_date = "20191201" ;
Nevertheless, I copy my new input data and tested ioda converter to oirion: in /work/noaa/da/youlong/Hera_dataFile. They are imsfv3_scf2ioda_globalAttribute.py and IMSscf.20191201.C96.nc. Thank you.
last thing I'd like to address here is this is a lot of file for a ctest... total is close to 1Mb think will see if can sample it down by a factor of 10 or so, if it's simple will do this and likely rename files to reflect they are additional sample.
note that I will try to execute via ncks
so the reduction will be reproducible, and documentable ;0)
@YoulongXia-NOAA we can move foward with this as the get_observation_time
is easily changed later. I have no idea why the ufs-land-driver
reference has any relevance. If there is a date in this attribute:
str_date =ncd.getncattr('valid_date')
we can add it as an option in the get_observation_time
easily. Will be good to discuss with Clara
@YoulongXia-NOAA we can move foward with this as the
get_observation_time
is easily changed later. I have no idea why theufs-land-driver
reference has any relevance. If there is a date in this attribute:str_date =ncd.getncattr('valid_date')
we can add it as an option in the
get_observation_time
easily. Will be good to discuss with Clara
Yes, go ahead. Thank you.
here we are waiting for new input files that have valid_date
as an attribute as well as a reduction to c48
will adapt decoder to use the attribute for date rather than a variable, update the test reference and should be done
@ClaraDraper-NOAA any chance to get to creation of new test file input
@BenjaminRuston - I just tried to create this, but realized that we don't quite have the right time variable in the file. @YoulongXia-NOAA is going to update the processing routine to fix this in the land-IMSproc repo. The new time variable will be in seconds since a reference date (1 Jan 1970), so we'll also need to change this converter to handle the different format.
@BenjaminRuston, @ClaraDraper-NOAA is asking me to put time when reading from nc file and keep the current format when reading from ascii file. This may has different outputs when reading nc file (add time dimension) and ascii file (keep current 1D only locations) in land-ims-proc. Your ioda converter may need to read time first, and global attribute second, and then cute from filename.
@YoulongXia-NOAA It hadn't occurred to me that this would create different time formats for the output from land-IMSproc - good catch. @BenjaminRuston - let's talk about this quickly when we meet on Wednesday.
@YoulongXia-NOAA @BenjaminRuston
On second thought: I just looked at Ben's routine to extract the time from the input file. It assumes either:
i) time is a variable (not an attribute) that is an integer in seconds since 1 Jan 1970, or ii) there is no time variable in the file, in which case the time is taken from the filename.
For netcdf IMS input: Currently land-IMSproc, which outputs the file read in by this converter, writes out time as an attribute and as a YYYYMMDDHH string. However, I've already asked Youlong to change this to an integer (seconds since 1 Jan 1970) to match the input netcdf IMS files. In addition to changing the format, we then also need to decide whether to make time an attribute or a variable, and then change either this code, or land-IMSproc, depending on which we chose. Ben - let's discuss quickly on Wednesday.
For the ascii input files, Youlong - can you please change land-IMSproc to not add the time variable/attribute to the output file. Ben's code will then simply pull the time from the filename, and we can avoid the issue of having time variables in two different formats.
@YoulongXia-NOAA @BenjaminRuston
On second thought: I just looked at Ben's routine to extract the time from the input file. It assumes either:
i) time is a variable (not an attribute) that is an integer in seconds since 1 Jan 1970, or ii) there is no time variable in the file, in which case the time is taken from the filename.
For netcdf IMS input: Currently land-IMSproc, which outputs the file read in by this converter, writes out time as an attribute and as a YYYYMMDDHH string. However, I've already asked Youlong to change this to an integer (seconds since 1 Jan 1970) to match the input netcdf IMS files. In addition to changing the format, we then also need to decide whether to make time an attribute or a variable, and then change either this code, or land-IMSproc, depending on which we chose. Ben - let's discuss quickly on Wednesday.
For the ascii input files, Youlong - can you please change land-IMSproc to not add the time variable/attribute to the output file. Ben's code will then simply pull the time from the filename, and we can avoid the issue of having time variables in two different formats.
@ClaraDraper-NOAA, I have to put time global attribute in nc file to solve the issue @CoryMartin-NOAA raised (see https://github.com/JCSDA-internal/ioda-converters/issues/1497). This is a reason why I worked land-ims-proc code to put time into netcdf file.
@BenjaminRuston Please see above. It looks like your solution of looking in the filename if we don't have a time variable present doesn't really solve the original issue that Cory raised (which was a desire not to use the filename). It's a bit clunky, but my suggestion is that we have 2 possible time variables/attributes in the output of the land-IMSproc code: 1) time as an integer, if the original input file was a netcdf. This can be called "time" 2) time as a string, if the original input file was an ascii. This can be called "time_str"
Then this code looks for whichever is present. @CoryMartin-NOAA Will we be assimilating ascii operationally? If so 2) should be the default that the converter looks for first.
My understanding is the file that is already available in our obs processing system on WCOSS is the ASCII IMS file, but it's possible we could switch to the netCDF version before implementation if necessary.
@ClaraDraper-NOAA, @BenjaminRuston, @CoryMartin-NOAA, I suggested to output a global attribute and a time/time_str variable into netcdf file. The ioda-converter can read time_str for reading ascii file and time for reading nc file. The global attribute is used as a backup. The ioda-conveerter can easily use to get time or time_str to handle the file. If we output two time variables in netcdf file, for reading nc file case, if use time_str as a default, time will not be used.
@BenjaminRuston Please see above. It looks like your solution of looking in the filename if we don't have a time variable present doesn't really solve the original issue that Cory raised (which was a desire not to use the filename). It's a bit clunky, but my suggestion is that we have 2 possible time variables/attributes in the output of the land-IMSproc code:
- time as an integer, if the original input file was a netcdf. This can be called "time"
- time as a string, if the original input file was an ascii. This can be called "time_str"
Then this code looks for whichever is present. @CoryMartin-NOAA Will we be assimilating ascii operationally? If so 2) should be the default that the converter looks for first.
For time_str, you suggested to put YYYYMMMDDHH when ascii file is read. Do you have any suggestion for time_str attributes such as units, comment or something else?
@BenjaminRuston Are you OK with Youlong's suggestion?
I've just learned that if you click on a function in github, it shows you the original version in the PR, not the updated version (grrr...) - so I see now that you are already reading time from an attribute, not a variable.
Following Youlong's suggestion, we'll need to update this converter to: 1) Try to read in a string attribute with the date-time. 2) If that fails, try to read in an integer attribute (the option you have now).
@BenjaminRuston Please see above. It looks like your solution of looking in the filename if we don't have a time variable present doesn't really solve the original issue that Cory raised (which was a desire not to use the filename). It's a bit clunky, but my suggestion is that we have 2 possible time variables/attributes in the output of the land-IMSproc code:
- time as an integer, if the original input file was a netcdf. This can be called "time"
- time as a string, if the original input file was an ascii. This can be called "time_str"
Then this code looks for whichever is present. @CoryMartin-NOAA Will we be assimilating ascii operationally? If so 2) should be the default that the converter looks for first.
For time_str, you suggested to put YYYYMMMDDHH when ascii file is read. Do you have any suggestion for time_str attributes such as units, comment or something else?
@YoulongXia-NOAA Please just make sure that there is enough information in the long name, etc so that people can interpret the date. Y
@ClaraDraper-NOAA and @YoulongXia-NOAA sure we can add, but I'm having trouble following on github so glad we are meeting.
we want to read an attribute .. if integer genrate time, if string generate time no need to variable that holds time (that is awkward, but harmless to leave)
and you want to remove the fall fall back to use anything from the filename, realize this is not optimal and can remove easily but believe we have a file in a ctest that would need to be replaced
I am still struggling land-ims-proc code to output NCWCP file. I am totally lost and do not know how to move forward.
On Tue, May 21, 2024 at 9:07 PM Benjamin Ruston @.***> wrote:
@ClaraDraper-NOAA https://github.com/ClaraDraper-NOAA and @YoulongXia-NOAA https://github.com/YoulongXia-NOAA sure we can add, but I'm having trouble following on github so glad we are meeting.
we want to read an attribute .. if integer genrate time, if string generate time no need to variable that holds time (that is awkward, but harmless to leave)
and you want to remove the fall fall back to use anything from the filename, realize this is not optimal and can remove easily but believe we have a file in a ctest that would need to be replaced
— Reply to this email directly, view it on GitHub https://github.com/JCSDA-internal/ioda-converters/pull/1510#issuecomment-2123691071, or unsubscribe https://github.com/notifications/unsubscribe-auth/AMKGNUFDRT64LYZFACCWV6DZDPVUFAVCNFSM6AAAAABHCGXJ6WVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMRTGY4TCMBXGE . You are receiving this because you were mentioned.Message ID: @.***>
@BenjaminRuston, after I got more explanations for land-ims-proc, I move that PR forward with using global attributes as I worked that code in the first time. But reviewers suggested to use valid_time_str = "2019120100" for ascii file input and valid_epoch_time = 1575244800 for netcdf file input. The reviewers are looking at this change and see if they have further comments. Thank you for your patience.
excellent got the files @ClaraDraper-NOAA
will add these to the ctests and have capability to do time from string attribute:
// global attributes:
:valid_time_str = "2023050112" ;
or integer epoch attribute:
// global attributes:
:valid_epoch_time = 1682899200 ;
this is straightforward, and will remove anything from filename as if it's netCDF expect to have this attribute, if not present will through an error (exception) and exit
@ClaraDraper-NOAA I've finished adding the new files, modifying the converter and changing the CMakeLists.txt to point to the new files.
Note these two input files have different valid_times
.
The testinput/ascii_input_IMSscf.20230501.oro_C48.nc
file that uses the valid_time_str
is 12Z: ncd.valid_time_str="2023050112"
file: imsfv3_scf.nc
range(/MetaData/dateTime): [2023-05-01 12:00:00, 2023-05-01 12:00:00]
The testinput/nc_input_IMSscf.20230501.oro_C48.nc
has a ncd.valid_epoch_time=1682899200
the corresponds to 00Z:
file: imsfv3grid_scf.nc
range(/MetaData/dateTime): [2023-05-01 00:00:00, 2023-05-01 00:00:00]
@BenjaminRuston They should both be at 00. Heading out the door now, but I'll sort it out first thing tomorrow.
thanks @ClaraDraper-NOAA think that cleans this one up, @YoulongXia-NOAA let me know if you or Clara see anything,,, again think we can finally close this one out
@BenjaminRuston, I look at your changes. You seem to use two ctests - imsfv3_scf and imsfv3grid_scf to identify ascii_input and nc_input. The input has a clear indication for ascii and nc but output is no identifier. This fine but the README.md needs to be updated to explain what they mean. Another option is if an identifier is used and let the users to understand what you did here, e.g., asciiInpu, ncInput to ctest names and output. You can consider which option is better.
I'm not sure of the ask here @YoulongXia-NOAA can wait until we can discuss
but these two input files do produce the same output actually, and not sure why that's wrong or a problem?
a nccmp
of the testoutput files says they are the same
nccmp imsfv3_scf.nc imsfv3grid_scf.nc -d -m -g -f -S -T 5.0e-4
@BenjaminRuston, in this way, you can use one name for testout? I will wait for ....
@BenjaminRuston, in this way, you can use one name for testout? I will wait for ....
you could, but would not advise using the same name for the files from the two different inputs. a reason why it's important to keep these minimally sized
I'm not sure of the ask here @YoulongXia-NOAA can wait until we can discuss
but these two input files do produce the same output actually, and not sure why that's wrong or a problem?
a
nccmp
of the testoutput files says they are the samenccmp imsfv3_scf.nc imsfv3grid_scf.nc -d -m -g -f -S -T 5.0e-4
@BenjaminRuston Even if they are identical, if the ctest are run in parallel they might be writing to the same file at the same time. If this is the case, then you would get trash in the output file.
@gthompsnJCSDA and @PatNichols this would be creating testRun
output files, but would both be comparing these to the same reference file in testoutput
first looking into the two ctests (that put output into testrun
) being compared against the same testoutput
file.
This would require a change to the iodaconv_comp.sh
and call to nccmp
think this is out of scope and prefer to leave this distinct for now
@YoulongXia-NOAA the other item was a potential change to the README.md
:
https://github.com/JCSDA-internal/ioda-converters/blob/develop/src/land/README.md
I'm not sure of the change you wanted to make there. @ClaraDraper-NOAA do you have further instruction. This is very close if not ready just want to finish this up please pass along what you think may still be needed.
Yes, that README does need to be updated. It covers only the old SNODAS-related converters. If you want to take a first go at it @BenjaminRuston I can fill in any missing details about the usage/relevance of the specific converters.
personally think it would be better to put the README changes in a separate issue.
the work is done for this particular issue ... and we should open a new issue to update the README from specific to SNODAS converters to what is being used?
whoops,, I didn't realize it let's me edit your comments @ClaraDraper-NOAA !!!!
anyhoo, please see above should we close this and open a new issue to update documentation and the README?
whoops,, I didn't realize it let's me edit your comments @ClaraDraper-NOAA !!!!
anyhoo, please see above should we close this and open a new issue to update documentation and the README?
I like the idea of closing this, then updating the README in a separate issue.
great,, I'm approving @ClaraDraper-NOAA we need one more approval and can move this forward
Description
This PR mainly to solve two problems: (1) update current imsfv3 converter snow depth units from m into mm, utc time from 18Z into 00z and error is set as 40. mm and (2) add imsfv3grid converter to fix the issue (https://github.com/JCSDA-internal/ioda-converters/issues/1497) to read time directly from NetCDF file.
Issue(s) addressed
Resolves #1497
Dependencies
List the other PRs that this PR is dependent on: NO
Impact
Expected impact on downstream repositories: NO
Checklist