NCAR / DART

Data Assimilation Research Testbed
https://dart.ucar.edu/
Apache License 2.0
187 stars 140 forks source link

Develop Aether lon-lat interface #559

Closed kdraeder closed 5 months ago

kdraeder commented 10 months ago

Use case

Collaboration with Aaron Ridley (U Mich) requires development of an interface to his newly developed Aether upper atmosphere model. The first interface will be to the logically rectangular lon-lat grid version. Later we will develop an interface to the cubed-sphere version.

Describe your preferred solution

We plan to use the tiegcm model_mod as a template representing DART's best practices. We'll import pieces from gitm, which very closely match the Aether lon-lat restart file structure. Checkout the "aether" branch to contribute work or see its status. There are also rst files describing parts of the model and interface, written by @johnsonbk, which have not been committed yet.

Describe any alternatives you have considered

Developing a model_mod, etc., from scratch. That seems inefficient, given the existence of a gitm interface, which is closely related to Aether, and the tiegcm interface, which is a recently updated interface to another upper atmosphere model.

kdraeder commented 10 months ago

@kdraeder is merging the code which reads and writes restart files from gitm into aether. These can be included and tested using aether_to_dart.f90 and dart_to_aether.f90 without most of the rest of the model_mod being adapted to aether. (I renamed those programs to use 'dart' instead of gitm's 'netcdf', to be more consistent with other models.)

I believe @johnsonbk will work on translating the interpolation code from the existing Aether C++ and merging it into the model_mod.

kdraeder commented 10 months ago

I've merged GITM routines into the aether_lon-lat model_mod, but need to update the model_mod header to accomodate them. Some of it is grid information. While deciding what the names should be, I found that "restart" and "output files have different names for dimensions and variables, (and different variables in some cases).

Contents
  outputs:
               Bulk Ion Velocity ({Zonal,Meridional,Vertical})
  restarts have them for each species
    {Parallel,Perp} Ion Velocity ({Zonal,Meridional,Vertical}) ({O+,...})(x, y, z) ;
Names
-  Don't comply with the CF Metadata Conventions: 
          float Parallel\ Ion\ Velocity\ \(Zonal\)\ \(O+\)
           ->  "Parallel Ion Velocity (Zonal) (O+)"
-  Are different in "restart" and "output" files  (except for "z", I don't know why it's not "alt" in output)
     restart:  float Temperature\ \(O+\)(              x, y, z) ;
     output:  float O+\ Temperature     (block, lon, lat, z) ;

My leaning is to implement the restart dimension names in the aether_to_dart and dart_to_aether code trees and the output names in the rest of the model_mod (interpolation, get_close, ...). But I see arguments for adopting just one set of names. Thoughts?

kdraeder commented 10 months ago

@johnsonbk @hkershaw-brown The Aether restart fields are divided among 2 types of files: neutral and ions. They are further divided among the "blocks", which are subdomains of the globe. All of these need to be combined to make a single state vector, and there's a unique set of these files for each member. Program aether_to_dart will read all the restart files and repackage them into an ensemble state vector. Is this a situation that's well-suited for defining 2 domains; one for neutrals and one for ions? The template model_mod we're using already has multiple domains defined, so it seems that it would be straightforward to adapt them.

nancycollins commented 10 months ago

the aether_to_dart program is going to create N netcdf files (one per ensemble member) that filter will read. are there any fields that have the same name in the neutral and ion files? do they cover different parts of space? if not, it seems like it would be better to read in both the neutral and ion data blocks and create a single netcdf output file per ensemble member and have only a single domain. then filter only has to read one file per member instead of 2, the namelist input for reading fields can be simpler, and the interpolate code can be streamlined.

kdraeder commented 10 months ago

Thanks for the quick response! Clearly I was working too late; part of my brain knew that there will be an ensemble of filter input files.

There's no overlap of field names in the 2 files, so they can be combined in aether_to_dart. But then there needs to be a mechanism in dart_to_aether for reading the filter output (analysis) files and separating the fields into the neutrals and ions files. Domains look like a good candidate, but I'll go ahead with the combined strategy.

nancycollins commented 10 months ago

If the model itself used multiple netcdf files as input, then domains would help since each domain specification includes the input filename and filter would read them directly. But since these netcdf files are transient, you can structure them any way you want. Simpler seems better. You're right - the dart_to_aether program will need to know which netcdf variables go into which files, but that seems not too bad. You have to create the netcdf variable names in aether_to_dart since I'm assuming the block files don't have any metadata, so both parts of the packing and unpacking are under your control.

kdraeder commented 10 months ago

The model does expect multiple restart files: {ions,neutrals}x #blocks x #members.

The block files have units metadata, but no field names.

kdraeder commented 10 months ago

@johnsonbk @nancycollins Aether_to_dart needs the time of the restart files. The time in the sample data files is 0, but there's a time.json file with time 1458347400.0 That's apparently in seconds from some calendar start time. The sample output time.json has the same time, and a date in the filename of 20110320_000000. So time = 0 seems to be in Sep. 1964. Does anyone recognize that date? The closest thing I can think of is the start of the UNIX clock, but that's Jan 1 1970.

It looks like aether will need a custom read_model_time to get the time. Is there a "best" way to do that using fortran, other than reading it as a text file?

Does DART have code to translate a time from a system with an arbitrary "time=0" to the gregorian calendar? I don't see that capability in advance_time. The UNIX date command may be useful.

jlaucar commented 10 months ago

The time_manager has an array of operators for times and the ability to do different calendars. I think that you would be able to use time interval subtraction to do what you need once you know the base dates. That question should be addressed to Aaron if Ben doesn't know. Jeff

On Fri, Oct 20, 2023 at 8:38 AM kdraeder @.***> wrote:

@johnsonbk https://github.com/johnsonbk @nancycollins https://github.com/nancycollins Aether_to_dart needs the time of the restart files. The time in the sample data files is 0, but there's a time.json file with time 1458347400.0 That's apparently in seconds from some calendar start time. The sample output time.json has the same time, and a date in the filename of 20110320_000000. So time = 0 seems to be in Sep. 1964. Does anyone recognize that date? The closest thing I can think of is the start of the UNIX clock, but that's Jan 1 1970.

It looks like aether will need a custom read_model_time to get the time. Is there a "best" way to do that using fortran, other than reading it as a text file?

Does DART have code to translate a time from a system with an arbitrary "time=0" to the gregorian calendar? I don't see that capability in advance_time. The UNIX date command may be useful.

— Reply to this email directly, view it on GitHub https://github.com/NCAR/DART/issues/559#issuecomment-1772863027, or unsubscribe https://github.com/notifications/unsubscribe-auth/ANDHUIVRA2W3N662A3GGO63YAKEERAVCNFSM6AAAAAA56IF2V6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTONZSHA3DGMBSG4 . You are receiving this because you are subscribed to this thread.Message ID: @.***>

nancycollins commented 10 months ago

Re: multiple files

The model does expect multiple restart files: {ions,neutrals}x #blocks x #members.

Sure, that makes sense. But if the model directly read and wrote multiple netcdf format files, then domains are a way filter can read data from more than one file to build the state. Since you have to make a translator program to stitch the blocks together, it isn't necessary to use multiple domains.

You are going to read in the data once, and write it once. For domains, you would look at interpolate and get close and see if there is any advantage to having multiple domains. If not and if you have multiple domains, each call needs to look up which domain a variable is in before it can operate, and all calls need to pass around a domain number besides a variable name/number/offset. That argues against using them without a good cause.

Re: time If the file is text, i'd think just calling file_open() (a DART utility routine) and reading it as text and then using the time manager operators in the read_time routine would be good. I don't think you need to use any command line programs for it. Inside the DART time routines there are calls to set a time using Y/M/D/H/m/S for the base time, and then add and subtract with time structures to get the intervals.

kdraeder commented 10 months ago

The base date is the hangup. I've found some probably helpful code in Aether's include/constants.h const int cREFYEAR = 1965; const double cREFJULIAN = 2438762.0; const double cJULIAN2000 = 2451545.0; If the units are days, then the time=0 is ~6700 years ago. There's no REF for gregorian.

And in planets.cpp: nEarthDaysSince2000 = (time.get_julian_day() - cJULIAN2000) which implies that "julian_day" is not the day of the year, but is the number of days since the start of the calendar. Elsewhere in the code it refers to "true" julian day as the number of days since the start of the calendar, and plain julian day as the day of the year.

The difference between the 2 julian reference dates is 35.0 years, so cREFJULIAN represents cREFYEAR.

If assume a few things, I can line up this calendar with the gregorian, but it's probably better to ask Aaron. @johnsonbk can you do that soon, or should I do it?

kdraeder commented 10 months ago

In the short term we only need to know the meaning of the times found in the time.json files (and maybe in the output files. The restart files have time=0). Using the number of seconds in the time.json file and the date stamp on an output file yields $ echo 2011032000 -1458347400s | ./advance_time 196412312330 This is nominally the start of 1965, but maybe has some kind of time step adjustment to make Aether happy. I've defined an aether_ref_date with this value in the model_mod, so times read from time.json can be translated into gregorian (days,seconds).

kdraeder commented 10 months ago

@hkershaw-brown @nancycollins I'm at the stage where I need to decide how aether_to_dart will handle the ensemble. I don't know all of the possibilities, so please point me in the right direction. It seems that all members should be done in parallel, since they are completely independent. Is there a template for a {model}_to_dart that's an MPI program? Or a template script that runs multiple {model}_to_darts in parallel? GITM doesn't seem to do either of those, even though it has the same subdomains.

Then it seems that each member should be able to read all its restart files in parallel on different processors (hopefully on the same node) and write to its section of the filter_input.nc file.

hkershaw-brown commented 10 months ago

@hkershaw-brown @nancycollins I'm at the stage where I need to decide how aether_to_dart will handle the ensemble. I don't know all of the possibilities, so please point me in the right direction. It seems that all members should be done in parallel, since they are completely independent. Is there a template for a {model}_to_dart that's an MPI program? Or a template script that runs multiple {model}_to_darts in parallel? GITM doesn't seem to do either of those, even though it has the same subdomains.

Do the aether files have multiple ensemble members in them? If not, aether_to_dart does not need to know about the ensemble. read in aether files for a single ensemble member X, write out filter_inputX.nc

Do you have this already? Is so run ens_size of this program to go parallel. Or just do it in loop for now.

Then it seems that each member should be able to read all its restart files in parallel on different processors (hopefully on the same node) and write to its section of the filter_input.nc file.

Not quite sure what you are getting at here, but filter_nml 'filter_input{output}_list.txt' is a list of filter_input{output}X.nc (X = 1, ens_size). filter takes care of reading and writing these.

kdraeder commented 10 months ago

There are separate restarts for each member, so running ens_size of aether_to_dart seems to be the best. I seem to remember that it's challenging to put each of those on a separate node.

Yes, filter handles the ensemble of filter_inputX files just fine. I'm still at the stage of creating those files. Aether has 2 restart files (ions and neutrals) for each subdomain of the globe. So there are 2 x #subdomains files to be read for each member's filter_input. #subdomains ~ O(100). There will be lots of processors available, so I'm hoping that all those files can be read in parallel. It seems best if they can all be on the same node to minimize communication. And having each member on a different node would speed things up.

hkershaw-brown commented 10 months ago

I'd recommend writing the serial version first.

nancycollins commented 10 months ago

i think at this point i'd get the converter programs running and worry about optimizing i/o at a later time once you can run them and measure the performance. if it's a problem, things can be done with scripting changes once you have the running programs. it seems like getting the blocks into a netcdf file and getting variables from the netcdf file back into blocks is a good first goal.

nancycollins commented 10 months ago

I'd recommend writing the serial version first.

i don't think we've got any MPI-enabled converters in the system. you write a serial one, you run it serially N times to test and time it. if it takes a significant fraction of the run time, you change the batch scripting to run N copies of it, one for each ensemble member, in parallel. cheap 80-way parallelism without having to write an MPI program! :)

kdraeder commented 10 months ago

Perfect! I didn't want to head off in the wrong direction.

kdraeder commented 10 months ago

@nancycollins @hkershaw-brown I'm trying to read a float field from a netcdf file using nc_get_variable. The module procedure that is being chosen is nc_get_double_3d. Then the call fails: get values for Longitude: NetCDF: Start+count exceeds dimension bound.

Why is the double version chosen? Is there a way to specify the nc_get_real_3d?

If you think that the error message is not related to the procedure choice, I'll send the call and the argument definitions that I've tried.

Thanks.

hkershaw-brown commented 10 months ago

temp is r8 that is why nc_get_double_3d is being called https://github.com/NCAR/DART/blob/19a5b0a17b2645da794faf4b09d11790276c1731/models/aether_lon-lat/model_mod.f90#L687

https://github.com/NCAR/DART/blob/19a5b0a17b2645da794faf4b09d11790276c1731/models/aether_lon-lat/model_mod.f90#L727-L728

The Start+count error means this size of what you are reading does not match the variable you've given the read call

kdraeder commented 10 months ago

Thanks for digging into it! My fortran is rather rusty.

I changed temp to r4 and it still failed, so I explicitly set the index ranges, starts, etc.:

allocate(temp(1-nGhost:nxPerBlock+nGhost, &
              1-nGhost:nyPerBlock+nGhost, &
              1:nzPerBlock))

starts(1) = 1-nGhost
starts(2) = 1-nGhost
starts(3) = 1
ends(1)   = nxPerBlock+nGhost
ends(2)   = nyPerBlock+nGhost
ends(3)   = nzPerBlock
xcount = nxPerBlock + 2*nGhost
ycount = nyPerBlock + 2*nGhost
zcount = nzPerBlock
    call nc_get_variable(ncid, 'Longitude', &
         temp(starts(1):ends(1),starts(2):ends(2),starts(3):ends(3)), &
         routine, &
         nc_start=(/starts(1),starts(2),starts(3)/), &
         nc_count=(/xcount,ycount,zcount/))

but it's still failing in the same way.

kdraeder commented 10 months ago

I think that I found the problem. The Longitude array has 44 altitudes, but I dimensioned temp to be 40, the number the model uses with no vertical subdivisions, which is used elsewhere. So don't spend any more time on this bit.

kdraeder commented 10 months ago

I was wrong. There's something more subtle going wrong (or so obvious I can't see it). I've simplified the situation to the following. Here's the dump from the file I'm reading

dimensions:
        x = 22 ;
        y = 22 ;
        z = 44 ;
variables:
   float Longitude(x, y, z) ;

Here are the relevant lines from the reading routine.

real(r4) :: simple (22,22,44)
ncid = open_block_file(dirname, 'grid', -1, nb-1, 'read', filename)
call nc_get_variable(ncid, 'Longitude', simple, routine)

open_block_file seems to work: open_block_file Opening file /Users/raeder/DAI/Manhattan/models/aether_lon-lat/testdata1/restartOut.Sphere.1member/grid_g0000.nc for read But the nc_get doesn't:

ERROR FROM:
  source : netcdf_utilities_mod.f90
  routine: nc_get_real_3d
  message: get values for Longitude: NetCDF: Start+count exceeds dimension bound
  message: ... get_grid_from_blocks
nancycollins commented 10 months ago

if reversing the order of the dimensions works, then i only know this from previous horrible experiences with it.

kdraeder commented 10 months ago

@nancycollins @mgharamti Yes, the order of the dimensions was the problem. And ncdump -h -f [f,c] $file.nc lists the same order of dimensions for f and c. Thanks for the tip!

kdraeder commented 10 months ago

Using the number of seconds in the time.json file and the date stamp on an output file yields $ echo 2011032000 -1458347400s | ./advance_time 196412312330 This is nominally the start of 1965, but maybe has some kind of time step adjustment to make Aether happy.

On second thought, it's more likely that the date stamp in the file name is off by 30 minutes. Something like that happens in CESM history files which are named according to the time of the first data (or something else that gives them a misleading name). Aaron says that the reference time is intended to be 1965-01-01 0 UTC, so I'll change to that and the model may need to make a correction to the file names.

kdraeder commented 10 months ago

@nancycollins @mgharamti Yes, the order of the dimensions was the problem. And ncdump -h -f [f,c] $file.nc lists the same order of dimensions for f and c. Thanks for the tip!

Since I'm reading in the data in (z,y,x) order, would it break any rules or efficiencies to write it out that way to filter_input.nc (with the proper dimensioning)? That would avoid having to do the transpose to (x,y,z) (and back again, in dart_to_aether).

kdraeder commented 10 months ago

@nancycollins confirmed that I should not transpose the data; just read the block NetCDF file with its C dimension order and write the data to filter_input.nc in the same order.

We also found that ncdump on Macs is broken; -f f and -f c show the same dimension order. -F is unrelated to fortran and C. Variables listed by ncdump as (x,y,z) must be read into a fortran array dimensioned (z,y,x).

kdraeder commented 7 months ago

@hkershaw-brown @nancycollins

I'm confused by the selection of QTYs that can be associated with the Aether variables. I made some choices in the early rush to get something working, but now I'd like to figure out if they were good choices. I don't even know how to decide whether it matters (much).

My first problem is interpretting what the available QTY's represent. I haven't found a key to decipher the parts of, e.g. QTY_DENSITY_ION_O2DP . O2 could mean 'oxygen molecule', D could mean an extra or missing electron from the D orbital, P could be similar, or mean 'positive'. Or O could mean 'oxygen atom' with 2D or 2DP meaning something. Comments in space_quantities_mod.f90 or in a docs.dart page would be helpful.

The Aether variable 'velocity_parallel_up\ (O+)' could potentially have these existing QTYs associated with it: QTY_VELOCITY_W QTY_VELOCITY_W_ION QTY_VERTICAL_VELOCITY (There might be more; these were harvested from the list that came out of preprocessor for this case) or maybe it should have a new QTY like the existing QTY_VELOCITY_VERTICAL_O2: QTY_VELOCITY_PARALLEL_VERTICAL_OP This last seems safest, since each ion has its own 2 vertical velocities. But I don't know how they'll be used, so maybe a simple, generic QTY for all the ions is fine.

@johnsonbk Maybe the underlying question is "what obs should we prepare for people to use?" If there are obs of individual ion species, then we need different QTY(s) for each. If not, is it OK to use the same QTY for multiple ion species? Will the forward operator use contributions from all of those species?

I've attached the file issues_QTY.txt which gives more details. I'll remove it from models/aether_lon-lat.

kdraeder commented 6 months ago

The existing aether_to_dart derives the names of the Aether restart files from the member number read from standard input, either passed to it from a script or entered by the user. One suggestion from the code review is to replace that. (At the moment I've just added a prompt, so that user's will know what to do.)

One alternative is, one member at a time, for the script (or user) to link the restart files to names that don't have the member number in them, so that aether_to_dart would always read from the same file names. There are 2 x #blocks files in each member's restart set (e.g. 400 for a 10x20 block domain), which isn't too hard to code into a script, but is a pain for a user to link when they need a filter_input.nc file outside of the assimilation work flow. Is there a better alternative?

hkershaw-brown commented 6 months ago

Let's focus on getting the aether_to_dart dart_to_aether subroutines out of the model_mod. Once that is done we can sort out what the scripting.

kdraeder commented 6 months ago

Here's the issues_QTYs.txt file mentioned in the comment ~2024-1-25. The contents have some of that comment, and additional context.

issue_QTYs.txt

kdraeder commented 6 months ago

@hkershaw-brown This might be ready for a PR. @johnsonbk and @kdraeder have made the changes recommended in code review 2 and they're pushed to NCAR/DART:aether. Is there a list of things to do before issuing a PR?

hkershaw-brown commented 6 months ago

@kdraeder you can go ahead and make the PR if you're ready.

Best practice would be to update aether with the main branch, before you make the PR. But we can just do that in the PR.