ADF not picking the correct year for the climo

cecilehannay commented 1 year ago

ADF run type

Model vs. Model

What happened?

I am running the latest version of the ADF on a B1850 run. I am trying to select a particular set of years but I am not sure it is working correctly as I get a warning message when starting the ADF.

For instance, I am trying to select years 10-25, which are common to all my runs:

   start_year: 10
   end_year: 25

But I am getting a warning saying those years don't exist:

['b.e23_alpha16b.BLT1850.ne30_t232.033']
Given start year '10' is not in current dataset b.e23_alpha16b.BLT1850.ne30_t232.035, using first found year: 0001 

Given end year '25' is not in current dataset b.e23_alpha16b.BLT1850.ne30_t232.035, using last found year: 0028 

Given start year '10' is not in current dataset b.e23_alpha16b.BLT1850.ne30_t232.033, using first found year: 0001 

Given end year '25' is not in current dataset b.e23_alpha16b.BLT1850.ne30_t232.033, using last found year: 0026

According to the message, the ADF is not using:

   start_year: 10
   end_year: 25

but instead, it seems to be using for the first run:

   start_year: 1
   end_year: 28

and for the second run:

   start_year: 1
   end_year: 26

I tried to set:

   start_year: 0010
   end_year: 0025

to see if that would help, but it doesn't.

ADF Hash you are using

3e2ade6

What machine were you running the ADF on?

CISL machine

What python environment were you using?

NPL (CISL machines only)

Extra info

No response

brianpm commented 1 year ago

This bug must have something to do with the way the file names are parsed in adf_info.py.

Here's the relevant code:

            #Check if history file path exists:
            if baseline_hist_locs:

                starting_location = Path(baseline_hist_locs)
                files_list = sorted(starting_location.glob('*'+hist_str+'.*.nc'))
                base_climo_yrs = sorted(np.unique([i.stem[-7:-3] for i in files_list]))

                #Check if start or end year is missing.  If so then just assume it is the
                #start or end of the entire available model data.
                if syear_baseline is None:
                    print(f"No given start year for {data_name}, using first found year...")
                    syear_baseline = int(base_climo_yrs[0])
                elif str(syear_baseline) not in base_climo_yrs:
                    print(f"Given start year '{syear_baseline}' is not in current dataset {data_name}, using first found year:",base_climo_yrs[0],"\n")
                    syear_baseline = int(base_climo_yrs[0])
                #End if
                if eyear_baseline is None:
                    print(f"No given end year for {data_name}, using last found year...")
                    eyear_baseline = int(base_climo_yrs[-1])
                elif str(eyear_baseline) not in base_climo_yrs:
                    print(f"Given end year '{eyear_baseline}' is not in current dataset {data_name}, using last found year:",base_climo_yrs[-1],"\n")
                    eyear_baseline = int(base_climo_yrs[-1])
                #End if

I wonder if something is wrong with base_climo_yrs?
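A likely explanation, judging from the warnings above: the years parsed from the file names are zero-padded strings such as '0001', while `str(syear_baseline)` for a configured value of 10 is '10', so the `not in base_climo_yrs` test is always true. The sketch below reproduces that with made-up file names. The `case.cam.h0.YYYY-MM.nc` pattern and the `resolve_year` helper are assumptions for illustration, not ADF code, and comparing as integers is only one possible fix, not necessarily the one that was merged.

```python
from pathlib import Path

# Hypothetical monthly history files for years 0001-0028 (names are made up).
files_list = [
    Path(f"case.cam.h0.{yr:04d}-{mon:02d}.nc")
    for yr in range(1, 29)
    for mon in range(1, 13)
]

# Same slicing as the quoted snippet: characters -7..-4 of the stem are the year.
base_climo_yrs = sorted({f.stem[-7:-3] for f in files_list})

syear = 10

# String comparison: '10' is never equal to the zero-padded '0010'.
found_as_str = str(syear) in base_climo_yrs

# Integer comparison: 10 matches int('0010').
found_as_int = syear in {int(y) for y in base_climo_yrs}

print(found_as_str, found_as_int)  # False True


def resolve_year(requested, year_strings, use_last=False):
    """Return the requested year if present, else the first (or last) found year.

    Hypothetical helper: compares years as integers so zero padding is ignored.
    """
    years = sorted(int(y) for y in year_strings)
    fallback = years[-1] if use_last else years[0]
    if requested is None or int(requested) not in years:
        return fallback
    return int(requested)
```

With this helper, `resolve_year(10, base_climo_yrs)` keeps year 10, while a missing or out-of-range year still falls back to the first or last year found.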

cecilehannay commented 1 year ago

Thanks, Brian.

One more thing that is misleading: the website says the climos are computed with years 10-25,

although the name of the directory seems to imply they are computed over a different set of years (years 1-34 and 1-26):

b.e23_alpha16b.BLT1850.ne30_t232.034_1_34_vs_b.e23_alpha16b.BLT1850.ne30_t232.033_1_26/

cecilehannay commented 1 year ago

Bugfix in: https://github.com/NCAR/ADF/pull/254
