E3SM-Project / processflow

A workflow tool for the E3SM project
MIT License

Add ILAMB diagnostic job #146

Closed jhkennedy closed 5 years ago

jhkennedy commented 5 years ago

This adds ILAMB as a diagnostic job.

Currently, this adds ILAMB analyses for the union of:

More analyses can be added by creating more CMOR handlers (see: E3SM-Project/e3sm_to_cmip#7)


TODO:

@sterlingbaldwin, it's up to you which of these are needed for an initial ILAMB setup and which should be opened as issues and addressed later.

jhkennedy commented 5 years ago

@sterlingbaldwin So far, this seems to be working, except that #147 creates a further weird problem -- for some reason the ILAMB job won't copy the .../0106_0106_vs_obs/.../*_010501-010512.nc files into the ILAMB model directory (for either job!). And I have no idea why; when I add a debug print into https://github.com/E3SM-Project/processflow/blob/d7bf503bb75e6c4ba0032fea57dadb5a9dbd32c1/jobs/ilamb.py#L165-L172 it shows that the missing CMOR file_ should be copied into the right destination file, but nothing happens and no error is raised.

sterlingbaldwin commented 5 years ago

I'll look into it; it's probably something going on with the filemanager not recognizing the output files.

sterlingbaldwin commented 5 years ago

OK, I think I know what the problem is. I haven't run any tests on the code yet, but I'm pretty sure the issue is that ILAMB lists its _required_data = ["atm", "lnd"]. What it really should be is "cmorized," which is a derived data type created by cmor. When the cmor job runs, after it finishes it adds its output files to the file database with the type "cmorized." If the ILAMB job is changed to have _data_required = ['cmorized'], then the job parent class should set up its required data automatically, and when execute is called the data should already be set up and ready in the job's self._input_file_paths. The data should already be symlinked into a temp directory, and self._input_file_paths is a list of all the required file paths.
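
A minimal sketch of the idea, not the actual processflow code: only the names _data_required and _input_file_paths come from this thread. The Job stand-in, its setup_data method, the record layout of the file database, and the example paths are all assumptions made for illustration.

```python
class Job:
    """Hypothetical stand-in for the job parent class."""

    _data_required = []

    def __init__(self):
        self._input_file_paths = []

    def setup_data(self, file_db):
        # file_db is assumed to be a list of {"path": ..., "datatype": ...}
        # records; the real file database is a SQLite table.
        self._input_file_paths = [
            record["path"]
            for record in file_db
            if record["datatype"] in self._data_required
        ]


class ILAMB(Job):
    # Ask for the derived type written by the cmor job, not raw "atm"/"lnd".
    _data_required = ["cmorized"]


# Hypothetical records, only to show the filtering.
file_db = [
    {"path": "atm/raw_0105.nc", "datatype": "atm"},
    {"path": "cmor/pr_Amon_010501-010512.nc", "datatype": "cmorized"},
]
job = ILAMB()
job.setup_data(file_db)
```

With this declaration, only the cmorized record ends up in job._input_file_paths, so execute would not need to look up the files itself.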

jhkennedy commented 5 years ago

> I'm pretty sure the issue is that ILAMB lists its _required_data = ["atm", "lnd"]. What it really should be is "cmorized," which is a derived data type created by cmor. When the cmor job runs, after it finishes it adds its output files to the file database with the type "cmorized."

Ah good point, I missed this.

> If the ILAMB job is changed to have _data_required = ['cmorized'], then the job parent class should set up its required data automatically, and when execute is called the data should already be set up and ready in the job's self._input_file_paths. The data should already be symlinked into a temp directory, and self._input_file_paths is a list of all the required file paths.

That'll definitely help. But it doesn't really address the issue raised in #147, or the weird copying errors: ILAMB doesn't like symlinks to files, and the files need to be in a MODEL directory, so the copy step is still needed.
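
For reference, a rough sketch of that copy step, assuming the inputs may be symlinks and that ILAMB needs regular files in the model directory. The function name copy_into_model_dir and its arguments are hypothetical; this is not the code in jobs/ilamb.py.

```python
import os
import shutil


def copy_into_model_dir(input_paths, model_dir):
    """Copy each input as a regular file into model_dir.

    Returns the list of destination paths.
    """
    os.makedirs(model_dir, exist_ok=True)
    copied = []
    for path in input_paths:
        # Resolve the symlink so the file contents are copied, not the link.
        real_path = os.path.realpath(path)
        # Keep the name the job was given, which may be the link's name.
        dest = os.path.join(model_dir, os.path.basename(path))
        shutil.copyfile(real_path, dest)
        copied.append(dest)
    return copied
```

Returning the destination paths makes it easy to log or assert exactly which files landed in the model directory, which would help when a copy silently does nothing.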

It might be easier to see what I mean in #147 by querying the output processflow.db:

sqlite> SELECT name,year FROM datafile WHERE datatype='cmorized' AND name LIKE '%.nc';
lai_Lmon_E3SM-1-0_piControl_r1i1p1f1_gr_010601-010612.nc|105  # filename year mismatch!
tsl_Lmon_E3SM-1-0_piControl_r1i1p1f1_gr_010501-010512.nc|105
prc_Amon_E3SM-1-0_piControl_r1i1p1f1_gr_010601-010612.nc|105  # filename year mismatch!
rlds_Amon_E3SM-1-0_piControl_r1i1p1f1_gr_010601-010612.nc|105  # filename year mismatch!
tas_Amon_E3SM-1-0_piControl_r1i1p1f1_gr_010601-010612.nc|105  # filename year mismatch!
rsds_Amon_E3SM-1-0_piControl_r1i1p1f1_gr_010501-010512.nc|105
rlus_Amon_E3SM-1-0_piControl_r1i1p1f1_gr_010601-010612.nc|105  # filename year mismatch!
pr_Amon_E3SM-1-0_piControl_r1i1p1f1_gr_010501-010512.nc|105
rsus_Amon_E3SM-1-0_piControl_r1i1p1f1_gr_010501-010512.nc|105
lai_Lmon_E3SM-1-0_piControl_r1i1p1f1_gr_010601-010612.nc|106
tsl_Lmon_E3SM-1-0_piControl_r1i1p1f1_gr_010501-010512.nc|106  # filename year mismatch!
prc_Amon_E3SM-1-0_piControl_r1i1p1f1_gr_010601-010612.nc|106
rlds_Amon_E3SM-1-0_piControl_r1i1p1f1_gr_010601-010612.nc|106
tas_Amon_E3SM-1-0_piControl_r1i1p1f1_gr_010601-010612.nc|106
rsds_Amon_E3SM-1-0_piControl_r1i1p1f1_gr_010501-010512.nc|106  # filename year mismatch!
rlus_Amon_E3SM-1-0_piControl_r1i1p1f1_gr_010601-010612.nc|106
pr_Amon_E3SM-1-0_piControl_r1i1p1f1_gr_010501-010512.nc|106  # filename year mismatch!
rsus_Amon_E3SM-1-0_piControl_r1i1p1f1_gr_010501-010512.nc|106  # filename year mismatch!

The mismatched files (the year in the file name differs from the year value in the table) were not being copied into the output ILAMB model directory.
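
The same check can be scripted rather than done by eye. This is a sketch under the assumption that every cmorized file name ends in a _YYYYMM-YYYYMM.nc range, as the names above do. The helper names filename_years and find_mismatches are made up for illustration, and the two sample rows are taken from the query output above.

```python
import re

# Trailing "_YYYYMM-YYYYMM.nc" date range, as in the file names above.
_RANGE = re.compile(r"_(\d{4})(\d{2})-(\d{4})(\d{2})\.nc$")


def filename_years(name):
    """Return (start_year, end_year) parsed from the file name, or None."""
    match = _RANGE.search(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(3))


def find_mismatches(rows):
    """Return the (name, year) rows whose year lies outside the name's range."""
    mismatched = []
    for name, year in rows:
        years = filename_years(name)
        if years is not None and not (years[0] <= year <= years[1]):
            mismatched.append((name, year))
    return mismatched


# Two rows from the query output above, as (name, year) tuples.
rows = [
    ("lai_Lmon_E3SM-1-0_piControl_r1i1p1f1_gr_010601-010612.nc", 105),
    ("tsl_Lmon_E3SM-1-0_piControl_r1i1p1f1_gr_010501-010512.nc", 105),
]
```

Calling find_mismatches(rows) flags the lai row, whose name says year 0106 while the table says 105, and leaves the tsl row alone.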

chengzhuzhang commented 5 years ago

@jhkennedy @sterlingbaldwin This pull request targets running ILAMB on cmorized data. To apply ILAMB to direct land model output, we also need to add remapping and cmorizing steps. Scripts from e3sm_to_cmip could be helpful here. This is documentation for post-processing using some external tools. We can at least identify the required variables from these scripts.

sterlingbaldwin commented 5 years ago

I'm going to merge this into "next" so that it's easier to compare across branches.