E3SM-Project / e3sm-unified

A metapackage for a unified anaconda environment for analyzing results from the Energy Exascale Earth System Model (E3SM).
BSD 3-Clause "New" or "Revised" License
8 stars 8 forks source link

cdscan fails on grib control files #75

Closed tangq closed 4 years ago

tangq commented 4 years ago

cdscan has been used to merge multiple grib control files into a single file. However, it fails with the newer versions in the e3sm_unified. The follow errors appears with e3sm_unified_1.2.6 and 1.3.0.

This is likely an issue with newer cdscan version. An older version from uvcdat 2.4.1 on the other machine works fine.

The files to reproduce this issue is at cori:/global/cscratch1/sd/tang30/ECMWF_Interim/download/2014

[tang30@cori06 2014]$ cdscan -x tst.xml *.ctl Finding common directory ... Common directory: Scanning files ... UVTQPS-20140101_T255.grib.ctl Setting reference time units to Traceback (most recent call last): File "/global/cfs/cdirs/e3sm/software/anaconda_envs/base/envs/e3sm_unified_1.2.6/bin/cdscan", line 1840, in main(sys.argv) File "/global/cfs/cdirs/e3sm/software/anaconda_envs/base/envs/e3sm_unified_1.2.6/bin/cdscan", line 1282, in main timeIsLinear = (referenceTime[0].lower().split() in IndexError: string index out of range

chengzhuzhang commented 4 years ago

@tangq Hi. I think it would be helpful to isolate if this is a cdscan or e3sm_unified problem. I will try test with standalone cdat environment to test if cdscan applies to the particular data.

chengzhuzhang commented 4 years ago

I tested cdscan within different cdms2 versions (3.0.0 and 3.1.4) was able to reproduce @tangq 's error in v3.1.4. Cdscan in v3.0.0 works. I'm directing this issue to CDAT development team.

tangq commented 4 years ago

Thanks @chengzhuzhang for the help. It would be nice if the cdscan fix can be added to the newly released e3sm_unified.

Since the older cdscan works, I assume the fix should be relatively minor and quick.

tangq commented 4 years ago

@chengzhuzhang , did you run the old cdscan on cori? If so, is that possible for me to use the older version? I need to run it for multiple years of data.

chengzhuzhang commented 4 years ago

I tried in one of my old environment for e3sm_diags. I can create the .xmls for you. Just let me know how and where I should create those files.

xylar commented 4 years ago

@tangq, this is the first I'm hearing about cdscan. If it needs to be supported in E3SM-Unified, I would need to know about it and help to make sure it works before deploying the environment. It isn't really possible to just add a fix after the fact. It takes me weeks or months to make sure all the packages work together and upgrading one can break that delicate balance, so I really need all the packages we support to be known ahead of time and carefully integrated to make sure there are no conflicts. Is cdscan an conda package? If so, where is it maintained?

xylar commented 4 years ago

I see, after some investigation, cdscan is a package of cdms2, so, yeah, this is an issue for CDAT folks.

Even if there's a bug fix, I'm hesitant to include it in this version of E3SM-Unified unless we can also be sure that E3SM_Diags works fine with the new version. And we'd need a new release really soon to not hold up the roll-out. Talking with @muryanto1, it sounded like they're testing a release candidate and a bug fix might be able to go in, but that we couldn't expect a new cdms2 release for at least a little while.

tangq commented 4 years ago

@xylar , thanks for the clarifications. I will use some older cdms2 as a temporary solution.

tangq commented 4 years ago

I tried in one of my old environment for e3sm_diags. I can create the .xmls for you. Just let me know how and where I should create those files.

Thanks, @chengzhuzhang . Some of the grib files are still downloading. I will let you know when it's complete.

xylar commented 4 years ago

This seems to be getting addressed here: https://github.com/CDAT/cdms/pull/399

Once it is handled, I think I can patch the existing version of cdms2 to include this fix as well. If that happens very soon, I might be able to include it in E3SM-Unified 1.3.1 after all.

xylar commented 4 years ago

@tangq, this has been fixed in the latest cdms2 and I have a PR in to patch the current release as well: https://github.com/conda-forge/cdms2-feedstock/pull/57

Once that is in, I will rebuild E3SM-Unified on compy and anvil, and ask you to try again. Hopefully, by Monday.

tangq commented 4 years ago

@xylar , the issue reported there is for netcdf files. I am not sure if the fix can also solve the issue here for the grib control files. I will need to test it once you rebuild the ESM-Unified. Thanks.

xylar commented 4 years ago

@tangq, E3SM-Unified really isn't a good place for testing this. I'll do it this time but we need to figure out a better workflow for noticing these issues earlier and testing with the new packages as they are released. That's way outside of my job description.

xylar commented 4 years ago

@rljacob and @chengzhuzhang, can we add a discussion of how to better stress test E3SM-Unified to an IG agenda?

tangq commented 4 years ago

@xylar , I agree that this is not an E3SM-Unified issue. It should be addressed by individual packages before it is integrated into E3SM-Unified.

tangq commented 4 years ago

On the E3SM-Unified confluence page, I noticed that packages are assigned to different people. Perhaps, they know more about these packages and thus have better testing ideas?

xylar commented 4 years ago

this is not an E3SM-Unified issue

Right, but if E3SM-Unified isn't very useful for the next 6 months because it includes this issue, that's too bad. So if it's easy to address, that would be great!

packages are assigned to different people

Yes, that's true. Especially for the "main" packages, we do expect that these people have tested them before they go into E3SM-Unified. But at least this time, a lot of issues seem to have slipped through the cracks unnoticed. This suggests we may need a trial period for E3SM-Unified that's longer than a couple of days to catch these issues. This didn't seem to be needed previously so I hadn't thought about it too much. But I also thought that update every 2-3 months would be better than every 6-9. Project management disagrees.

chengzhuzhang commented 4 years ago

@rljacob and @chengzhuzhang, can we add a discussion of how to better stress test E3SM-Unified to an IG agenda?

Yes. It will be a very useful discussion. Also we may want to roll out a reasonable schedule for deploying E3SM-Unified, which should balance the project need and development cycle of individual packages that are in active development. A fixed schedule with testing period might be helpful for each tools to be better prepared before adding to a E3SM-Unified release.

xylar commented 4 years ago

@tangq, I have updated e3sm-unified on Anvil and Compy to include the patches in https://github.com/CDAT/cdms/pull/399 and https://github.com/conda-forge/cdms2-feedstock/pull/57. Could you please re-test cdscan to see if you still see problems? If so, could you make an issue on https://github.com/CDAT/cdms/issues?

tangq commented 4 years ago

@xylar, I tested it on compy at /compyfs/tang338/tst. It got the same errors.

I am not sure the env loaded by source /compyfs/software/e3sm-unified/load_latest_e3sm_unified.sh contains this fix, because it still loads e3sm_unified_1.3.0 and I don't see version 1.3.1 in the base/envs.

If there is another way to load the E3SM-Unified with the patches, please let me know and I can test again.

xylar commented 4 years ago

@tangq oh, no! I'm sorry about that! Thing have moved to:

/share/apps/E3SM/conda_envs/

But I didn't update the path on Confluence. I'm doing my best to fix that now.

tangq commented 4 years ago

Using 1.3.1 I got the same errors:

(e3sm_unified_1.3.1) [tang338@compy-e tst]$ cdscan -x tst.xml *.ctl
Finding common directory ...
Common directory: 
Scanning files ...
UVTQPS-20140101_T255.grib.ctl
Setting reference time units to 
Traceback (most recent call last):
  File "/share/apps/E3SM/conda_envs/base/envs/e3sm_unified_1.3.1/bin/cdscan", line 1842, in <module>
    main(sys.argv)
  File "/share/apps/E3SM/conda_envs/base/envs/e3sm_unified_1.3.1/bin/cdscan", line 1284, in main
    timeIsLinear = (referenceTime[0].lower().split() in
IndexError: string index out of range

It looks like that the "list" command is not added on these lines...This is why I am not a fan of the non-backward compatible python updates.

xylar commented 4 years ago

@tangq, That's too bad! As I said, this is an issue for CDAT folks. But it looks like a fix is unlikely to make it into this E3SM-Unified.

This is why I am not a fan of the non-backward compatible python updates.

I don't understand why this is relevant. The update was incomplete but I don't think this is related to backwards compatibility...

tangq commented 4 years ago

The cdscan issue is caused by that py3 is not backward compatible with py2. The fix CDAT put in is adding list() around some statements. I should have made it clearer that the update I meant is the py3 updates, not the E3SM-Unified updates.

xylar commented 4 years ago

Ah, I see. Yes, python major versions are not meant to be backwards compatible. That may seem inconvenient but it comes with a lot more freedom to improve the performance and API over time. And there were 8 years of warning that python 2 would go away...

xylar commented 4 years ago

I'm going to close this again. I am almost done deploying E3SM-Unified 1.3.1. All other issues that we identified have been addressed and it doesn't seem like there will be a quick enough fix for this one to make the cut.

@tangq, please do open an issue with the CDAT folks if you haven't already. It would be nice to have this fixed and ready for the next E3SM-Unified release in 6 months or so.

tangq commented 4 years ago

@xylar , sounds good. @chengzhuzhang opened an issue for this already.