E3SM-Project / e3sm-unified

A metapackage for a unified anaconda environment for analyzing results from the Energy Exascale Earth System Model (E3SM).
BSD 3-Clause "New" or "Revised" License

latest ncremap failed with vertical interpolation #79

Closed: wlin7 closed this issue 4 years ago

wlin7 commented 4 years ago

This happened on Cori with the latest e3sm_unified. The vertical interpolation was for an EAM initial file. The error message is:
/global/cfs/cdirs/e3sm/software/anaconda_envs/base/envs/e3sm_unified_1.3.1/bin/ncremap: line 2767: 60854 Segmentation fault

The commands to reproduce the error:

cd /global/cscratch1/sd/wlin/share/testVrtInt
./map_ne30np4_to_ne120np4_vert_L72.sh   # contains the actual ncremap command and options

The script used to work well. If I activate e3sm_unified_1.3.0 instead of the latest e3sm_unified, it still works.

This is not urgent since I can stick with 1.3.0 for now. Thanks,

xylar commented 4 years ago

@wlin7, I'll keep this open for now but this strikes me as something pretty unlikely to be an e3sm-unified problem directly even if you get different behavior in different versions. It presumably should be reported to nco. But I'll let @czender decide how he would like this handled.

whannah1 commented 4 years ago

@wlin7, I recently used the vertical interpolation without issue. I was using NCO version 4.9.2. Looking at the unified env, I see that:
e3sm unified 1.3.0 => NCO v. 4.8.1
e3sm unified 1.3.1 => NCO v. 4.9.3
Back in Feb I stumbled onto an NCO bug with the vertical interpolation which should be fixed in 4.9.2. This is probably a different issue, but it might be worth trying that same command with 4.9.2 to see if it helps.

wlin7 commented 4 years ago

Thanks for tracking this issue, @xylar. I did wonder where to report it. The problem does look like just an NCO issue. If any fixes are applied to certain nco files without changing their paths, I don't see any changes needed on the e3sm_unified side.

wlin7 commented 4 years ago

@whannah1, NCO v. 4.8.1 is used in e3sm_unified_1.3.0. The issue occurs when using e3sm_unified 1.3.1. Nice to know you were able to do vertical interpolation without issue using NCO 4.9.2. If you still have the files, can you run a test using the latest e3sm_unified? That would rule out whether the specific file I used plays any role.

BTW, the script mentioned in the issue description and the target output files have permissions open to all users. Please feel free to modify the script and run the test.

xylar commented 4 years ago

> If any fixes are applied to certain nco files without changing path, I don't see any changes needed on the e3sm_unified side.

Unfortunately, that isn't possible with e3sm-unified. If there is a new nco release, we would need to have a new release of e3sm-unified to bring it in. We can't modify an existing e3sm-unified environment in place.

xylar commented 4 years ago

@whannah1, thanks for the context. @wlin7, I tried your script with the following NCO versions, where a check mark indicates that it worked and no check mark means it failed:

xylar commented 4 years ago

So it seems possible that the same fix that @whannah1 mentioned in 4.9.2 has inadvertently led to your seg fault in this case. I'll leave it up to @czender from here because I think only he knows the ins and outs of ncremap well enough to handle this.
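The version check described here amounts to running the same reproducer under several NCO releases, oldest to newest, and finding the first release that fails. A minimal sketch of automating that, under stated assumptions: the NCO releases of interest are installable from conda-forge, the reproducer script exits non-zero when ncremap fails (not confirmed in this thread; a segfault inside ncremap may not propagate through the wrapper script), and once a release fails every later release also fails. The helpers `works_with_nco` and `first_failing_version` and the `nco_<version>` environment names are hypothetical.

```python
import subprocess
from typing import Callable, Optional, Sequence


def works_with_nco(version: str, script: str) -> bool:
    """Run `script` in a fresh conda env pinned to one NCO version (hypothetical helper).

    Assumes `script` returns a non-zero exit code when ncremap fails.
    """
    env = f"nco_{version.replace('.', '_')}"  # hypothetical env naming scheme
    subprocess.run(
        ["conda", "create", "--yes", "-n", env, "-c", "conda-forge", f"nco={version}"],
        check=True,
    )
    result = subprocess.run(["conda", "run", "-n", env, "bash", script])
    return result.returncode == 0


def first_failing_version(
    versions: Sequence[str], works: Callable[[str], bool]
) -> Optional[str]:
    """Binary search for the first version where `works` returns False.

    Assumes `versions` is sorted oldest to newest and that failures are
    monotonic (no later version passes again). Returns None if all pass.
    """
    lo, hi = 0, len(versions)  # invariant: versions[:lo] pass, versions[hi:] fail
    while lo < hi:
        mid = (lo + hi) // 2
        if works(versions[mid]):
            lo = mid + 1
        else:
            hi = mid
    return versions[lo] if lo < len(versions) else None
```

Usage would look like `first_failing_version(candidate_versions, lambda v: works_with_nco(v, "./map_ne30np4_to_ne120np4_vert_L72.sh"))`. Binary search keeps the number of slow environment builds logarithmic in the number of releases; if the monotonic assumption is doubtful, testing every release in order is the safer choice.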

wlin7 commented 4 years ago

Thanks for isolating the source of the issue, @xylar. Your tests do suggest that a bug, also involving vertical interpolation, was inadvertently introduced since 4.9.2.

czender commented 4 years ago

I will take a look at this later today. Follow-up at nco/nco#29.