dtcenter / MET

Model Evaluation Tools
https://dtcenter.org/community-code/model-evaluation-tools-met
Apache License 2.0

MET 11.1.0 unable to install on MacOS using Homebrew #2775

Open HathewayWill opened 10 months ago

HathewayWill commented 10 months ago

Describe the Problem

The MET 11.1.0 install fails while building NetCDF-CXX.

Expected Behavior

MET 11.1.0 would compile as 11.0.0 did.

Environment

Describe your runtime environment:
1. Machine: Virtual Machine
2. OS: MacOS 13
3. Software version number(s): 13.4 beta

To Reproduce

See attached zip file with logs and compile.sh script

MET_FAIL_MACOS.zip

HathewayWill commented 10 months ago

ld: warning: directory not found for option '-L/Users/workhorse/WRF/MET-11.1.0/external_libs/lib/lib'
ld: warning: directory not found for option '-L=/Users/workhorse/WRF/MET-11.1.0/external_libs/lib:/Users/workhorse/WRF/MET-11.1.0/external_libs/lib'

These warnings are on lines 134 and 135 of the netcdf-cxx install log.
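The first warning shows a doubled lib/lib component, and the second appears to pass a colon-separated list to a single -L option. A minimal sketch for checking such a value is given here; missing_lib_dirs is a hypothetical helper, not part of compile_MET_all.sh, and it only reports which entries are not existing directories.

```shell
#!/bin/sh
# Hypothetical helper (not part of compile_MET_all.sh): print every entry of a
# colon-separated directory list that does not exist on disk.
# The body runs in a subshell so the IFS and globbing changes stay local.
missing_lib_dirs() (
  IFS=:
  set -f
  for dir in $1; do
    [ -d "$dir" ] || printf '%s\n' "$dir"
  done
)

# Path taken from the first warning; it is printed only if the directory is absent.
missing_lib_dirs "/Users/workhorse/WRF/MET-11.1.0/external_libs/lib/lib"
```

Any path it prints is one the linker would also fail to find.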

jprestop commented 10 months ago

Hi @HathewayWill. Could you please try changing the following line in compile_MET_all.sh from:

configure_lib_args="-lhdf5_hl -lhdf5 -lz"

to

configure_lib_args="-lnetcdf -lhdf5_hl -lhdf5 -lz"

and see if you get a successful compilation? Please let us know how it goes. Thanks!

HathewayWill commented 10 months ago

Sadly, that didn't work. Here are the log files again.

HathewayWill commented 10 months ago

MET_FAIL_MACOS 2.zip

HathewayWill commented 10 months ago

@jprestop

Found a fix for NetCDF-CXX, but I don't know why it works.

Needed configure_lib_args="-lnetcdf -lm -lhdf5_hl -lhdf5 -lz"
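A small sketch of arriving at that value from the original one without duplicating flags. The variable name comes from compile_MET_all.sh; the loop itself is only illustrative and assumes a POSIX shell. Why the extra flags are needed is not established in this thread.

```shell
#!/bin/sh
# Start from the value originally in compile_MET_all.sh.
configure_lib_args="-lhdf5_hl -lhdf5 -lz"

# Prepend each flag only if it is not already present.
# -lm is handled first so that -lnetcdf ends up at the front.
for flag in -lm -lnetcdf; do
  case " $configure_lib_args " in
    *" $flag "*) ;;                                        # already there
    *) configure_lib_args="$flag $configure_lib_args" ;;   # add at the front
  esac
done

echo "$configure_lib_args"
```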

HathewayWill commented 10 months ago

But now we have a different error:

met.configure.log met.make.log met.make_install.log met.make_test.log

Very confused, @jprestop.

jprestop commented 10 months ago

Hi @HathewayWill.

I see in the met.make_test.log file:

*** Running Wavelet-Stat on APCP using a GRIB forecast and netCDF observation ***
../src/tools/core/wavelet_stat/wavelet_stat \
        ../data/sample_fcst/2005080700/wrfprs_ruc13_12.tm00_G212 \
        ../out/pcp_combine/sample_obs_2005080712V_12A.nc \
        config/WaveletStatConfig_APCP_12 \
        -outdir ../out/wavelet_stat -v 2
DEBUG 1: Start grid_stat by workhorse(501) at 2024-01-15 18:15:58Z  cmd: ../src/tools/core/grid_stat/grid_stat ../out/pcp_combine/sample_fcst_12L_2005080712V_12A.nc ../out/pcp_combine/sample_obs_2005080712V_12A.nc config/GridStatConfig_APCP_12 -outdir ../out/grid_stat -v 2
DEBUG 2: OMP_NUM_THREADS is not set. Defaulting to 1 thread. Recommend setting OMP_NUM_THREADS for faster runtimes.
DEBUG 2: OpenMP running on 1 thread(s).
DEBUG 1: Default Config File: /Users/workhorse/WRF/MET-11.1.0/share/met/config/GridStatConfig_default
DEBUG 1: User Config File: config/GridStatConfig_APCP_12
GSL_RNG_TYPE=mt19937
GSL_RNG_SEED=1
DEBUG 1: Forecast File: ../out/pcp_combine/sample_fcst_12L_2005080712V_12A.nc
DEBUG 1: Observation File: ../out/pcp_combine/sample_obs_2005080712V_12A.nc
DEBUG 2: Processing masking regions.
terminate called after throwing an instance of 'netCDF::exceptions::NcNotNCF'
  what():  NetCDF: Unknown file format
file: ncFile.cpp  line:88
FATAL: Received Signal Abort. Exiting 6
make[1]: *** [grid_stat] Error 6
make[1]: *** Waiting for unfinished jobs....

Let's check on your NetCDF installations. Can you please tell me if all of the following files exist in your /Users/workhorse/WRF/MET-11.1.0/external_libs/include and /Users/workhorse/WRF/MET-11.1.0/external_libs/lib directories?

Files for NetCDF4 C:
$MET_NETCDF/include/netcdf.h
$MET_NETCDF/lib/libnetcdf.a
$MET_NETCDF/lib/libnetcdf.so

Files for NetCDF4 C++:
$MET_NETCDF/include/netcdf
$MET_NETCDF/lib/libnetcdf_c++4.a
$MET_NETCDF/lib/libnetcdf_c++4.so
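A quick way to run this check from a terminal is sketched here. missing_netcdf_files is a hypothetical helper, and the default prefix is the external_libs path mentioned above. On macOS, shared libraries normally carry a .dylib suffix rather than .so, so the sketch accepts either.

```shell
#!/bin/sh
# Illustrative helper: list which of the expected NetCDF files are absent
# under a given install prefix. Prints nothing when everything is present.
missing_netcdf_files() {
  prefix=$1
  # Headers and static libraries for NetCDF C and NetCDF C++.
  for f in include/netcdf.h include/netcdf lib/libnetcdf.a lib/libnetcdf_c++4.a; do
    [ -e "$prefix/$f" ] || echo "$f"
  done
  # Shared libraries: accept .so (Linux) or .dylib (macOS).
  for lib in libnetcdf libnetcdf_c++4; do
    [ -e "$prefix/lib/$lib.so" ] || [ -e "$prefix/lib/$lib.dylib" ] \
      || echo "lib/$lib.so or lib/$lib.dylib"
  done
}

# Uses $MET_NETCDF if set, otherwise the path from this thread.
missing_netcdf_files "${MET_NETCDF:-/Users/workhorse/WRF/MET-11.1.0/external_libs}"
```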

HathewayWill commented 10 months ago

@jprestop

They appear to be in there.
Screenshot 2024-01-16 at 5 07 10 PM Screenshot 2024-01-16 at 5 09 22 PM

For NetCDF-C++, I had to add -lm and -lnetcdf.

jprestop commented 10 months ago

@HathewayWill Ah yes, very confusing indeed. I think @georgemccabe figured out the problem. The compile_MET_all.sh script was running "make test" using MAKE_ARGS. Since some tests rely on the output of other tests to succeed, running "make test" in parallel won't work. That also explains the confusing log output, which says it is running wavelet_stat but then reports on grid_stat. I have modified the compile_MET_all.sh script and added "-lnetcdf -lm" to configure_lib_args for the compilation of NetCDF-CXX. Please download the new script and try again.

HathewayWill commented 10 months ago

@jprestop @georgemccabe

So does make test need to have the make args removed for each one if running in parallel processing?

I'm running a WRF run right now but give me tonight and I'll test it later.

Sounds like that was the error. Everyone using dtc-mosit and WRF-mosit will be happy that it's fixed.

jprestop commented 10 months ago

Hi @HathewayWill.

So does make test need to have the make args removed for each one if running in parallel processing?

I don't think I understand your question. I'm not sure what you mean by "each one".

To help clarify, we changed:

run_cmd "make ${MAKE_ARGS} test > $(pwd)/met.make_test.log 2>&1"

to

run_cmd "make test > $(pwd)/met.make_test.log 2>&1"

Maybe you mean: does ${MAKE_ARGS} need to be removed in calls to the external libraries' "make test" commands? If so, the answer, unfortunately, is that I don't know whether the external libraries' "make test" commands rely on the output of other tests to succeed. All I can say is that I haven't experienced this problem previously in installations on various machines, so until we encounter a problem, I think it is likely OK to leave as-is.
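The idea behind the change can be sketched as follows: parallel flags go to the build step only, while the tests run serially because some of them read the output of earlier ones. The value of MAKE_ARGS and the helper make_cmd_for are assumptions for illustration; the real script passes its commands to run_cmd.

```shell
#!/bin/sh
# Sketch only: MAKE_ARGS value and make_cmd_for are illustrative,
# not copied from compile_MET_all.sh.
MAKE_ARGS="-j 4"

make_cmd_for() {
  case "$1" in
    all) echo "make ${MAKE_ARGS} $1" ;;  # the build itself may run in parallel
    *)   echo "make $1" ;;               # test and install run without MAKE_ARGS
  esac
}

make_cmd_for all
make_cmd_for test
```

The helper only prints the command line it would use, which makes it easy to see which targets receive the parallel flags.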

HathewayWill commented 10 months ago

@jprestop

That was what I was getting at.

You answered my question about the removal.

Sorry for the confusion

HathewayWill commented 10 months ago

@jprestop

Testing it now.

I was reading the new compile_MET_all.sh script and noticed that the make install for MET doesn't use MAKE_ARGS. Can MET not be installed in parallel?

run_cmd "make install > met.make_install.log 2>&1"

HathewayWill commented 10 months ago

@jprestop

I tested it and it got worse.

Before, it would fail at the test step; now it fails at met.make.

Here are the relevant log files: met.make.log configure.log

compile_MET_all.log

HathewayWill commented 10 months ago

@jprestop @georgemccabe untitled folder.zip

different error now.

jprestop commented 10 months ago

I'm wondering if these files are corrupted:

DEBUG 1: Forecast File: ../out/pcp_combine/sample_fcst_12L_2005080712V_12A.nc
DEBUG 1: Observation File: ../out/pcp_combine/sample_obs_2005080712V_12A.nc

Could you please send them to us following the directions here?

HathewayWill commented 10 months ago

@jprestop I'm having issues with Ubuntu and the FTP protocol.

jprestop commented 10 months ago

You could also try attaching the files here, @HathewayWill.

HathewayWill commented 10 months ago

@jprestop @georgemccabe

https://we.tl/t-tobDvvKulo

Not sure if you can access this; otherwise, email me directly.

jprestop commented 9 months ago

Hi @HathewayWill. Well, the NetCDF files do not seem to be corrupted. I copied them over and ran "ncdump" on them, and that worked fine. I also copied them to our project machine and ran the command that is causing you problems:

/nrit/ral/met-11.1.0/bin/grid_stat sample_fcst_12L_2005080712V_12A.nc sample_obs_2005080712V_12A.nc GridStatConfig_APCP_12 -outdir ./out/grid_stat -v 2

but I got a successful run. I did not have the problem you are experiencing:

*** Running Grid-Stat on APCP using netCDF input for both forecast and observation ***
../src/tools/core/grid_stat/grid_stat \
        ../out/pcp_combine/sample_fcst_12L_2005080712V_12A.nc \
        ../out/pcp_combine/sample_obs_2005080712V_12A.nc \
        config/GridStatConfig_APCP_12 \
        -outdir ../out/grid_stat -v 2
DEBUG 1: Start grid_stat by workhorse(501) at 2024-01-19 01:39:36Z  cmd: ../src/tools/core/grid_stat/grid_stat ../out/pcp_combine/sample_fcst_12L_2005080712V_12A.nc ../out/pcp_combine/sample_obs_2005080712V_12A.nc config/GridStatConfig_APCP_12 -outdir ../out/grid_stat -v 2
DEBUG 2: OMP_NUM_THREADS is not set. Defaulting to 1 thread. Recommend setting OMP_NUM_THREADS for faster runtimes.
DEBUG 2: OpenMP running on 1 thread(s).
DEBUG 1: Default Config File: /Users/workhorse/WRF/MET-11.1.0/share/met/config/GridStatConfig_default
DEBUG 1: User Config File: config/GridStatConfig_APCP_12
GSL_RNG_TYPE=mt19937
GSL_RNG_SEED=1
DEBUG 1: Forecast File: ../out/pcp_combine/sample_fcst_12L_2005080712V_12A.nc
DEBUG 1: Observation File: ../out/pcp_combine/sample_obs_2005080712V_12A.nc
DEBUG 2: Processing masking regions.
terminate called after throwing an instance of 'netCDF::exceptions::NcNotNCF'
  what():  NetCDF: Unknown file format
file: ncFile.cpp  line:88
FATAL: Received Signal Abort. Exiting 6
make[1]: *** [grid_stat] Error 6
make: *** [test] Error 2
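The NcNotNCF exception means the library did not recognize the format of a file it tried to open. Besides ncdump, a rough check is to look at the first four bytes of each input file. In the sketch here, nc_kind is a hypothetical helper; it assumes the signature sits at offset 0, which is the usual case but not guaranteed for every HDF5 file.

```shell
#!/bin/sh
# Illustrative helper: classify a file by its first four bytes.
# Classic NetCDF files start with "CDF" plus a version byte (1, 2 or 5);
# NetCDF-4 files are HDF5 files and start with the byte 0x89 followed by "HDF".
nc_kind() {
  # od prints the bytes in hex; tr strips the whitespace between them.
  magic=$(od -An -tx1 -N4 "$1" | tr -d ' \t\n')
  case "$magic" in
    43444601|43444602|43444605) echo "netcdf-classic" ;;
    89484446)                   echo "netcdf4-hdf5" ;;
    *)                          echo "unknown" ;;
  esac
}

# Example (run from the scripts directory used in this thread):
#   nc_kind ../out/pcp_combine/sample_obs_2005080712V_12A.nc
```

A result of "unknown" for a file that another test is supposed to produce would suggest the file was incomplete or not a NetCDF file when it was read.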

Let's have you try running the same command outside of "make test". In the directory /Users/workhorse/WRF/MET-11.1.0/MET-11.1.0/scripts, could you please run the following:

export TEST_OUT_DIR=/Users/workhorse/WRF/MET-11.1.0/MET-11.1.0
/Users/workhorse/WRF/MET-11.1.0/bin/grid_stat \
../out/pcp_combine/sample_fcst_12L_2005080712V_12A.nc \
../out/pcp_combine/sample_obs_2005080712V_12A.nc \
config/GridStatConfig_APCP_12 \
-outdir ../out/grid_stat -v 2

Please give that a try and post the output here. Please let me know if you have any questions.

HathewayWill commented 9 months ago

@jprestop

I'm going to rebuild the Mac and test it again. I can't even reproduce the error on my own machine; now it is stopping before the previous error.

Do you have a Mac available there?

jprestop commented 9 months ago

Hi @HathewayWill. We have two developers who have successfully installed MET-11.1 on their Macs. One was using 13.6.2 (Ventura) and the other was using 12.6.2 (Monterey). Both compiled using the GNU compilers.

HathewayWill commented 9 months ago

Morning @jprestop

Okay, that's good to know. Let me rebuild my Mac and double-check on my side. I wonder if it's the shell script.

Are they using Homebrew GNU compilers or something else?

jprestop commented 9 months ago

@HathewayWill. They both used GNU compilers that were installed via MacPorts.

HathewayWill commented 9 months ago

@jprestop

That might be the answer: I'm using Homebrew.

Can you ask them which version of the GNU compilers from MacPorts they are using?

jprestop commented 9 months ago

I would think that Homebrew and MacPorts would be similar, but it could be an issue. One of the developers was using MacPorts 12.3.0 and also used the compile_MET_all.sh script successfully.

HathewayWill commented 9 months ago

@jprestop @georgemccabe

So for grins I tried to compile MET v11.0.0 using the same structure for installation as I did with V11.1.0

V11.0.0 didn't install so I am going to check my structure.

HathewayWill commented 9 months ago

@jprestop @georgemccabe

Got it to work on MacOS Sonoma but not 100% on MacOS Ventura. On Ventura the MET tests have errors, but METplus still runs successfully.

The fixes you implemented worked on Sonoma, but I have attached the logs for Ventura: MET_logs.zip

Screenshot from 2024-01-28 05-15-49 Screenshot from 2024-01-28 05-16-20

HathewayWill commented 9 months ago

And now Sonoma doesn't work. This is very confusing to me.

HathewayWill commented 9 months ago

@jprestop @georgemccabe @JohnHalleyGotway

With your permission, I'm going to close this issue and open two separate ones for the two Mac operating systems. I think there are two different issues going on, one for each OS, and I want to keep them separate.

I will reference this issue in the new ones, if that is okay with you.

jprestop commented 9 months ago

Hi @HathewayWill.

This situation is certainly very strange, particularly considering our developers have had successful compilations on various macOS versions. This could be something in your environment, although it's not clear yet.

Even though MET's configure ran successfully for you, I do see the following error:

ld: Undefined symbols:
  _H5Pset_all_coll_metadata_ops, referenced from:
      _main in ccnwjJ9B.o
collect2: error: ld returned 1 exit status
configure:18015: $? = 1

The other developers did not receive that error. I have their config.log files and would like to step through them to see the differences, but I haven't had a chance to look into the above error or to compare the log files yet.
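One way to narrow this down is to ask whether the installed HDF5 library defines the symbol at all. The sketch here filters the output of nm; symbol_defined is a hypothetical helper, and the library path in the usage comment is an assumption based on the paths in this thread. As far as I know, H5Pset_all_coll_metadata_ops belongs to the parallel (MPI) part of HDF5, so its absence from a serial build may be expected; that is not confirmed here.

```shell
#!/bin/sh
# Illustrative helper: read `nm` output on stdin and succeed only if the given
# symbol is defined there, that is, listed with a type other than "U".
# On macOS, C symbols carry a leading underscore.
symbol_defined() {
  awk -v sym="$1" '
    $NF == sym && $(NF-1) != "U" { found = 1 }
    END { exit !found }
  '
}

# Example usage (library path assumed, adjust to the actual install):
#   nm -g /Users/workhorse/WRF/MET-11.1.0/external_libs/lib/libhdf5.dylib \
#     | symbol_defined _H5Pset_all_coll_metadata_ops && echo "defined"
```

Running the same check on a machine where the build succeeds would show whether the two HDF5 builds differ on this point.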

HathewayWill commented 9 months ago

@jprestop

I will retest and see what I can find and attach log files here.