geoschem / HEMCO

The Harmonized Emissions Component (HEMCO), developed by the GEOS-Chem Support Team.
https://hemco.readthedocs.io

[BUG/ISSUE] Segmentation fault HEMCO Standalone on AWS #156

Closed arianatribby closed 1 year ago

arianatribby commented 2 years ago

Describe the bug:

I am running HEMCO SA on AWS but get a segmentation fault during the run. Before I delve into debug mode, I just want to check that I am using the correct version of HEMCO SA, public AMI, and instance size. I've run GEOS-Chem Classic on AWS without issues, so I suspect I might be missing something in the setup.

Expected behavior:

I followed the instructions on this guide to run HEMCO Standalone on AWS: https://hemco.readthedocs.io/en/latest/hco-sa-guide/run-standalone.html . I did not make any changes to the code.

Actual behavior:

Compilation was successful. During the run, there was an initial error that HEMCO_Config.rc.gmao_metfields did not exist. This makes me wonder whether I am running a version of HEMCO Standalone that is not supposed to be run on this particular AMI, or something else on the backend. There were no instructions regarding copying the HEMCO_Config.rc.gmao_metfields file. There were no errors during compilation or during the creation of the run directory. I copied this missing file from a GEOS-Chem Classic run and overcame this. Then, I get a segmentation fault error when HEMCO was reading the different emissions. Here is part of the terminal print:

HEMCO (VOLCANO): Opening /home/ubuntu/ExtData/HEMCO/VOLCANO/v2021-09/2019/06/so2_volcanic_emissions_Carns.20190601.rc

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.                                   

Backtrace for this error:                                                                                         

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.                                   

Steps to reproduce:

Open instance on AWS. Chose r5.large (2vCPUs, 16GB memory, and increased storage to 400GB)

Using this guide: https://cloud-gc.readthedocs.io/en/latest/chapter02_beginner-tutorial/research-workflow.html

git clone geos chem 
create run directory with geos chem with geos-fp met 
build the model
aws configure
source activate geo
edit input.geos to the same time frame you want hemco to run at (obviously at the same resolution)
dry run

I did the dry run to get all the required files for HEMCO run more easily.

Now, following this guide: https://hemco.readthedocs.io/en/latest/hco-sa-guide/run-standalone.html

git clone https://github.com/geoschem/hemco.git HEMCO
cd HEMCO/run
./createRunDir.sh
when it asks for hemco_config.rc, gave it /home/ubuntu/GC/rundirs/gc_4x5_geosfp_fullchem/HEMCO_Config.rc
When it asks for run directory path, gave it ~/HEMCO/rundirs 
cd rundirs/build 
cmake ../CodeDir -DRUNDIR=..
make -j 
make install

I edited the HEMCO_sa_Time.rc to 20190601-20190901. I didn't edit HEMCO_sa_Config.rc or the other config files.

cd ~/HEMCO/rundirs/hemco_4x5_geosfp
tmux
./hemco_standalone -c HEMCO_sa_Config.rc | tee run.log
Type `Ctrl + b`, and then type `d`
tail -f run.log  
Type `Ctrl + c` 

Required information:

Your HEMCO version and runtime environment:

yantosca commented 2 years ago

Hi @arianatribby, thanks for writing. You might try a c5.2xlarge (8 cores) or c5.4xlarge (16 cores), and see if you don't get the seg fault. My hunch is that you are probably running out of memory on the r5 instances.

Also note: We recently fixed a bug in the HEMCO standalone caused by recent edits to the run directory creation from GEOS-Chem 13.4.0. See the comments in PR #155. This fix is now in the dev branch of the HEMCO repository... and this will ship as HEMCO 3.5.0 alongside GEOS-Chem 14.0.0.

For now you can pull the code from the dev branch and try to build and run the standalone again. Furthermore, I just finished updating the HEMCO documentation at the new manual page https://hemco.readthedocs.io. If you take a look at the doc, please let me know if anything is incorrect and I can update it.

Also tagging @msulprizio, who made the fixes in PR #155.

arianatribby commented 2 years ago

Thanks @yantosca. I tried a c5.4xlarge with 400GB and the dev version of HEMCO (3.5.0). I still get the segmentation fault error in the same place. However, I am still using GEOS-Chem 13.4.1. Should I be using one of the dev versions of GEOS-Chem as well? Which one of the v14 should I use:

(tag: 14.0.0-rc.1, origin/release, origin/dev)
2022-06-03 12:44:14 -0400  (tag: 14.0.0-rc.0)
2022-06-01 21:50:49 -0400  (tag: 14.0.0-alpha.5)
2022-05-24 09:55:18 -0400  (tag: 14.0.0-alpha.4)
2022-05-19 22:03:39 -0400  (tag: 14.0.0-alpha.3)

Also, I noticed there are lots of AMIs I could use on AWS. I am using ami-0491da4eeba0fe986 with the following software:

Loading spack package gcc@10.2.0
Loading spack package flex%gcc@10.2.0
Loading spack package cmake%gcc@10.2.0
Loading spack package openmpi%gcc@10.2.0
Loading spack package netcdf-fortran%gcc@10.2.0
Loading spack package netcdf-c%gcc@10.2.0
Loading spack package nco%gcc@10.2.0
Loading spack package cdo%gcc@10.2.0
Loading spack package ncview%gcc@10.2.0
Loading spack package gdb%gcc@10.2.0
Loading spack package cgdb%gcc@10.2.0

Are they ok for the dev versions? Also, I want to note that even when using HEMCO 3.5.0, it is still missing the HEMCO_Config.rc.gmao_metfields file, which I copied over from my GEOS-Chem run directory.

In any case, I recompiled in debug mode. Here is exactly what I did:

cd rundirs/build 
cmake ../CodeDir -DRUNDIR=.. -DCMAKE_BUILD_TYPE=Debug
make -j 
make install

cd ~/HEMCO/rundirs/hemco_4x5_geosfp
tmux
./hemco_standalone -c HEMCO_sa_Config.rc | tee run.log 2>&1
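
A note on the redirection in the command above: with `2>&1` placed after `tee`, it applies to `tee` rather than to the model, so anything the model writes to stderr (such as a Fortran runtime error) is shown on the terminal but does not land in run.log. The sketch below illustrates the difference with a stand-in shell function; `fake_model` and the log file names are hypothetical and only mimic a program that writes to both streams.

```shell
#!/usr/bin/env bash
# Stand-in for ./hemco_standalone (hypothetical): one line on stdout, one on stderr.
fake_model() {
    echo "HEMCO: Opening some_file.nc4"
    echo "Fortran runtime error: example" >&2
}

workdir=$(mktemp -d)

# Form used above: 2>&1 binds to tee, so the model's stderr bypasses the log.
fake_model | tee "$workdir/after_tee.log" 2>&1

# Redirect before the pipe: stderr is merged into stdout and reaches tee.
fake_model 2>&1 | tee "$workdir/before_pipe.log"

# Count how often the error text appears in each log.
after_count=$(grep -c "runtime error" "$workdir/after_tee.log")
before_count=$(grep -c "runtime error" "$workdir/before_pipe.log")
echo "after_tee=$after_count before_pipe=$before_count"
```

Under these assumptions, only the second form records the error line in the log file, which matters when the log is attached to a bug report.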

Here is the terminal print out near the error:

HEMCO: Opening /home/ubuntu/ExtData/HEMCO/OFFLINE_LIGHTNING/v2020-03/GEOSFP/2019/FLASH_CTH_GEOSFP_0.25x0.3125_2019_06.nc4
HEMCO (VOLCANO): Opening /home/ubuntu/ExtData/HEMCO/VOLCANO/v2021-09/2019/06/so2_volcanic_emissions_Carns.20190601.rc
At line 735 of file /home/ubuntu/HEMCO/rundirs/hemco_4x5_geosfp/CodeDir/src/Extensions/hcox_lightnox_mod.F90
Fortran runtime error: Index '-18583' of dimension 1 of array 'inst%profile' below lower bound of 1

I've attached files of potential interest. Thanks for your help.

HEMCO_sa_Config.rc_sa_072122.txt HEMCO_Diagn.rc_sa_072122.txt HEMCO_Config.rc.gmao_metfields_sa_072122.txt HEMCO.log_sa_072122.txt run.log_sa_072122.txt HEMCO_sa_Grid.4x5.rc_sa_072122.txt HEMCO_sa_Spec.rc_sa_072122.txt

yantosca commented 2 years ago

Thanks for the update @arianatribby. The HEMCO_Config.rc.gmao_metfields file is included into HEMCO_Config.rc, so you do have to copy it over from the GEOS-Chem run directory. I will update the run directory creation script and the hemco.readthedocs.io documentation to reflect this.
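
Until the creation script handles this, the manual copy could be sketched roughly as follows. This is only an illustration: `copy_metfields` is a hypothetical helper, and the temporary directories stand in for the real GEOS-Chem and HEMCO standalone run directories.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make sure the met-field include file sits in the
# standalone run directory, copying it from a GEOS-Chem run directory if needed.
copy_metfields() {
    local gc_rundir=$1 sa_rundir=$2
    local name=HEMCO_Config.rc.gmao_metfields
    if [ -e "$sa_rundir/$name" ]; then
        echo "already present"
    elif [ -e "$gc_rundir/$name" ]; then
        cp "$gc_rundir/$name" "$sa_rundir/" && echo "copied"
    else
        echo "not found in $gc_rundir" >&2
        return 1
    fi
}

# Demonstration with temporary stand-in directories.
gc=$(mktemp -d)
sa=$(mktemp -d)
touch "$gc/HEMCO_Config.rc.gmao_metfields"

first=$(copy_metfields "$gc" "$sa")
second=$(copy_metfields "$gc" "$sa")
echo "$first, then $second"
```

In a real setup the two arguments would be the GEOS-Chem run directory and the HEMCO standalone run directory.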

Am also digging into the out-of-bounds error. Wonder if it has to do with the met fields as the lightning NOx extension needs the vertical grid information.

Also you can try 14.0.0-rc.0 for the AMI. But you would want to update to the 14.0.0 release once it comes out. We are waiting for a couple benchmark simulations to finish and then that needs to go to the GCSC for approval. So we're hoping to have the version out by mid-August if possible.

arianatribby commented 2 years ago

Thanks @yantosca. Please let me know if you have any suggestions for me to try. I haven't modified the met fields file.

arianatribby commented 2 years ago

@yantosca I should have mentioned that I set up the 14.0.0-rc.0 version of GEOS-Chem to download the files and generate the config files, but I still get the same error in the same place.

yantosca commented 2 years ago

Hi again @arianatribby. Thanks for your patience.

@msulprizio recently pushed a fix (PR #157) that fixed a bug with the HEMCO standalone run directory creation that I inadvertently added to the code. This is in the HEMCO 3.5.0-rc.1 and GEOS-Chem 14.0.0-rc.1 codes.

I was just now able to successfully compile and run an out-of-the-box 1-month simulation with the HEMCO standalone on the same AMI that you used (ami-0491da4eeba0fe986) with these instructions.

Clone GCClassic and checkout a branch at release candidate 14.0.0-rc.1

git clone https://github.com/geoschem/GCClassic  
cd GCClassic
git checkout tags/14.0.0-rc.1
git branch 14.0.0-rc.1
git checkout 14.0.0-rc.1
git submodule update --init --recursive

Create a GCClassic run directory

cd run
./createRunDir.sh    # and follow the prompts
cd /path/to/GCClassic/rundir

Compile and build GEOS-Chem Classic

cd build
cmake ../CodeDir -DRUNDIR=..
make -j
make -j install
cd ..

Run a GEOS-Chem Classic dry-run simulation to download data

Check your settings in geoschem_config.yml, HEMCO_Config.rc, HISTORY.rc, etc, then

./gcclassic --dryrun | tee log.dryrun
conda activate geo
./download_data.py log.dryrun --amazon
conda deactivate

Create a HEMCO-standalone rundir

cd CodeDir/src/HEMCO/run
./createRunDir.sh    # and follow the prompts

NOTE: During the run directory creation it will ask you to specify a path to a HEMCO_Config.rc file that is already created in a GEOS-Chem run directory, so you would type /path/to/GCClassic/rundir/HEMCO_Config.rc

Compile and run the HEMCO standalone

cd /path/to/HEMCO/standalone/rundir    # the one you just created
cd build
cmake ../CodeDir -DRUNDIR=..
make -j
make install
cd ..

Check your settings in the HEMCO_sa_Config.rc, HEMCO_sa_Grid.rc, HEMCO_sa_Time.rc files, then,

./hemco_standalone -c HEMCO_sa_Config.rc 

You should get a HEMCO.log file similar to this: HEMCO.log

Let me know if you still have issues. I think the error that you had with the volcano emissions is because you may not have had the proper period on disk (which is why I did the GEOS-Chem dry-run first).
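
One rough way to check whether the files a run tries to open are actually on disk is to pull the paths out of the log and test each one. The sketch below assumes the "Opening <path>" line format seen in the run.log excerpts earlier in this thread; the sample log and file names are hypothetical stand-ins created in a temporary directory.

```shell
#!/usr/bin/env bash
# Build a small stand-in log with one file that exists and one that does not.
workdir=$(mktemp -d)
touch "$workdir/present.nc4"
cat > "$workdir/sample.log" <<EOF
HEMCO: Opening $workdir/present.nc4
HEMCO (VOLCANO): Opening $workdir/missing.rc
EOF

# Extract everything after "Opening " on each matching line,
# then keep only the paths that are absent from disk.
missing=$(sed -n 's/^.*Opening \(.*\)$/\1/p' "$workdir/sample.log" |
    while read -r path; do
        [ -e "$path" ] || echo "$path"
    done)

echo "Missing files:"
echo "$missing"
```

Pointing the same pipeline at a real run.log would list any inputs that still need to be downloaded, provided the log lines are not wrapped by the terminal.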

It may be that the instructions on the GEOS-Chem Cloud ReadTheDocs need updating. I'll take a look. It may be a while before I can fix the doc given other things going on here.

yantosca commented 2 years ago

Also note, the GEOS-Chem on cloud tutorial has moved to: https://geos-chem-cloud.readthedocs.io/en/latest/

yantosca commented 2 years ago

And we have recently updated hemco.readthedocs.io with more information about setting up a standalone run: https://hemco.readthedocs.io/en/latest/hco-sa-guide/intro.html

arianatribby commented 2 years ago

Thanks @yantosca!

I found the problem. I ran HEMCO successfully using MERRA-2 meteorology. However, all of my previous attempts used GEOS-FP meteorology. There appears to be a bug when HEMCO reads the volcano emissions using GEOS-FP met. For now, I think I am going to switch to MERRA-2, but I wanted to let you know. I hope my log files above can be helpful for the team if they will be looking into the bug further.

I also noticed that Jupyter is not installed with ami-0491da4eeba0fe986 that we used to run the model. Is there an AMI that you recommend for analysis? (I have opened several other AMIs for GEOS-Chem and have found one that has Jupyter, but some of the plotting packages are much older. I don't want to affect other package dependencies in the GCPy environment by downloading my own.)

Thanks again!

yantosca commented 2 years ago

@arianatribby: thanks for letting me know about Jupyter not being installed. We'll try to fix this for the AMI that we generate for 14.0.0.

Could you list the details of the GEOS-FP volcano bug in a separate issue? That'll make it easier to keep things straight and to make the issue searchable.

arianatribby commented 2 years ago

@yantosca yes, will create a new issue. (Sorry, I was away.) Separately though, I am planning on simulating ethane and propane 2020-2022, but I've noticed there are emission files missing. The EPA16/NEI only go until 2020. Do you know when these emission files might be available? In the meantime, what do you recommend I do?

Additional emission files that are not available 2020-2022 are OFFLINE_SOILNOX, OFFLINE_BIOVOC, GFED4, OFFLINE_SEASALT, OFFLINE_LIGHTNING, and possibly more. For these, is the most efficient way to change the time variable within each nc file for simulating beyond 2020? (Since I am doing C2 and C3.)

yantosca commented 2 years ago

> @yantosca yes, will create a new issue. (Sorry, I was away.) Separately though, I am planning on simulating ethane and propane 2020-2022, but I've noticed there are emission files missing. The EPA16/NEI only go until 2020. Do you know when these emission files might be available? In the meantime, what do you recommend I do?

I am not sure if the EPA/NEI16 data goes past 2020. We do not prepare those ourselves; someone in the GEOS-Chem community supplies them to us.

Also tagging @msulprizio and @Jourdan-He

> Additional emission files that are not available 2020-2022 are OFFLINE_SOILNOX, OFFLINE_BIOVOC, GFED4, OFFLINE_SEASALT, OFFLINE_LIGHTNING, and possibly more. For these, is the most efficient way to change the time variable within each nc file for simulating beyond 2020? (Since I am doing C2 and C3.)

I think some of those newer offline fields may still be in the process of being generated. @YanshunLi-washu might be able to update you on that.

You could use the online emissions for dates for which there are no offline emissions. Or you could create your own offline emissions by running the HEMCO standalone at 0.5 x 0.625 (MERRA-2) and then reading those into your GEOS-Chem simulation.

YanshunLi-washu commented 2 years ago

Hi @arianatribby . Thanks for letting us know. The offline soilnox, biovoc, seasalt emissions for 2021 and 2022 will be available in one month or so. The use of online emissions is recommended if your application is urgent. For the update of offline lightning, @ltmurray may know more.

arianatribby commented 2 years ago

Hi @YanshunLi-washu , I will now be evaluating the impact of 2020 NOx emissions on a full chemistry run. How will turning off the "offline" emissions affect my simulation? (I am confused how or when the offline emissions are used.)

Also, I notice that GFED4 emissions are not available for 2020 - do you know when they might be available or what I can do in the meantime? Thanks.

YanshunLi-washu commented 2 years ago

Hi @arianatribby, I would recommend running GEOS-Chem Classic at its highest spatial resolution (0.25x0.3125) to minimize the impacts of turning off offline emissions. Hi Bob @yantosca, I'm not sure who is maintaining the GFED4 emissions; could you kindly help with this? Thanks.

yantosca commented 2 years ago

Hi @arianatribby and @YanshunLi-washu. I think we have gotten the GFED4 biomass burning via @pkasibhatla. I'm also looping in the Emissions Working Group co-chairs: @jaegle @eamarais

arianatribby commented 2 years ago

Thanks @yantosca , do you think there is a workaround I can do for the GFED4 so that I can run 2020? Also, how would I generate offline emissions via the standalone if the base emissions for offline don't exist for 2020? Thanks!

arianatribby commented 2 years ago

Also @yantosca I have a small question. I generated total emissions for a certain species via the standalone, then I replaced all base emissions of that species with the standalone emissions using HEMCO_Config.rc. This was successful, and I noticed a difference in the species conc compared to default.

However, now I want to scale the total emissions of the species (while still using the emissions from the standalone). I did EmissScale_: val in the HEMCO_Config.rc. I haven't seen a difference though (the "scaled" species concentration is the same as the "unscaled"). When HEMCO calculates emissions for this species, does it scale but then read in the standalone emissions, thereby nulling this effort? I need it to get the emissions from the standalone, then scale! Thanks!

pkasibhatla commented 2 years ago

@arianatribby, @yantosca, GFED4.1s data is available at https://globalfiredata.org/pages/data/. I do not know what our current approach is for processing these files for input into GC.

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If there are no updates within 7 days it will be closed. You can add the "never stale" tag to prevent the Stale bot from closing this issue.

stale[bot] commented 1 year ago

Closing due to inactivity

ltmurray commented 10 months ago

Reopening in case anyone gets here via Google like I did re: the question about EmissScale. The HEMCO documentation has a typo: "EmissScale" should be "EmisScale_" to match what is in the code. (Correctly given in the example, but not in the header or description).
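
A quick way to spot the misspelled key in an existing configuration file is a pair of `grep` calls, sketched below. The sample file contents, species names, and scale values are hypothetical; only the two spellings come from this thread.

```shell
#!/usr/bin/env bash
# Write a stand-in config containing one misspelled and one correct key.
workdir=$(mktemp -d)
cat > "$workdir/HEMCO_Config.rc" <<'EOF'
EmissScale_C2H6: 2.0
EmisScale_C3H8: 1.5
EOF

# Count lines starting with the misspelled prefix (double "s")
# and lines starting with the spelling used in the code (single "s").
misspelled=$(grep -c '^EmissScale_' "$workdir/HEMCO_Config.rc")
correct=$(grep -c '^EmisScale_' "$workdir/HEMCO_Config.rc")
echo "misspelled=$misspelled correct=$correct"
```

If the misspelled count is nonzero in a real HEMCO_Config.rc, that may explain a scale factor appearing to have no effect, as described in the question above.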

yantosca commented 10 months ago

Thanks @ltmurray. This is now fixed in commit 43ec3788. For now it will show up in the "latest" documentation until the next HEMCO release, when it will be "stable".