geoschem / integrated_methane_inversion

Integrated Methane Inversion workflow repository.
https://imi.readthedocs.org
MIT License

jacobian simulation: 0109 exited with error code: 139 #174

Closed alvv1986 closed 8 months ago

alvv1986 commented 9 months ago

Hi! This is Angel from Aarhus University in Denmark.

I am running into a problem with the Jacobian part of the IMI for a domain covering Denmark. 670 Jacobian simulations were set up, and they ran fine up to simulation 107. From simulation 108 onwards an HDF error appears, as can be seen in the file TestDKMay2018_0108.log (/home/ubuntu/imi_output_dir/TestDKMay2018/jacobian_runs/TestDKMay2018_0108).

I am running IMI version 1.2.1 (Jul 19, 2023). Please find attached a zip file (files.zip) containing the IMI configuration file (config.yml) and two log files that may help: imi_output.log and TestDKMay2018_0108.log (the Jacobian log).

Instance features:
Virtual server type (instance type): c5.9xlarge
Firewall (security group): New security group
Storage (volumes): 1 volume(s) - 100 GiB

Thank you

laestrada commented 8 months ago

Hi @alvv1986,

My suspicion is that your volume is out of storage space. See https://github.com/geoschem/geos-chem/issues/1969 for HDF errors in GEOS-Chem. Try checking your storage space and, if it is full, extend your storage volume.

The other possibility is corrupted input files, but given that you had many successful jacobian runs, this seems unlikely.

Let us know if this resolves your issue.

-- Lucas

alvv1986 commented 8 months ago

Hi Lucas,

Thank you for the feedback!

I am still grappling with that issue. Now, I am trying to run the model for the same period (May 2018), but for a slightly smaller area. This time, there were 212 Jacobian simulations set up, but as you can see below, it crashes at simulation 85.

...
finished jacobian simulation: 0069
finished jacobian simulation: 0071
jacobian simulation: 0085 exited with error code: 139
Check the log file in the /home/ubuntu/imi_output_dir/TestDkMay2018/jacobian_runs/TestDkMay2018_0085 directory for more details.
...
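For anyone scanning a long imi_output.log, here is a minimal sketch that pulls out the failed simulations from lines like the ones above. It relies only on the message format shown in this excerpt; the helper name `failed_simulations` is made up for illustration. It also decodes the exit status using the usual shell convention that a process killed by signal N reports 128 + N, so 139 corresponds to signal 11 (SIGSEGV). That tells you the run crashed, not why it crashed.

```python
import re
import signal

# Sample text copied from the excerpt above.
LOG_EXCERPT = """\
finished jacobian simulation: 0069
finished jacobian simulation: 0071
jacobian simulation: 0085 exited with error code: 139
"""

# Matches only the failure message format seen in this thread.
FAILED = re.compile(r"jacobian simulation: (\d+) exited with error code: (\d+)")


def failed_simulations(log_text):
    """Return (simulation id, exit code, signal name or None) per failure."""
    failures = []
    for sim_id, code_text in FAILED.findall(log_text):
        code = int(code_text)
        sig_name = None
        if code > 128:
            # Shell convention: exit status 128 + N means "killed by signal N".
            try:
                sig_name = signal.Signals(code - 128).name
            except ValueError:
                sig_name = None
        failures.append((sim_id, code, sig_name))
    return failures


print(failed_simulations(LOG_EXCERPT))
```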

You mentioned that the problem could be related to storage space; however, the imi_output_dir size was only 37 GiB when the model crashed. I have set my instance with a volume of 100 GiB, and my instance type is c5.9xlarge, as suggested in the Quick start guide.

Any suggestions?

Angel


laestrada commented 8 months ago

Hi @alvv1986,

The output directory may only be 37G, but there are also additional meteorological fields and input data that get downloaded to the ExtData directory.

Try running df -h . from the /home/ubuntu directory to view the available capacity on your volume.
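If you would rather check this from a script (for example before launching a batch of Jacobian runs), the sketch below does roughly what `df -h .` reports, using Python's standard `shutil.disk_usage`. The function name `disk_report` and the 20 GiB warning threshold are arbitrary choices for illustration, not IMI settings.

```python
import shutil

GIB = 1024 ** 3


def disk_report(path, min_free_gib=20.0):
    """Summarise capacity of the volume holding `path`, in GiB.

    `min_free_gib` is a hypothetical warning threshold, not an IMI value.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    total, used, free = (value / GIB for value in usage)
    return {
        "total_gib": total,
        "used_gib": used,
        "free_gib": free,
        "low_on_space": free < min_free_gib,
    }


# Use "/home/ubuntu" on the instance; "." keeps the example runnable anywhere.
report = disk_report(".")
print(f"free: {report['free_gib']:.1f} GiB of {report['total_gib']:.1f} GiB")
if report["low_on_space"]:
    print("Volume is nearly full; consider extending it before running.")
```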

-- Lucas

alvv1986 commented 8 months ago

Hi Lucas,

Thank you again. I ran the model once more and could verify that it crashes due to a lack of storage space.

I would like to run the model using more cores and memory. Do you know what type of instance immediately follows the c5.9xlarge in terms of these features? What type of instance would you recommend for running the model a bit faster?

Angel


laestrada commented 8 months ago

Hi Angel,

Here is a link to the relevant instances: https://aws.amazon.com/ec2/instance-types/c5/. We recommend C5-type instances. Anything with more vCPUs should improve the runtime, as long as you update the resource settings in the config file to match the number of vCPUs you have:

SimulationCPUs: 32
SimulationMemory: 32000
JacobianCPUs: 1
JacobianMemory: 2000

This assumes the c5.9xlarge instance. If you use a larger instance, you can increase SimulationCPUs and SimulationMemory. You will need to experiment with the Jacobian simulation settings to see what works.
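As a rough starting point only, the sketch below scales the simulation settings with the vCPU count. The rule it uses (leave 4 vCPUs free, 1000 MB per simulation CPU) is an assumption chosen because it reproduces the values above for the 36-vCPU c5.9xlarge; it is not an IMI rule, and `suggest_resources` is a hypothetical helper. The Jacobian values are simply carried over unchanged.

```python
def suggest_resources(vcpus, reserved_cpus=4, mb_per_cpu=1000):
    """Heuristic starting values for the resource settings in config.yml.

    Assumption: keep `reserved_cpus` vCPUs free and give each simulation CPU
    `mb_per_cpu` MB. This matches 32 CPUs / 32000 MB on a 36-vCPU instance,
    but it is only a guess for other instance sizes.
    """
    if vcpus <= reserved_cpus:
        raise ValueError("instance has too few vCPUs for this heuristic")
    simulation_cpus = vcpus - reserved_cpus
    return {
        "SimulationCPUs": simulation_cpus,
        "SimulationMemory": simulation_cpus * mb_per_cpu,
        # Carried over as-is; these need experimenting with.
        "JacobianCPUs": 1,
        "JacobianMemory": 2000,
    }


# Print the settings in the same "key: value" form used in the config file.
for key, value in suggest_resources(36).items():
    print(f"{key}: {value}")
```

Check that the suggested memory actually fits within what the chosen instance type provides before using it.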

-- Lucas

alvv1986 commented 8 months ago

Hi Lucas,

Thank you so much for your help!

I have just one more question. It will be the last one, I promise.

During the simulation, I found the two fatal errors listed below. I would like to know whether these errors have serious implications for the results. Could you please provide some insight on this?

Attached is the entire log file if you want to go through it.

Executing dry-run for posterior run...
fatal error: An error occurred (404) when calling the HeadObject operation: Key "GEOSCHEM_RESTARTS/v2020-02/initial_GEOSChem_rst.2x25_CH4.nc" does not exist
Log with unique file paths written to: log.dryrun.unique
Downloading data from amazon

=== DONE CREATING POSTERIOR RUN DIRECTORY ===

=== CREATING JACOBIAN RUN DIRECTORIES ===
mkdir: created directory 'jacobian_runs'
mkdir: created directory './jacobian_runs/DkMay2018_0000'

Executing dry-run for production runs...
fatal error: An error occurred (404) when calling the HeadObject operation: Key "GEOSCHEM_RESTARTS/v2020-02/initial_GEOSChem_rst.2x25_CH4.nc" does not exist
Log with unique file paths written to: log.dryrun.unique
Downloading data from amazon
mkdir: created directory './jacobian_runs/DkMay2018_0001'

Thanks again!

Angel


laestrada commented 8 months ago

Hi Angel,

That error is nothing to worry about. It comes from downloading the input data for the forward model, but the restart file just gets replaced with the boundary conditions anyway.

-- Lucas

alvv1986 commented 1 day ago

Hi Lucas,

Could you please explain how to calculate the total posterior uncertainty from the model? Is there a model parameter for this?

Thank you

Angel

laestrada commented 1 day ago

Hi Angel,

You can look at the inversion_result.nc file in the inversion directory. This contains the posterior error covariance matrix (the S_post data variable). However, it is also recommended to run sensitivity inversions to better attribute uncertainties.
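To make the step from S_post to an uncertainty concrete, here is a minimal sketch with made-up numbers. It assumes S_post is a square covariance matrix over state vector elements, that the elements are scale factors applied to prior emissions, and that the total is the prior-weighted sum of those scale factors. Under those assumptions the per-element standard deviation is the square root of the diagonal, and the variance of the total is w^T S_post w, which includes the off-diagonal covariances. The matrix, the prior emissions, and the function names are all hypothetical; reading the real S_post from inversion_result.nc is not shown.

```python
import math

# Hypothetical 3-element posterior error covariance matrix (scale-factor units).
S_post = [
    [0.04, 0.01, 0.00],
    [0.01, 0.09, 0.02],
    [0.00, 0.02, 0.16],
]

# Hypothetical prior emissions for each state vector element (any one unit).
prior_emissions = [1.0, 2.0, 0.5]


def element_std(cov):
    """Posterior standard deviation of each element: sqrt of the diagonal."""
    return [math.sqrt(cov[i][i]) for i in range(len(cov))]


def total_std(cov, weights):
    """Standard deviation of sum_i weights[i] * x[i], i.e. sqrt(w^T cov w)."""
    n = len(weights)
    variance = sum(
        weights[i] * cov[i][j] * weights[j] for i in range(n) for j in range(n)
    )
    return math.sqrt(variance)


print("per-element std:", element_std(S_post))
print("std of total emissions:", total_std(S_post, prior_emissions))
```

Summing only the diagonal terms would ignore the error correlations between elements, which is why the full quadratic form is used here.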

alvv1986 commented 20 hours ago

Hi Lucas,

I just looked at that file, and S_post only has data on the first state vector element. However, when I check the gridded_posterior.nc file, both S_post and A seem to have more realistic values. Why is there this difference? Can I use both S_post and A from the latter file?

Thank you

Angel


laestrada commented 19 hours ago

Hi Angel,

Yes, the information in those two files is equivalent. inversion_result holds the flattened array, with each element corresponding to a state vector element: element 1 is the error on state vector element 1.

gridded_posterior is the same information from inversion_result mapped according to the lat/lon coordinates corresponding to each state vector element.
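The relationship between the two files can be sketched as follows. This is only an illustration of the mapping idea, not the IMI code: it assumes a 2D grid holding 1-based state vector element IDs, with 0 marking cells outside the region, and looks up the flat per-element value for each cell. The grid, the values, and the `to_grid` helper are hypothetical.

```python
import math

# Hypothetical lat x lon grid of 1-based state vector element IDs.
# 0 marks a cell that belongs to no element (an assumption for this sketch).
state_vector_ids = [
    [1, 1, 2],
    [3, 0, 2],
]

# Hypothetical flat per-element values, e.g. posterior standard deviations.
flat_values = [0.2, 0.3, 0.4]


def to_grid(ids, values, fill=math.nan):
    """Place each element's value on every grid cell assigned to that element."""
    return [
        [values[element - 1] if element > 0 else fill for element in row]
        for row in ids
    ]


for row in to_grid(state_vector_ids, flat_values):
    print(row)
```

An element that covers several grid cells (element 1 and element 2 here) shows the same value in each of them, so the gridded file can look like it has "more" data than the flat one while carrying the same information.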

-- Lucas