beatrixparis / connectivity-modeling-system

The CMS is a multiscale stochastic Lagrangian framework developed by Paris' Lab at the Rosenstiel School of Marine, Atmospheric & Earth Science to study complex behaviors, giving probabilistic estimates of dispersion, connectivity, the fate of pollutants, and other Lagrangian phenomena. This repository facilitates community contributions to CMS modules.
https://beatrixparis.github.io/connectivity-modeling-system/
GNU General Public License v3.0

Error when running the example data: floating-point exception #40

Open livipeluso opened 3 years ago

livipeluso commented 3 years ago

Hello everyone,

I'm new to CMS and I hope someone can help me with an error I get when I try to run cms on the example data.

I installed CMS following the tutorial. At first I had some problems with the netcdf.mod file, but I managed to solve them and the installation appears to be OK. When I run the test with the example data to verify it, getdata runs without a problem and downloads the .nc files. However, when I try to run cms, this error appears:

$ ./cms example
File name : traj_file_1
Total number of release events : 3 (numbers 1 to 3)
Total number of particles released : 30
Total number of time steps : 289
Total number of time steps in output file: 11

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0 0x7f7a52ca4d01 in ???
#1 0x7f7a52ca3ed5 in ???
#2 0x7f7a5290820f in ???
#3 0x559feef5d5b4 in ???
#4 0x559feef969f3 in ???
#5 0x559feef96a4c in ???
#6 0x7f7a528e90b2 in ???
#7 0x559feef3464d in ???
#8 0xffffffffffffffff in ???
Floating point exception (core dumped)

I searched for this error on Google, and the suggestions I found involve changing model parameters (reference values different from zero, boundary conditions or time step length) so that no division by zero occurs. However, I believe these do not apply to my case, since I'm only trying to run the example. I found the CMS Google Group and some installation notes at https://groups.google.com/g/connectivity-modeling-system-club/c/67kpxJTIHHo/m/6GruB75dAAAJ. Thinking something might have gone wrong with the installation, I reinstalled everything, this time also installing some packages from those notes that I didn't have (zlib and hdf5-tools). I ran make again and getdata works, but when I run cms I get the same error. I also tried commenting out the MPI lines to see if the problem was related to MPI, but I get the same error.

If anyone could help me with this, I would really appreciate it! Thanks in advance!

Regards,

Lívia Peluso

Yuma248 commented 2 years ago

Same problem here: after installing and reinstalling the package I get the same error. I wonder if it is related to the warning below:

loop.f90:248:43:

  243 | particle(r)%ndepth(n),startsec,particle(r)%diam(n), &
      |                                2
......
  248 | particle(r)%ndepth(n), startsec,-1, &
      |                                 1
Warning: Type mismatch between actual argument at (1) and actual argument at (2) (INTEGER(4)/REAL(4)).

Yuma248 commented 2 years ago

This worked for me.

After make fails with this error, I ran

mpif90 -c loop.f90 -I/folder/with/include -w -fallow-argument-mismatch -O2
make
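The workaround above relies on `-fallow-argument-mismatch`, a gfortran option added in GCC 10 (the release that turned argument mismatches within a single file from warnings into errors); older gfortran versions reject the flag. A minimal sketch of a rebuild helper, assuming `mpif90` wraps gfortran and that you run it from the directory where you normally run make. The functions `mismatch_flag` and `rebuild_loop` are hypothetical helpers, not part of CMS, and the include directory is still yours to supply:

```shell
#!/bin/sh
# Sketch: recompile loop.f90 with -fallow-argument-mismatch only when the
# compiler is gfortran >= 10 (older releases do not recognize the flag),
# then rerun make. Hypothetical helpers, not part of CMS.

# Print the flag if the given gfortran version string is 10 or newer.
mismatch_flag() {
  major=${1%%.*}                       # "9.4.0" -> "9", "11" -> "11"
  if [ "$major" -ge 10 ] 2>/dev/null; then
    echo "-fallow-argument-mismatch"
  fi
}

# Usage: rebuild_loop /path/to/netcdf/include
rebuild_loop() {
  inc=${1:?usage: rebuild_loop INCLUDE_DIR}
  flag=$(mismatch_flag "$(gfortran -dumpversion)")
  # $flag is left unquoted on purpose: when empty it adds no argument.
  mpif90 -c loop.f90 -I"$inc" -w $flag -O2 && make
}
```

`gfortran -dumpversion` prints either just the major version or the full `X.Y.Z` string depending on how GCC was configured, which is why only the part before the first dot is compared.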

livipeluso commented 11 months ago

Hi Gamoyo,

No, unfortunately I could not solve this problem. :/ I eventually managed to run CMS on a computer in a friend's lab that already had it installed. Sorry for not being more helpful. If you find a way to fix it, please let me know!

On Mon, Nov 20, 2023 at 04:43, Gamoyo @.***> wrote:


Hi Livia,

I am getting the same error on multiple machines and was wondering whether you managed to get the model to run!

Thanks


silvaglx commented 8 months ago

Any updates on this? I'm also trying to run the example simulation and facing the same error.

Update 1: I believe this problem is caused by conflicts between recent NetCDF libraries and the CMS core code, as discussed in this issue. Selecting ASCII output gives the "SIGFPE: Floating-point exception - erroneous arithmetic operation" error, while selecting NetCDF output gives the "Not a valid data type or _FillValue mismatch" error. This might explain why the problem only started appearing in recent years and why we cannot find any information about it from earlier users. It seems that something prevents cms from editing and moving the output files in SCRATCH. I'm currently trying to run the program under Cygwin64, so installing older NetCDF packages is a bit tricky. As soon as I manage to set up the new working environment I'll post an update here. In the meantime, it would be helpful to hear from other users which NetCDF version they are currently using.

Update 2: Unfortunately, setting up a new working environment with older NetCDF and HDF versions did not solve the floating-point exception. It did, however, fix the "Not a valid data type or _FillValue mismatch" error when selecting NetCDF output, as confirmed in the issue mentioned above. Now both the ASCII and NetCDF output options give the same floating-point error.

silvaglx commented 8 months ago

Solved!

I'm not sure exactly what was causing the problem, but after downgrading my NetCDF and HDF packages and doing test runs with modified parameters in input_example/runconf.list, the floating-point error magically disappeared. First, I set the options for saving restart files and for restarting to true, and started receiving a different error ("Fortran Error termination. Backtrace") instead of the previous one. Then I set the Landmask Boundary Condition to false and the errors disappeared. However, the output was still odd, since the time step was not being saved correctly in the NetCDF file. After activating the Landmask Boundary Condition again, together with the Periodic Boundary Condition, the results finally came out okay.

I'm sorry for not giving a concrete answer on how to solve this issue, and I might run into it again in the future myself. However, for any new user facing the same problems with the example simulation, I recommend first carefully editing input_example/runconf.list and doing test runs to see whether you can get rid of this error, focusing especially on the parameters I mentioned above. If the error persists, I recommend downgrading your NetCDF and HDF versions to a setup similar to (or older than) mine: NetCDF-C 4.6.1, NetCDF-Fortran 4.4.5 and HDF5 1.10.5.
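Since the outcome here seems to depend on library versions, it helps to report them when comparing setups. A minimal sketch, assuming the standard helper tools that ship with each library are on your PATH (`nc-config` from NetCDF-C, `nf-config` from NetCDF-Fortran, `h5dump` from the HDF5 tools, e.g. the hdf5-tools package); the `report` function is only an illustrative wrapper:

```shell
#!/bin/sh
# Print the NetCDF-C, NetCDF-Fortran and HDF5 versions found on PATH, to
# compare with the setup reported to work above
# (NetCDF-C 4.6.1, NetCDF-Fortran 4.4.5, HDF5 1.10.5).

# Run a version command if the tool exists; otherwise say it is missing.
report() {
  tool=$1
  shift
  if command -v "$tool" >/dev/null 2>&1; then
    "$tool" "$@"
  else
    echo "$tool: not found on PATH"
  fi
}

report nc-config --version   # NetCDF-C
report nf-config --version   # NetCDF-Fortran
report h5dump --version      # HDF5 command-line tools
```

These tools describe the installation that comes first on your PATH, which is not necessarily the one the cms build actually linked against, so check that they agree with the paths used when compiling CMS.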