FluidNumerics / FEOTS

BSD 3-Clause "New" or "Revised" License

Error message with RegionalExtraction #7

Closed: JiaxuZ closed this issue 5 years ago

JiaxuZ commented 6 years ago

Below is the error message I get when running RegionalExtraction:

[jiaxu@cn230 operator_LW27pt_stencil]$ ./RegionalExtraction 
 S/R Load_POP_Mesh : Reading in the grid information from /usr/projects/cesm/FastSolver/feots/database/POP_0.3_Operators_5DayAvg_LW27pt/POP_03deg_mesh.nc
 S/R Load_POP_Mesh : Grid Dimensions (nX,nY,nZ) : (        1200 ,         800 ,         100 )
 S/R ConstructWetPointMap :
 Found     46647406 degrees of freedom from     96000000 mesh points.
 S/R : Build_Stencil : Constructing Lax-Wendroff27 stencil with LateralPlusCorners flavor.
  Finding cells in Region 
   Found      7711213 in region
   Region crosses prime-meridian
  Finding boundary cells
   Found       32998  boundary cells for Mask ID           1 .
   Found        3662  prescribed cells for Mask ID           1 .
   Found       32998  boundary cells for Mask ID           2 .
   Found       24210  prescribed cells for Mask ID           2 .
   Found       32998  boundary cells for Mask ID           3 .
   Found        5126  prescribed cells for Mask ID           3 .
 S/R WriteNetCDF_POP_Mesh : Writing the grid information to regional_mesh.nc
                            Defining dimensions of the mesh
                            Defining mesh variables
                            Defining units.
                            Writing variables to file.
                             Done!

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x147A9B52D407
#1  0x147A9B52DA1E
#2  0x147A9AA262EF
#3  0x147A9AA7126E
#4  0x40DBF8 in __pop_regional_class_MOD_trash_pop_regional
#5  0x417B27 in MAIN__ at RegionalExtraction.f90:?
Segmentation fault (core dumped)

This was run on the cn230 node, which has 503 GB of memory. The runtime.params file looks like this:

&POPMeshOptions
MeshType    = 'PeriodicTripole',
StencilType = 'LaxWendroff27',
Regional    = .TRUE.,
maskfile    = 'Atlantic_mask_update.nc',
/
&TracerModelOptions
/
&OperatorOptions
/
&FileOptions
extractRegionalOperators  = .FALSE.,
meshfile                  ='/usr/projects/cesm/FastSolver/feots/database/POP_0.3_Operators_5DayAvg_LW27pt/POP_03deg_mesh.nc',
graphfile                 ='/usr/projects/cesm/FastSolver/feots/database/POP_0.3_Operators_5DayAvg_LW27pt/pop_03_periodic-tripole_laxwendroff_27point',
operatorBaseName          = 'pop_03_periodic-tripole',
feotsOperatorDirectory    ='/usr/projects/cesm/FastSolver/feots/database/POP_0.3_Operators_5DayAvg_LW27pt/Global/',
IRFListFile               = 'IRFList_5dayAvg.txt',
IRFStart                  = 1,
nIRFFiles                 = 8,
/
&JFNKOptions
/

fluidnumerics-joe commented 6 years ago

@JiaxuZ I've pushed up two commits. Once you update, give RegionalExtraction another try. The error occurred at the end of the run (in the Trash routine), which suggests that I was trying to deallocate something that had not been allocated during the run. The most recent patch should fix this.

JiaxuZ commented 6 years ago

@schoonovernumerics It seems the new code does not fix the problem; I get the same error message. When building RegionalExtraction, I did get this linker warning:

/usr/bin/ld: warning: libnetcdf.so.7, needed by /home/jiaxu/Software/netcdf/lib/libnetcdff.so, may conflict with libnetcdf.so.11

Could this have anything to do with the error?

fluidnumerics-joe commented 6 years ago

That warning is specific to the way NetCDF was installed on the system you are running on; it should not be an issue. Did RegionalExtraction write all of the expected output files? Since the Trash routine is called at the end, all of the actual work should already have been done. I'll still dig a bit further here.
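
A quick way to answer the output-file question is to check the run directory directly. The following is a minimal Python sketch, assuming the expected outputs are the two files named elsewhere in this thread (mappings.regional and regional_mesh.nc) and that they are written to the run directory. The helper name is hypothetical and is not part of FEOTS.

```python
from pathlib import Path

# Output names are taken from this thread; the helper itself is hypothetical.
EXPECTED_OUTPUTS = ("mappings.regional", "regional_mesh.nc")


def missing_outputs(run_dir, names=EXPECTED_OUTPUTS):
    """Return the expected files that are absent or empty in run_dir."""
    run_dir = Path(run_dir)
    missing = []
    for name in names:
        path = run_dir / name
        # A zero-byte file usually means the write never completed.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # An empty list means both files exist and are non-empty.
    print(missing_outputs("."))
```

This only confirms that the files exist and are non-empty. It does not validate their contents, so a file truncated partway through a write would still pass.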

fluidnumerics-joe commented 6 years ago

In your example directory, set DEBUG to yes in the appropriate FEOTS_Settings file, then run make clean and make RegionalExtraction. Re-run the RegionalExtraction executable and post the error message here. I really need to get on a system with enough memory.

JiaxuZ commented 6 years ago

Yes, RegionalExtraction generates mappings.regional and regional_mesh.nc, and both look good. I turned on DEBUG and got the following messages during the build:

[jiaxu@darwin-fe1 operator_LW27pt_stencil]$ make RegionalExtraction 
make --directory=/home/jiaxu/FEOTS/build/ RegionalExtraction
make[1]: Entering directory `/home/jiaxu/FEOTS/build'
gfortran -cpp -O0 -g -ffree-line-length-none -Wall -fcheck=all -ffpe-trap=invalid -fbacktrace -c /home/jiaxu/FEOTS/src/common/ModelPrecision.f90 -o ModelPrecision.o
gfortran -cpp -O0 -g -ffree-line-length-none -Wall -fcheck=all -ffpe-trap=invalid -fbacktrace -c /home/jiaxu/FEOTS/src/common/ConstantsDictionary.f90 -o ConstantsDictionary.o
gfortran -cpp -O0 -g -ffree-line-length-none -Wall -fcheck=all -ffpe-trap=invalid -fbacktrace -c /home/jiaxu/FEOTS/src/common/CommonRoutines.f90 -L/home/jiaxu/Software/netcdf/lib -lnetcdff -lnetcdf -I/home/jiaxu/Software/netcdf/include -o CommonRoutines.o
gfortran -cpp -O0 -g -ffree-line-length-none -Wall -fcheck=all -ffpe-trap=invalid -fbacktrace -c /home/jiaxu/FEOTS/src/common/BinaryIO.f90 -o BinaryIO.o
/home/jiaxu/FEOTS/src/common/BinaryIO.f90:227.18:

     fileLength = SIZEOF( var )
                  1
Warning: Possible change of value in conversion from INTEGER(8) to INTEGER(4) at (1)
gfortran -cpp -O0 -g -ffree-line-length-none -Wall -fcheck=all -ffpe-trap=invalid -fbacktrace -c /home/jiaxu/FEOTS/src/matrices/CRSMatrix_Class.f90 -o CRSMatrix_Class.o
/home/jiaxu/FEOTS/src/matrices/CRSMatrix_Class.f90:544.22:

   INTEGER :: row, iEl
                      1
Warning: Unused variable 'iel' declared at (1)
/home/jiaxu/FEOTS/src/matrices/CRSMatrix_Class.f90:544.17:

   INTEGER :: row, iEl
                 1
Warning: Unused variable 'row' declared at (1)
/home/jiaxu/FEOTS/src/matrices/CRSMatrix_Class.f90:329.33:

   INTEGER :: i, j, iEl, row, col, nInRow, e1, e2, jlocal
                                 1
Warning: Unused variable 'col' declared at (1)
/home/jiaxu/FEOTS/src/matrices/CRSMatrix_Class.f90:329.28:

   INTEGER :: i, j, iEl, row, col, nInRow, e1, e2, jlocal
                            1
Warning: Unused variable 'row' declared at (1)
gfortran -cpp -O0 -g -ffree-line-length-none -Wall -fcheck=all -ffpe-trap=invalid -fbacktrace -c /home/jiaxu/FEOTS/src/POP/POP_Mesh_Class.f90 -L/home/jiaxu/Software/netcdf/lib -lnetcdff -lnetcdf -I/home/jiaxu/Software/netcdf/include -o POP_Mesh_Class.o
gfortran -cpp -O0 -g -ffree-line-length-none -Wall -fcheck=all -ffpe-trap=invalid -fbacktrace -c /home/jiaxu/FEOTS/src/POP/POP_GridTypeMappings.f90 -o POP_GridTypeMappings.o
gfortran -cpp -O0 -g -ffree-line-length-none -Wall -fcheck=all -ffpe-trap=invalid -fbacktrace -c /home/jiaxu/FEOTS/src/POP/POP_Stencil_Class.f90 -o POP_Stencil_Class.o
gfortran -cpp -O0 -g -ffree-line-length-none -Wall -fcheck=all -ffpe-trap=invalid -fbacktrace -c /home/jiaxu/FEOTS/src/POP/POP_AdjacencyGraph_Class.f90 -o POP_AdjacencyGraph_Class.o
/home/jiaxu/FEOTS/src/POP/POP_AdjacencyGraph_Class.f90:416.32:

         myGraph % valence(i) = localIOArray(k)
                                1
Warning: Possible change of value in conversion from INTEGER(8) to INTEGER(4) at (1)
/home/jiaxu/FEOTS/src/POP/POP_AdjacencyGraph_Class.f90:418.32:

         myGraph % color(i)   = localIOArray(k)
                                1
Warning: Possible change of value in conversion from INTEGER(8) to INTEGER(4) at (1)
gfortran -cpp -O0 -g -ffree-line-length-none -Wall -fcheck=all -ffpe-trap=invalid -fbacktrace -c /home/jiaxu/FEOTS/src/POP/POP_Regional_Class.f90 -L/home/jiaxu/Software/netcdf/lib -lnetcdff -lnetcdf -I/home/jiaxu/Software/netcdf/include -o POP_Regional_Class.o
gfortran -cpp -O0 -g -ffree-line-length-none -Wall -fcheck=all -ffpe-trap=invalid -fbacktrace -c /home/jiaxu/FEOTS/src/POP/POP_Params_Class.f90 -o POP_Params_Class.o
gfortran -cpp -O0 -g -ffree-line-length-none -Wall -fcheck=all -ffpe-trap=invalid -fbacktrace -c /home/jiaxu/FEOTS/src/POP/programs/RegionalExtraction.f90 -o RegionalExtraction.o
gfortran -cpp -O0 -g -ffree-line-length-none -Wall -fcheck=all -ffpe-trap=invalid -fbacktrace ModelPrecision.o ConstantsDictionary.o CommonRoutines.o BinaryIO.o CRSMatrix_Class.o POP_Mesh_Class.o POP_GridTypeMappings.o POP_Stencil_Class.o POP_Regional_Class.o POP_Params_Class.o RegionalExtraction.o -L/home/jiaxu/Software/netcdf/lib -lnetcdff -lnetcdf -I/home/jiaxu/Software/netcdf/include -o RegionalExtraction
/usr/bin/ld: warning: libnetcdf.so.7, needed by /home/jiaxu/Software/netcdf/lib/libnetcdff.so, may conflict with libnetcdf.so.11
make[1]: Leaving directory `/home/jiaxu/FEOTS/build'
mv /home/jiaxu/FEOTS/build/RegionalExtraction ./

Messages at run time:

[jiaxu@darwin-fe1 operator_LW27pt_stencil]$ ./RegionalExtraction 
 S/R Load_POP_Mesh : Reading in the grid information from /usr/projects/cesm/FastSolver/feots/database/POP_0.3_Operators_5DayAvg_LW27pt/POP_03deg_mesh.nc
 S/R Load_POP_Mesh : Grid Dimensions (nX,nY,nZ) : (        1200 ,         800 ,         100 )
 S/R ConstructWetPointMap :
 Found     46647406 degrees of freedom from     96000000 mesh points.
 S/R : Build_Stencil : Constructing Lax-Wendroff stencil with LateralPlusCorners flavor.
  Finding cells in Region 
   Found      7711213 in region
   Region crosses prime-meridian
  Finding boundary cells
   Found       32998  boundary cells for Mask ID           1 .
   Found        3662  prescribed cells for Mask ID           1 .
   Found       32998  boundary cells for Mask ID           2 .
   Found       24210  prescribed cells for Mask ID           2 .
   Found       32998  boundary cells for Mask ID           3 .
   Found        5126  prescribed cells for Mask ID           3 .
At line 816 of file /home/jiaxu/FEOTS/src/POP/POP_Regional_Class.f90
Fortran runtime error: Index '7711214' of dimension 2 of array 'myregion' outside of expected range (1:7711213)
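
The reported index is exactly one past the upper bound, which is consistent with (but does not prove) an off-by-one in a loop or counter. To pull those numbers out of a long log automatically, here is a minimal Python sketch that parses the bounds-check message in the form printed above. The function name and the returned dictionary layout are hypothetical. Normalizing the two bounds with min/max is an assumption made for messages that print them in reversed order.

```python
import re

# Pattern for the bounds-check message, copied from the wording in the log above.
_BOUNDS_RE = re.compile(
    r"Index '(-?\d+)' of dimension (\d+) of array '(\w+)' "
    r"outside of expected range \((-?\d+):(-?\d+)\)"
)


def parse_bounds_error(line):
    """Return a dict describing the bounds violation, or None if no match."""
    match = _BOUNDS_RE.search(line)
    if match is None:
        return None
    index = int(match.group(1))
    a, b = int(match.group(4)), int(match.group(5))
    # Assumption: the bounds may print in either order, so normalize them.
    lower, upper = min(a, b), max(a, b)
    if index > upper:
        overshoot = index - upper
    elif index < lower:
        overshoot = index - lower  # negative: below the lower bound
    else:
        overshoot = 0
    return {
        "array": match.group(3),
        "dimension": int(match.group(2)),
        "index": index,
        "lower": lower,
        "upper": upper,
        "overshoot": overshoot,
    }


if __name__ == "__main__":
    msg = ("Fortran runtime error: Index '7711214' of dimension 2 of array "
           "'myregion' outside of expected range (1:7711213)")
    print(parse_bounds_error(msg)["overshoot"])  # 1
```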

fluidnumerics-joe commented 6 years ago

I'll set up a system tonight to reproduce this problem.

JiaxuZ commented 6 years ago

Cool. Let me know what files you need and I'll port them to Turquoise.

fluidnumerics-joe commented 6 years ago

Just your mask file would be good. Send the path via e-mail (don't post it on this public issue tracker).

JiaxuZ commented 6 years ago

A quick update: since the regional mesh file is generated, I set extractRegionalOperators = .TRUE. and tried to run RegionalExtraction again. The same error message comes out.

fluidnumerics-joe commented 5 years ago

@JiaxuZ This issue is now resolved with the latest commit.

git pull origin master

Then recompile and re-run. Let me know if other issues pop up.

JiaxuZ commented 5 years ago

Thank you, @schoonovernumerics, for the updates! The new code gets past the point where it stopped before and starts to read the advect and vdiffu files. But a new issue has popped up that has something to do with CRSMatrix_Class.f90. I hope it will be an easy fix.

Below is the run message, with DEBUG turned on:

[jiaxu@cn230 operator_LW27pt_stencil]$ ./RegionalExtraction 
 S/R Load_POP_Mesh : Reading in the grid information from /usr/projects/cesm/FastSolver/feots/database/POP_0.3_Operators_5DayAvg_LW27pt/POP_03deg_mesh.nc
 S/R Load_POP_Mesh : Grid Dimensions (nX,nY,nZ) : (        1200 ,         800 ,         100 )
 S/R ConstructWetPointMap :
 Found     46647406 degrees of freedom from     96000000 mesh points.
 S/R : Build_Stencil : Constructing Lax-Wendroff 27-point stencil with LateralPlusCorners flavor.
  Finding cells in Region 
   Found      7711213 in region
   Region crosses prime-meridian
  Finding boundary cells
   Found       32998  boundary cells for Mask ID           1 .
   Found        3662  prescribed cells for Mask ID           1 .
   Found       32998  boundary cells for Mask ID           2 .
   Found       24210  prescribed cells for Mask ID           2 .
   Found       32998  boundary cells for Mask ID           3 .
   Found        5126  prescribed cells for Mask ID           3 .
 S/R WriteNetCDF_POP_Mesh : Writing the grid information to regional_mesh.nc
                            Defining dimensions of the mesh
                            Defining mesh variables
                            Defining units.
                            Writing variables to file.
                             Done!
 S/R : Build_Stencil : Constructing Lax-Wendroff 27-point stencil with Normal flavor.
  Extracting regional operators.
 Reading CRS Matrix files : /usr/projects/cesm/FastSolver/feots/database/POP_0.3_Operators_5DayAvg_LW27pt/Global/pop_03_periodic-tripole_advect.00001
 Reading CRS Matrix files : /usr/projects/cesm/FastSolver/feots/database/POP_0.3_Operators_5DayAvg_LW27pt/Global/pop_03_periodic-tripole_vdiffu.00001
At line 364 of file /home/jiaxu/FEOTS/src/matrices/CRSMatrix_Class.f90
Fortran runtime error: Index '23' of dimension 1 of array 'rowdata' outside of expected range (20:1)

fluidnumerics-joe commented 5 years ago

Copy that. I'm looking into it now.

fluidnumerics-joe commented 5 years ago

@JiaxuZ, is the lateral diffusion operator a 3x3 stencil?

fluidnumerics-joe commented 5 years ago

Never mind, I found some documentation on this.

fluidnumerics-joe commented 5 years ago

@JiaxuZ, I've pushed up some changes that should resolve this issue. Let me know if you run into any more trouble. Once you confirm that you can run forward integration, I'll close this issue.

JiaxuZ commented 5 years ago

Yes, changing the limit from 20 to 40 works well. Thanks, @schoonovernumerics! I'm going to close this issue.
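
For anyone who hits a similar bounds error: the run above reported index 23 against a row buffer limit of 20, and raising the limit to 40 cleared it. Rather than guessing a new limit, one could scan the matrix for its widest row first. The following is a minimal Python sketch, assuming the operator is available as a standard CRS offsets array with one more entry than there are rows. The function names and the toy data are hypothetical, and FEOTS may store its row bounds differently.

```python
def row_widths(row_ptr):
    """Number of stored entries in each row of a CRS matrix.

    row_ptr is the usual CRS offsets array with len(row_ptr) == n_rows + 1
    (an assumption; FEOTS may store its row bounds differently).
    """
    return [row_ptr[i + 1] - row_ptr[i] for i in range(len(row_ptr) - 1)]


def rows_exceeding(row_ptr, limit):
    """Return (row, width) pairs whose width is larger than the buffer limit."""
    return [
        (row, width)
        for row, width in enumerate(row_widths(row_ptr))
        if width > limit
    ]


if __name__ == "__main__":
    # Toy offsets: the three rows hold 3, 23, and 4 entries.
    toy_row_ptr = [0, 3, 26, 30]
    print(max(row_widths(toy_row_ptr)))     # 23
    print(rows_exceeding(toy_row_ptr, 20))  # [(1, 23)]
    print(rows_exceeding(toy_row_ptr, 40))  # []
```

Sizing the buffer from the observed maximum width avoids having to raise a hard-coded limit again if a wider stencil is used later.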

fluidnumerics-joe commented 5 years ago

Excellent!