gflow / GFlow

Software for modeling circuit theory-based connectivity
GNU General Public License v3.0

Use of `node_pairs` #21

Closed. jmsigner closed this issue 7 years ago.

jmsigner commented 7 years ago

When using the node_pairs argument, gflow recognizes the file and runs each pair, but throws an error when finishing up. The error I get is:

[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[0]PETSC ERROR: to get more information on the crash.
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Signal received
[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.7.3, Jul, 24, 2016
[0]PETSC ERROR: ./gflow.x on a x86_64-linux-gnu-real named 863fad3bfadd by gflow Wed Aug 2 09:30:21 2017
[0]PETSC ERROR: Configure options --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --with-silent-rules=0 --libdir=${prefix}/lib/x86_64-linux-gnu --libexecdir=${prefix}/lib/x86_64-linux-gnu --with-maintainer-mode=0 --with-dependency-tracking=0 --with-debugging=0 --shared-library-extension=_real --with-hypre=1 --with-hypre-dir=/usr --with-clanguage=C++ --with-shared-libraries --with-pic=1 --useThreads=0 --with-fortran-interfaces=1 --with-mpi-dir=/usr/lib/openmpi --with-blas-lib=-lblas --with-lapack-lib=-llapack --with-blacs=1 --with-blacs-lib="-lblacsCinit-openmpi -lblacs-openmpi" --with-scalapack=1 --with-scalapack-lib=-lscalapack-openmpi --with-mumps=1 --with-mumps-include="[]" --with-mumps-lib="-ldmumps -lzmumps -lsmumps -lcmumps -lmumps_common -lpord" --with-suitesparse=1 --with-suitesparse-include=/usr/include/suitesparse --with-suitesparse-lib="-lumfpack -lamd -lcholmod -lklu" --with-spooles=1 --with-spooles-include=/usr/include/spooles --with-spooles-lib=-lspooles --with-ptscotch=1 --with-ptscotch-include=/usr/include/scotch --with-ptscotch-lib="-lptesmumps -lptscotch -lptscotcherr" --with-fftw=1 --with-fftw-include="[]" --with-fftw-lib="-lfftw3 -lfftw3_mpi" --with-superlu=0 --CXX_LINKER_FLAGS=-Wl,--no-as-needed --prefix=/usr/lib/petscdir/3.7.3/x86_64-linux-gnu-real PETSC_DIR=/build/petsc-fA70UI/petsc-3.7.3.dfsg1 --PETSC_ARCH=x86_64-linux-gnu-real CFLAGS="-g -O2 -fdebug-prefix-map=/build/petsc-fA70UI/petsc-3.7.3.dfsg1=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC" CXXFLAGS="-g -O2 -fdebug-prefix-map=/build/petsc-fA70UI/petsc-3.7.3.dfsg1=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC" FCFLAGS="-g -O2 -fdebug-prefix-map=/build/petsc-fA70UI/petsc-3.7.3.dfsg1=. -fstack-protector-strong -fPIC" FFLAGS="-g -O2 -fdebug-prefix-map=/build/petsc-fA70UI/petsc-3.7.3.dfsg1=. -fstack-protector-strong -fPIC" CPPFLAGS="-Wdate-time -D_FORTIFY_SOURCE=2" LDFLAGS="-Wl,-Bsymbolic-functions -Wl,-z,relro -fPIC" MAKEFLAGS=w
[0]PETSC ERROR: #1 User provided function() line 0 in unknown file

MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 59.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them.

xgirouxb commented 7 years ago

I also experienced this error message, but it occurs before anything computes successfully. In my case it also occurs without the "node_pairs" flag:

/usr/bin/mpiexec
Wed Aug 2 04:49:47 EDT 2017
Wed Aug 2 04:49:47 2017 >> Effective resistance will be written to ./Outputs/R_eff_tw_ocf_5N.csv.
Wed Aug 2 04:49:47 2017 >> Simulation will converge at 0.99999
Wed Aug 2 04:49:47 2017 >> (rows,cols) = (3342,3623)
Wed Aug 2 04:49:50 2017 >> Removed 0 islands (0 cells).
Wed Aug 2 04:49:50 2017 >> (rows,cols) = (3342,3623)
Wed Aug 2 04:49:52 2017 >> 16777 points in ./Inputs/tw_nodes_ocf_30m.asc
Wed Aug 2 04:49:52 2017 >> Max distance: 1333333.33 pixels
Wed Aug 2 04:49:57 2017 >> 140725476 pairs generated. 0 skipped.
[31]PETSC ERROR: ------------------------------------------------------------------------
[31]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[31]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[31]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[31]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[31]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[31]PETSC ERROR: to get more information on the crash.
[31]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[31]PETSC ERROR: Signal received
[31]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[31]PETSC ERROR: Petsc Release Version 3.7.5, Jan, 01, 2017
[31]PETSC ERROR: ./gflow.x on a x86_64-linux-gnu-real named rangifer by rangifer Wed Aug 2 04:49:47 2017
[31]PETSC ERROR: Configure options --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --with-silent-rules=0 --libdir=${prefix}/lib/x86_64-linux-gnu --libexecdir=${prefix}/lib/x86_64-linux-gnu --with-maintainer-mode=0 --with-dependency-tracking=0 --with-debugging=0 --shared-library-extension=_real --with-clanguage=C++ --with-shared-libraries --with-pic=1 --useThreads=0 --with-fortran-interfaces=1 --with-mpi-dir=/usr/lib/x86_64-linux-gnu/openmpi --with-blas-lib=-lblas --with-lapack-lib=-llapack --with-blacs=1 --with-blacs-lib="-lblacsCinit-openmpi -lblacs-openmpi" --with-scalapack=1 --with-scalapack-lib=-lscalapack-openmpi --with-mumps=1 --with-mumps-include="[]" --with-mumps-lib="-ldmumps -lzmumps -lsmumps -lcmumps -lmumps_common -lpord" --with-suitesparse=1 --with-suitesparse-include=/usr/include/suitesparse --with-suitesparse-lib="-lumfpack -lamd -lcholmod -lklu" --with-spooles=1 --with-spooles-include=/usr/include/spooles --with-spooles-lib=-lspooles --with-ptscotch=1 --with-ptscotch-include=/usr/include/scotch --with-ptscotch-lib="-lptesmumps -lptscotch -lptscotcherr" --with-fftw=1 --with-fftw-include="[]" --with-fftw-lib="-lfftw3 -lfftw3_mpi" --with-superlu=1 --with-superlu-include=/usr/include/superlu --with-superlu-lib=-lsuperlu --with-hdf5=1 --with-hdf5-dir=/usr/lib/x86_64-linux-gnu/hdf5/openmpi --CXX_LINKER_FLAGS=-Wl,--no-as-needed --with-hypre=1 --with-hypre-include=/usr/include/hypre --with-hypre-lib="-lHYPRE_IJ_mv -lHYPRE_parcsr_ls -lHYPRE_sstruct_ls -lHYPRE_sstruct_mv -lHYPRE_struct_ls -lHYPRE_struct_mv -lHYPRE_utilities" --prefix=/usr/lib/petscdir/3.7.5/x86_64-linux-gnu-real PETSC_DIR=/build/petsc-fcyWHu/petsc-3.7.5+dfsg1 --PETSC_ARCH=x86_64-linux-gnu-real CFLAGS="-g -O2 -fdebug-prefix-map=/build/petsc-fcyWHu/petsc-3.7.5+dfsg1=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC" CXXFLAGS="-g -O2 -fdebug-prefix-map=/build/petsc-fcyWHu/petsc-3.7.5+dfsg1=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC" FCFLAGS="-g -O2 -fdebug-prefix-map=/build/petsc-fcyWHu/petsc-3.7.5+dfsg1=. -fstack-protector-strong -fPIC" FFLAGS="-g -O2 -fdebug-prefix-map=/build/petsc-fcyWHu/petsc-3.7.5+dfsg1=. -fstack-protector-strong -fPIC" CPPFLAGS="-Wdate-time -D_FORTIFY_SOURCE=2" LDFLAGS="-Wl,-Bsymbolic-functions -Wl,-z,relro -fPIC" MAKEFLAGS=w
[31]PETSC ERROR: #1 User provided function() line 0 in unknown file

MPI_ABORT was invoked on rank 31 in communicator MPI_COMM_WORLD with errorcode 59.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them.

[30]PETSC ERROR: ------------------------------------------------------------------------
[30]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[30]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[30]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[30]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[30]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[30]PETSC ERROR: to get more information on the crash.
[30]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[30]PETSC ERROR: Signal received
[30]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[30]PETSC ERROR: Petsc Release Version 3.7.5, Jan, 01, 2017
[30]PETSC ERROR: ./gflow.x on a x86_64-linux-gnu-real named rangifer by rangifer Wed Aug 2 04:49:47 2017
[30]PETSC ERROR: Configure options --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --with-silent-rules=0 --libdir=${prefix}/lib/x86_64-linux-gnu --libexecdir=${prefix}/lib/x86_64-linux-gnu --with-maintainer-mode=0 --with-dependency-tracking=0 --with-debugging=0 --shared-library-extension=_real --with-clanguage=C++ --with-shared-libraries --with-pic=1 --useThreads=0 --with-fortran-interfaces=1 --with-mpi-dir=/usr/lib/x86_64-linux-gnu/openmpi --with-blas-lib=-lblas --with-lapack-lib=-llapack --with-blacs=1 --with-blacs-lib="-lblacsCinit-openmpi -lblacs-openmpi" --with-scalapack=1 --with-scalapack-lib=-lscalapack-openmpi --with-mumps=1 --with-mumps-include="[]" --with-mumps-lib="-ldmumps -lzmumps -lsmumps -lcmumps -lmumps_common -lpord" --with-suitesparse=1 --with-suitesparse-include=/usr/include/suitesparse --with-suitesparse-lib="-lumfpack -lamd -lcholmod -lklu" --with-spooles=1 --with-spooles-include=/usr/include/spooles --with-spooles-lib=-lspooles --with-ptscotch=1 --with-ptscotch-include=/usr/include/scotch --with-ptscotch-lib="-lptesmumps -lptscotch -lptscotcherr" --with-fftw=1 --with-fftw-include="[]" --with-fftw-lib="-lfftw3 -lfftw3_mpi" --with-superlu=1 --with-superlu-include=/usr/include/superlu --with-superlu-lib=-lsuperlu --with-hdf5=1 --with-hdf5-dir=/usr/lib/x86_64-linux-gnu/hdf5/openmpi --CXX_LINKER_FLAGS=-Wl,--no-as-needed --with-hypre=1 --with-hypre-include=/usr/include/hypre --with-hypre-lib="-lHYPRE_IJ_mv -lHYPRE_parcsr_ls -lHYPRE_sstruct_ls -lHYPRE_sstruct_mv -lHYPRE_struct_ls -lHYPRE_struct_mv -lHYPRE_utilities" --prefix=/usr/lib/petscdir/3.7.5/x86_64-linux-gnu-real PETSC_DIR=/build/petsc-fcyWHu/petsc-3.7.5+dfsg1 --PETSC_ARCH=x86_64-linux-gnu-real CFLAGS="-g -O2 -fdebug-prefix-map=/build/petsc-fcyWHu/petsc-3.7.5+dfsg1=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC" CXXFLAGS="-g -O2 -fdebug-prefix-map=/build/petsc-fcyWHu/petsc-3.7.5+dfsg1=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC" FCFLAGS="-g -O2 -fdebug-prefix-map=/build/petsc-fcyWHu/petsc-3.7.5+dfsg1=. -fstack-protector-strong -fPIC" FFLAGS="-g -O2 -fdebug-prefix-map=/build/petsc-fcyWHu/petsc-3.7.5+dfsg1=. -fstack-protector-strong -fPIC" CPPFLAGS="-Wdate-time -D_FORTIFY_SOURCE=2" LDFLAGS="-Wl,-Bsymbolic-functions -Wl,-z,relro -fPIC" MAKEFLAGS=w
[30]PETSC ERROR: #1 User provided function() line 0 in unknown file
Wed Aug 2 04:50:24 2017 >> Signal 15 recieved.
[rangifer:23732] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[rangifer:23732] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messag

pbleonard commented 7 years ago

@jmsigner @xgirouxb Can you please send an example node input file along with your submit script so I can have a look?

pbleonard commented 7 years ago

@jmsigner I was able to reproduce your error with the node_pairs flag. As of right now, this flag should not be used. However, you should still be able to calculate your problem: simply place your list of nodes behind the nodes flag instead. You can still turn the shuffle_node_pairs flag on or off if you wish. So, based on the examples you sent, just try: -nodes node_pair_file. Let me know if you have additional questions.
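For reference, a minimal sketch of that workaround in an mpiexec-based submit script (the core count here is a placeholder, and any other input/output flags should stay exactly as in your existing script):

# Workaround sketch: pass the pair-list file to -nodes instead of -node_pairs.
mpiexec -n 4 ./gflow.x -nodes node_pair_file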

jmsigner commented 7 years ago

@pbleonard thanks for your quick reply. When I use -nodes node_pair_file, where would I provide the actual coordinates of the nodes? When I just use the above, the node pairs are taken as coordinates. E.g., in this output the pair entries seem to be interpreted as coordinates, right?

Thu Aug 3 18:21:45 2017 >> Solving pair 0 (1 of 1): 1[0,1] to 2[0,2]. 0.10 Km apart

pbleonard commented 7 years ago

@jmsigner All nodes should be input into gflow as coordinates relative to the resistance grid. If you're having problems with the inputs, you can also simply input an .asc grid of nodes and gflow will translate it to relative coordinates for you.
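To illustrate (a sketch only, not authoritative on gflow's exact parsing): a plain-text nodes file gives one grid-relative coordinate pair per line, and gflow numbers the nodes 1, 2, 3, ... in file order. The coordinates below are taken from the successful log later in this thread; whether each pair is row/column or x/y is not confirmed here.

99 393
494 2002
410 1122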

xgirouxb commented 7 years ago

Here is a link to the inputs and script that cause the error: https://my.pcloud.com/publink/show?code=XZMqDcZid5b0btkeNJpkgITRLSMMByvJpdk

pbleonard commented 7 years ago

@eduffy Should the node_pairs flag be deprecated, or is there a bug? It doesn't appear to work either with or without the nodes flag.

eduffy commented 7 years ago

@jmsigner - What other flags did you pass to gflow.x? Were you writing the current density map at every iteration, or saving the effective resistance? You said the program crashed at the end; is that the first time any output (other than to the terminal) was written?

eduffy commented 7 years ago

@pbleonard - I don't think it should. nodes is required; it defines the focal points. After that, you may provide an explicit list of node pairs to solve via the node_pairs flag. It's not an ideal scenario, but it is useful when you're using gflow to solve every pair in a high-throughput environment and want to make sure there's no duplication of effort or wasted computing time.
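As a sketch of that high-throughput pattern (the file names and core counts are hypothetical, and this assumes -node_pairs takes a file argument the same way -nodes does):

# Two runs share one focal-node list; each solves a disjoint subset of pairs,
# so no pair is computed twice across the jobs.
mpiexec -n 16 ./gflow.x -nodes all_nodes.txt -node_pairs pairs_part1.txt
mpiexec -n 16 ./gflow.x -nodes all_nodes.txt -node_pairs pairs_part2.txt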

Can you send me jmsigner's scripts?

jmsigner commented 7 years ago

@eduffy here are my input files: https://dl.dropboxusercontent.com/u/5554895/gflow.zip

eduffy commented 7 years ago

@jmsigner Can you send me cost-latest.asc as well?


pbleonard commented 7 years ago

@jmsigner I apologize, I misunderstood your question originally. The node_pairs flag is working, but it expects node numbers in that file, not coordinates: that is, however you numbered the nodes when you originally created the node file. These can be cross-referenced with the log from running the nodes file normally. Does that help?
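For example, a hypothetical node_pairs file matching the numbering in the log further below (each line names two node numbers, assigned by the order of entries in the nodes file; the exact separator gflow expects is not confirmed here):

1 2
1 3

With a four-node nodes file, this would request only pairs 1-2 and 1-3 rather than all six combinations.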

jmsigner commented 7 years ago

@pbleonard thanks for getting back about this. Would it be possible to have a brief example of the node file and the node_pairs file? I am unsure how to reference the nodes. Thanks

jmsigner commented 7 years ago

@eduffy sorry for the delay. In case the habitat file is still of interest, here it is: https://dl.dropboxusercontent.com/u/5554895/cost-latest.zip

pbleonard commented 7 years ago

@jmsigner: Please see the attached examples, which ran successfully in my tests of your inputs. Your nodes file inputs appear to be fine; I also ran all pairs successfully (see the log).

signer.zip

eduffy commented 7 years ago

@jmsigner - I was able to solve those two node pairs. The only things I changed were the output paths and the number of cores (you do have 28 CPUs to run this on, right?). The input files were just as you sent them. Do the directories you're trying to write to already exist? gflow won't create them automatically.
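For example, a one-line guard at the top of the submit script, before the mpiexec call (directory name taken from the output path in the log below):

mkdir -p wildkatze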

Wed Aug 16 18:12:29 2017 >> (rows,cols) = (4370,4370)
Wed Aug 16 18:12:31 2017 >> Removed 0 islands (0 cells).
Wed Aug 16 18:12:31 2017 >> 4 points in nodes_all
Wed Aug 16 18:12:31 2017 >> Max distance: 400000.00 pixels
Wed Aug 16 18:12:31 2017 >> 2 pairs generated.  0 skipped.
Wed Aug 16 18:12:31 2017 >> Number of unknowns: 19096900
Wed Aug 16 18:12:47 2017 >> Solving pair 0 (1 of 2): 1[99,393] to 2[494,2002].  165.68 Km apart
Wed Aug 16 18:19:47 2017 >> R_eff = 1,2,43.991117
Wed Aug 16 18:19:47 2017 >> Estimated time remaining: 00:06:59
Wed Aug 16 18:19:47 2017 >> Solving pair 1 (2 of 2): 1[99,393] to 3[410,1122].   79.26 Km apart
Wed Aug 16 18:20:12 2017 >> Solution to iteration 0 discarded.
Wed Aug 16 18:20:13 2017 >> convergence-factor = 0.000000e+00 (0-N)
Wed Aug 16 18:25:16 2017 >> R_eff = 1,3,36.140558
Wed Aug 16 18:25:16 2017 >> Estimated time remaining: 00:00:00
Wed Aug 16 18:25:23 2017 >> Solution to iteration 0 discarded.
Wed Aug 16 18:25:24 2017 >> convergence-factor = 9.069531e-01 (1-N)
Wed Aug 16 18:25:33 2017 >> Result wildkatze/1502922324_local_sum_2.asc written.