Closed jmsigner closed 7 years ago
MPI_ABORT was invoked on rank 31 in communicator MPI_COMM_WORLD with errorcode 59.
[30]PETSC ERROR: ------------------------------------------------------------------------
[30]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[30]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[30]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[30]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[30]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[30]PETSC ERROR: to get more information on the crash.
[30]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[30]PETSC ERROR: Signal received
[30]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[30]PETSC ERROR: Petsc Release Version 3.7.5, Jan, 01, 2017
[30]PETSC ERROR: ./gflow.x on a x86_64-linux-gnu-real named rangifer by rangifer Wed Aug 2 04:49:47 2017
[30]PETSC ERROR: Configure options --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --with-silent-rules=0 --libdir=${prefix}/lib/x86_64-linux-gnu --libexecdir=${prefix}/lib/x86_64-linux-gnu --with-maintainer-mode=0 --with-dependency-tracking=0 --with-debugging=0 --shared-library-extension=_real --with-clanguage=C++ --with-shared-libraries --with-pic=1 --useThreads=0 --with-fortran-interfaces=1 --with-mpi-dir=/usr/lib/x86_64-linux-gnu/openmpi --with-blas-lib=-lblas --with-lapack-lib=-llapack --with-blacs=1 --with-blacs-lib="-lblacsCinit-openmpi -lblacs-openmpi" --with-scalapack=1 --with-scalapack-lib=-lscalapack-openmpi --with-mumps=1 --with-mumps-include="[]" --with-mumps-lib="-ldmumps -lzmumps -lsmumps -lcmumps -lmumps_common -lpord" --with-suitesparse=1 --with-suitesparse-include=/usr/include/suitesparse --with-suitesparse-lib="-lumfpack -lamd -lcholmod -lklu" --with-spooles=1 --with-spooles-include=/usr/include/spooles --with-spooles-lib=-lspooles --with-ptscotch=1 --with-ptscotch-include=/usr/include/scotch --with-ptscotch-lib="-lptesmumps -lptscotch -lptscotcherr" --with-fftw=1 --with-fftw-include="[]" --with-fftw-lib="-lfftw3 -lfftw3_mpi" --with-superlu=1 --with-superlu-include=/usr/include/superlu --with-superlu-lib=-lsuperlu --with-hdf5=1 --with-hdf5-dir=/usr/lib/x86_64-linux-gnu/hdf5/openmpi --CXX_LINKER_FLAGS=-Wl,--no-as-needed --with-hypre=1 --with-hypre-include=/usr/include/hypre --with-hypre-lib="-lHYPRE_IJ_mv -lHYPRE_parcsr_ls -lHYPRE_sstruct_ls -lHYPRE_sstruct_mv -lHYPRE_struct_ls -lHYPRE_struct_mv -lHYPRE_utilities" --prefix=/usr/lib/petscdir/3.7.5/x86_64-linux-gnu-real PETSC_DIR=/build/petsc-fcyWHu/petsc-3.7.5+dfsg1 --PETSC_ARCH=x86_64-linux-gnu-real CFLAGS="-g -O2 -fdebug-prefix-map=/build/petsc-fcyWHu/petsc-3.7.5+dfsg1=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC" CXXFLAGS="-g -O2 -fdebug-prefix-map=/build/petsc-fcyWHu/petsc-3.7.5+dfsg1=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC" FCFLAGS="-g -O2 -fdebug-prefix-map=/build/petsc-fcyWHu/petsc-3.7.5+dfsg1=. -fstack-protector-strong -fPIC" FFLAGS="-g -O2 -fdebug-prefix-map=/build/petsc-fcyWHu/petsc-3.7.5+dfsg1=. -fstack-protector-strong -fPIC" CPPFLAGS="-Wdate-time -D_FORTIFY_SOURCE=2" LDFLAGS="-Wl,-Bsymbolic-functions -Wl,-z,relro -fPIC" MAKEFLAGS=w
[30]PETSC ERROR: #1 User provided function() line 0 in unknown file
Wed Aug 2 04:50:24 2017 >> Signal 15 recieved.
[rangifer:23732] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[rangifer:23732] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messag
@jmsigner @xgirouxb can you please send an example node input file along with the submit script so I can have a look?
@jmsigner I was able to reproduce your error with the node_pairs flag. As of right now this flag should not be used. However, you should still be able to calculate your problem: simply place your list of nodes to calculate behind the nodes flag instead. You can still turn the shuffle_node_pairs flag on or off if you wish. So based on the examples you sent, just try: -nodes node_pair_file. Let me know if you have additional questions.
@pbleonard thanks for your quick reply. When I use -nodes node_pair_file, where would I provide the actual coordinates of the nodes? When I just use the above, the node pairs are taken as coordinates. E.g., from the output Thu Aug 3 18:21:45 2017 >> Solving pair 0 (1 of 1): 1[0,1] to 2[0,2]. 0.10 Km apart, the pairs are interpreted as coordinates, right?
@jmsigner All nodes should be input into gflow as coordinates relative to the resistance grid. If you're having problems with inputs you can also simply input an .asc grid of nodes and it will translate to relative coordinates for you.
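For readers unsure what "coordinates relative to the resistance grid" means, here is a minimal sketch of the usual conversion from map coordinates to a cell position in an ESRI ASCII (.asc) grid. The function name `map_to_grid` is hypothetical, and whether gflow expects 0-based or 1-based values, or row-first versus column-first order, is an assumption that should be checked against the coordinates gflow prints in its log.

```python
import math

def map_to_grid(x, y, xllcorner, yllcorner, cellsize, nrows):
    """Convert map coordinates to a 0-based (row, col) in an ESRI ASCII grid.

    xllcorner, yllcorner, cellsize and nrows come from the .asc header.
    Row 0 is the first data row in the file (the northern edge), so the
    y axis is flipped relative to the map.
    """
    col = math.floor((x - xllcorner) / cellsize)
    row = nrows - 1 - math.floor((y - yllcorner) / cellsize)
    return row, col
```

Supplying the nodes as an .asc grid, as suggested above, avoids doing this conversion by hand.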
Here is a link to the inputs and script that cause the error: https://my.pcloud.com/publink/show?code=XZMqDcZid5b0btkeNJpkgITRLSMMByvJpdk
@eduffy Should the flag node_pairs be deprecated? Or is there a bug? It doesn't appear to work with or without the nodes flag.
@jmsigner - What other flags did you pass to gflow.x? Were you writing the current density map at every iteration, or saving the effective resistance? You said the program crashed at the end; is that the first time any output (other than the terminal) was written?
@pbleonard - I don't think it should. nodes is required; it defines the focal points. After that, you may provide an explicit list of node pairs to solve via the node_pairs flag. It's not an ideal scenario, but it is useful when you use gflow to solve every pair in a high-throughput environment and want to make sure there's no duplication of effort / wasted computing time.
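The high-throughput use case described here amounts to dealing all node pairs into disjoint lists, one per job. A minimal sketch, assuming nodes are identified by integer node numbers; `split_pairs` is a hypothetical helper, and writing each chunk out in the layout that node_pairs expects is left out because the thread does not spell out that file format.

```python
from itertools import combinations

def split_pairs(node_ids, n_jobs):
    """Deal every unordered node pair into n_jobs disjoint chunks."""
    chunks = [[] for _ in range(n_jobs)]
    # combinations() yields each unordered pair exactly once, so no pair
    # can end up in two different jobs.
    for i, pair in enumerate(combinations(sorted(node_ids), 2)):
        chunks[i % n_jobs].append(pair)
    return chunks

if __name__ == "__main__":
    for job, chunk in enumerate(split_pairs([1, 2, 3, 4], n_jobs=2)):
        print(job, chunk)
```

Round-robin assignment keeps the chunks within one pair of each other in size, though the solve time per pair can still differ.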
Can you send me jmsigner's scripts?
@eduffy here are my input files: https://dl.dropboxusercontent.com/u/5554895/gflow.zip
@jmsigner Can you send me cost-latest.asc as well?
@jmsigner I apologize, I misunderstood your question originally. The node_pairs flag is working, but it expects node numbers in that file, not coordinates. Use the node numbers that were assigned when you originally created the node file. These can be cross-referenced against the log when you run the nodes file normally. Does that help?
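One way to do that cross-referencing is to read the node numbers back out of a normal run's log. A minimal sketch, assuming the log lines have the "Solving pair ..." shape shown elsewhere in this thread; the helper name `node_numbers_from_log` is hypothetical, and the two sample lines are copied from the log posted in this issue.

```python
import re

# Matches e.g. "Solving pair 0 (1 of 2): 1[99,393] to 2[494,2002]. 165.68 Km apart"
PAIR_RE = re.compile(
    r"Solving pair \d+ \(\d+ of \d+\): "
    r"(\d+)\[(\d+),(\d+)\] to (\d+)\[(\d+),(\d+)\]"
)

def node_numbers_from_log(lines):
    """Map each node number to the coordinate pair printed next to it."""
    nodes = {}
    for line in lines:
        m = PAIR_RE.search(line)
        if m is None:
            continue  # not a "Solving pair" line
        a, a1, a2, b, b1, b2 = map(int, m.groups())
        nodes[a] = (a1, a2)
        nodes[b] = (b1, b2)
    return nodes

if __name__ == "__main__":
    log = [
        "Wed Aug 16 18:12:47 2017 >> Solving pair 0 (1 of 2): 1[99,393] to 2[494,2002]. 165.68 Km apart",
        "Wed Aug 16 18:19:47 2017 >> Solving pair 1 (2 of 2): 1[99,393] to 3[410,1122]. 79.26 Km apart",
    ]
    print(node_numbers_from_log(log))
```

The numbers on the left of each bracket are what would go into the node pairs file; the bracketed values are only there to identify which node is which.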
@pbleonard thanks for getting back about this. Would it be possible to have a brief example of the node file and the node pairs file? I am unsure how to reference nodes. Thanks
@eduffy sorry for the delay. In case the habitat file is still of interest, here it is: https://dl.dropboxusercontent.com/u/5554895/cost-latest.zip
@jmsigner: Please see examples that worked successfully in my tests of your inputs. Your nodes file inputs appear to be fine as I also ran all pairs successfully - see log.
@jmsigner - I was able to solve those two node-pairs. The only things I changed were the output paths and the number of cores (you do have 28 CPUs to run this on, right?). The input files were just as you sent them. Do the directories you're trying to write to already exist? gflow won't automatically create them.
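Since gflow won't create missing output directories, a wrapper can create them before the run starts. A minimal sketch in Python; `ensure_parent_dir` is a hypothetical helper, and the example path only reuses the wildkatze directory name that appears in the log in this thread.

```python
import os

def ensure_parent_dir(output_path):
    """Create the directory that will hold output_path if it is missing."""
    parent = os.path.dirname(output_path)
    if parent:
        # exist_ok=True makes this a no-op when the directory already exists.
        os.makedirs(parent, exist_ok=True)
    return parent

if __name__ == "__main__":
    # Hypothetical output location; adjust to match your own submit script.
    ensure_parent_dir("wildkatze/local_sum.asc")
```

Calling this for every output path before launching the solver rules out a missing directory as the cause of a crash at write time.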
Wed Aug 16 18:12:29 2017 >> (rows,cols) = (4370,4370)
Wed Aug 16 18:12:31 2017 >> Removed 0 islands (0 cells).
Wed Aug 16 18:12:31 2017 >> 4 points in nodes_all
Wed Aug 16 18:12:31 2017 >> Max distance: 400000.00 pixels
Wed Aug 16 18:12:31 2017 >> 2 pairs generated. 0 skipped.
Wed Aug 16 18:12:31 2017 >> Number of unknowns: 19096900
Wed Aug 16 18:12:47 2017 >> Solving pair 0 (1 of 2): 1[99,393] to 2[494,2002]. 165.68 Km apart
Wed Aug 16 18:19:47 2017 >> R_eff = 1,2,43.991117
Wed Aug 16 18:19:47 2017 >> Estimated time remaining: 00:06:59
Wed Aug 16 18:19:47 2017 >> Solving pair 1 (2 of 2): 1[99,393] to 3[410,1122]. 79.26 Km apart
Wed Aug 16 18:20:12 2017 >> Solution to iteration 0 discarded.
Wed Aug 16 18:20:13 2017 >> convergence-factor = 0.000000e+00 (0-N)
Wed Aug 16 18:25:16 2017 >> R_eff = 1,3,36.140558
Wed Aug 16 18:25:16 2017 >> Estimated time remaining: 00:00:00
Wed Aug 16 18:25:23 2017 >> Solution to iteration 0 discarded.
Wed Aug 16 18:25:24 2017 >> convergence-factor = 9.069531e-01 (1-N)
Wed Aug 16 18:25:33 2017 >> Result wildkatze/1502922324_local_sum_2.asc written.
When using the argument node_pairs, gflow recognizes the file and runs each pair, but throws an error when finishing up. The error I get is: