GridOPTICS / GridPACK

https://www.gridpack.org/

Contingency analysis application fails on Polish network #191

Open · bjpalmer opened this issue 6 months ago

bjpalmer commented 6 months ago

I just tried running the contingency analysis application using the Polish network and it fails after running a few of the contingencies. I think this is because the PETSc solver is failing but the resulting error is not getting properly trapped. The powerflow solve method is supposed to trap the error and then return false so that the application can keep going, but this does not appear to be happening.
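
For reference, a minimal sketch of the trapping behavior described above. This is a hypothetical illustration, not the actual PFAppModule::solve code: the idea is that the linear solve is wrapped in a try/catch so a failed contingency returns false and the contingency loop keeps going.

```cpp
#include <exception>
#include <functional>
#include <iostream>
#include <stdexcept>

// Hypothetical helper, not the real GridPACK API: convert any exception
// thrown by the solver into a boolean result the contingency loop can use.
bool solveOneContingency(const std::function<void()> &runSolver)
{
  try {
    runSolver();                     // linear solve for this contingency
  } catch (const std::exception &e) {
    std::cerr << "solver failed: " << e.what() << "\n";
    return false;                    // caller marks this contingency failed and continues
  }
  return true;                       // solve succeeded; caller records results
}

int main()
{
  // Simulate a contingency whose factorization fails and throws.
  bool ok = solveOneContingency([] {
    throw std::runtime_error("factorization failed");
  });
  std::cout << (ok ? "converged" : "skipped contingency") << "\n";
  return 0;
}
```

The problem reported here is that the process exits inside the solver before any exception is thrown, so a catch block like this never gets a chance to run.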

wperkins commented 6 months ago

When I run the case with 1 process, it gets through 311 tasks (or so) and then exits -- exit() is called. PETSc never gets a chance to report convergence to GridPACK, so no exception can be thrown. This is on Ubuntu with complex PETSc 3.19.4 and SuperLU_dist.

#0  __GI_exit (status=status@entry=1) at exit.c:138
#1  0x0000155552a08c08 in ztrsm_ (side=side@entry=0x1555520d4176 "R", 
    uplo=uplo@entry=0x1555520d4170 "U", transa=transa@entry=0x1555520d4172 "N", 
    diag=diag@entry=0x1555520d4172 "N", m=m@entry=0x7fffffffbef8, n=n@entry=0x7fffffffbf08, 
    alpha=0x7fffffffbf40, a=<optimized out>, lda=0x7fffffffbf0c, b=<optimized out>, 
    ldb=0x7fffffffbf04) at ztrsm.c:324
#2  0x00001555520a149b in pzgstrf2_trsm (options=options@entry=0x55555c16edf0, k0=k0@entry=273, 
    k=k@entry=366, thresh=thresh@entry=6.2290136121758314e-07, 
    Glu_persist=Glu_persist@entry=0x55555c0534c0, grid=grid@entry=0x55555c16ed28, 
    Llu=Llu@entry=0x55555c28b840, U_diag_blk_send_req=0x0, tag_ub=2147483647, stat=0x7fffffffc8b0, 
    info=0x7fffffffc880)
    at /home/d3g096/Projects/GridPakLDRD/petsc-3.19.4/ubuntu-complex-shared/externalpackages/git.superlu_dist/SRC/pzgstrf2.c:312
#3  0x000015555209ae99 in pzgstrf (options=options@entry=0x55555c16edf0, m=m@entry=5991, 
    n=n@entry=5991, anorm=anorm@entry=10.450550683841415, LUstruct=LUstruct@entry=0x55555c16eee8, 
    grid=grid@entry=0x55555c16ed28, stat=stat@entry=0x7fffffffc8b0, info=0x7fffffffc880)
    at /home/d3g096/Projects/GridPakLDRD/petsc-3.19.4/ubuntu-complex-shared/externalpackages/git.superlu_dist/SRC/pzgstrf.c:1137
#4  0x000015555207a502 in pzgssvx (options=options@entry=0x55555c16edf0, A=0x55555c16eea0, 
    ScalePermstruct=0x55555c16eec0, B=B@entry=0x0, ldb=5991, nrhs=nrhs@entry=0, 
    grid=0x55555c16ed28, LUstruct=0x55555c16eee8, SOLVEstruct=0x55555c16ef10, berr=0x0, 
    stat=0x7fffffffc8b0, info=0x7fffffffc880)
    at /home/d3g096/Projects/GridPakLDRD/petsc-3.19.4/ubuntu-complex-shared/externalpackages/git.superlu_dist/SRC/pzgssvx.c:1181
#5  0x00001555539a9569 in MatLUFactorNumeric_SuperLU_DIST (F=0x55555c2e2b40, A=0x55555e04cac0, 
    info=<optimized out>)
    at /home/d3g096/Projects/GridPakLDRD/petsc-3.19.4/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c:447
#6  0x000015555347a591 in MatLUFactorNumeric (fact=0x55555c2e2b40, mat=0x55555e04cac0, 
    info=info@entry=0x555558fcd388)
    at /home/d3g096/Projects/GridPakLDRD/petsc-3.19.4/src/mat/interface/matrix.c:3243
#7  0x0000155553ee4e53 in PCSetUp_LU (pc=0x55555dd16aa0)
    at /home/d3g096/Projects/GridPakLDRD/petsc-3.19.4/src/ksp/pc/impls/factor/lu/lu.c:120
#8  0x0000155553e3fad0 in PCSetUp (pc=0x55555dd16aa0)
    at /home/d3g096/Projects/GridPakLDRD/petsc-3.19.4/src/ksp/pc/interface/precon.c:994
#9  0x0000155553f52f51 in KSPSetUp (ksp=0x55555c04cdf0)
    at /home/d3g096/Projects/GridPakLDRD/petsc-3.19.4/src/ksp/ksp/interface/itfunc.c:406
#10 0x0000155553f545ea in KSPSolve_Private (ksp=0x55555c04cdf0, b=0x555558eab140, x=<optimized out>)
    at /home/d3g096/Projects/GridPakLDRD/petsc-3.19.4/src/ksp/ksp/interface/itfunc.c:824
#11 0x0000155553f55907 in KSPSolve (ksp=<optimized out>, b=<optimized out>, x=<optimized out>)
    at /home/d3g096/Projects/GridPakLDRD/petsc-3.19.4/src/ksp/ksp/interface/itfunc.c:1070
#12 0x00001555552e306e in gridpack::math::PETScLinearSolverImplementation<double, int>::p_resolveImpl (this=0x55555dc3cb40, b=..., x=...)
    at /home/d3g096/Projects/GridPACK-Wind/src/GridPACK/src/math/petsc/petsc_vector_extractor.hpp:55
#13 0x00001555552dfeb5 in gridpack::math::PETScLinearSolverImplementation<double, int>::p_solveImpl
    (this=0x55555dc3cb40, A=..., b=..., x=...)
    at /home/d3g096/Projects/GridPACK-Wind/src/GridPACK/src/math/petsc/petsc_matrix_extractor.hpp:57
#14 0x00001555552e2078 in gridpack::math::LinearSolverImplementation<double, int>::p_solve (
    this=0x55555dc3cb40, b=..., x=...)
    at /home/d3g096/Projects/GridPACK-Wind/src/GridPACK/src/gridpack/math/vector_interface.hpp:383
#15 0x000015555543e7c8 in gridpack::math::BaseLinearSolverInterface<double, int>::solve (x=..., 
    b=..., this=0x7fffffffcfe0)
    at /home/d3g096/Projects/GridPACK-Wind/src/GridPACK/src/gridpack/math/linear_solver_interface.hpp:116
#16 gridpack::powerflow::PFAppModule::solve (this=this@entry=0x7fffffffdb80)
    at /home/d3g096/Projects/GridPACK-Wind/src/GridPACK/src/applications/modules/powerflow/pf_app_module.cpp:419
#17 0x000055555556b711 in gridpack::contingency_analysis::CADriver::execute (this=<optimized out>, 
    argc=<optimized out>, argv=<optimized out>)
    at /home/d3g096/Projects/GridPACK-Wind/src/GridPACK/src/applications/contingency_analysis/ca_driver.cpp:501
#18 0x0000555555567377 in main (argc=<optimized out>, argv=<optimized out>)
    at /home/d3g096/Projects/GridPACK-Wind/src/GridPACK/src/applications/contingency_analysis/ca_main.cpp:38
abhyshr commented 6 months ago

Hi Bruce and Bill, who wrote the contingency analysis application and when was it last run or tested?

wperkins commented 6 months ago

> Hi Bruce and Bill, who wrote the contingency analysis application and when was it last run or tested?

The unit test runs on the 14-bus case.

bjpalmer commented 6 months ago

I wrote most of the contingency analysis application. I can't remember the last time anyone was able to run the Polish or European network test cases. I think it has always been true that some of the contingencies have solver failures (beyond failing to converge), so this crash seems like it may be a new problem.

bjpalmer commented 6 months ago

I saw something similar. I also tried running with the KLU solver and I think the code just crashes from somewhere inside PETSc (I'm getting a PETSc stack trace). It looks like the exit is coming from somewhere inside SuperLU. Is there some way to keep that from happening and just return an error to the calling program?

wperkins commented 6 months ago

The Polish case runs to completion for me with 1 process using real PETSc 3.19.4 and SuperLU_dist.

wperkins commented 6 months ago

> I saw something similar. I also tried running with the KLU solver and I think the code just crashes from somewhere inside PETSc (I'm getting a PETSc stack trace). It looks like the exit is coming from somewhere inside SuperLU. Is there some way to keep that from happening and just return an error to the calling program?

I don't see how without modifying SuperLU_dist.
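
To make the point concrete, here is a tiny, generic illustration (not GridPACK or SuperLU_dist code) of why the caller cannot recover: exit() is not a C++ exception, so try/catch never sees it and the whole process terminates.

```cpp
#include <cstdlib>
#include <exception>
#include <iostream>

// Stand-in for a library routine that aborts the process on error,
// analogous to the exit(1) call seen inside SuperLU_dist's ztrsm_ above.
void libraryRoutineThatExits()
{
  std::exit(1);   // terminates the process; nothing is thrown
}

int main()
{
  try {
    libraryRoutineThatExits();
  } catch (const std::exception &) {
    // Never reached: exit() bypasses exception handling entirely, so the
    // calling application cannot trap the failure and continue.
    std::cerr << "caught\n";
  }
  std::cout << "never printed\n";
  return 0;
}
```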

bjpalmer commented 6 months ago

Seems like an oversight on SuperLU's part to do that. If it fails, it should hand control back to the user to figure out how to deal with it.

> The Polish case runs to completion for me with 1 process using real PETSc 3.19.4 and SuperLU_dist.

I'm using complex PETSc 3.16.3 and shared libraries. I'll see what happens with reals.

wperkins commented 6 months ago

> The Polish case runs to completion for me with 1 process using real PETSc 3.19.4 and SuperLU_dist.

Seems really slow with more processors, though.

bjpalmer commented 6 months ago

> Seems really slow with more processors, though.

Are you using the two-sided runtime?

wperkins commented 6 months ago

> Seems really slow with more processors, though.
>
> Are you using the two-sided runtime?

Yes. But, it runs fine using 8 processes with complex MUMPS.

bjpalmer commented 6 months ago

How about with progress ranks?

wperkins commented 6 months ago

> How about with progress ranks?

I haven't tried. I generally don't build GA that way.

wperkins commented 6 months ago

I notice none of these input files set <PETScPrefix> for the LinearSolver.

> Seems really slow with more processors, though.
>
> Are you using the two-sided runtime?
>
> Yes. But, it runs fine using 8 processes with complex MUMPS.

Also seems fine using 8 processes and the complex PETSc LU solver. This is really looking like a SuperLU_dist problem to me.
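
A brief aside on the <PETScPrefix> note above: my understanding (an assumption about how GridPACK maps that element) is that it corresponds to PETSc's options-prefix mechanism, sketched below with a made-up "flow_" prefix. With a prefix set, only options spelled -flow_ksp_type, -flow_pc_type, and so on reach that particular solver, so each solver in a run can be configured independently.

```cpp
#include <petscksp.h>

// Sketch of PETSc's options-prefix mechanism.  The "flow_" prefix is a
// made-up example; GridPACK's <PETScPrefix> element is assumed to feed a
// string like this to the solver it configures.
int main(int argc, char **argv)
{
  PetscCall(PetscInitialize(&argc, &argv, nullptr, nullptr));

  KSP ksp;
  PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
  PetscCall(KSPSetOptionsPrefix(ksp, "flow_"));
  PetscCall(KSPSetFromOptions(ksp));   // now only -flow_* options apply to this KSP

  PetscCall(KSPDestroy(&ksp));
  PetscCall(PetscFinalize());
  return 0;
}
```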

wperkins commented 6 months ago

Seems fine with plain SuperLU (not dist) as well.

bjpalmer commented 6 months ago

I ran with just SuperLU and it works for me too. I also tried it on the European open model and it works for that one too. We should change the input files for this calculation (and the European network) and call it good.
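
For reference, the solver switch discussed here comes down to which factorization package PETSc's LU preconditioner uses. A rough sketch of the relevant PETSc options is below; in GridPACK these option strings would normally go in the LinearSolver section of the input file rather than being set in code, so take the programmatic form as illustrative only.

```cpp
#include <petscsys.h>

// Illustrative only: select the LU factorization package through PETSc's
// options database.  "superlu" (sequential) or "mumps" avoids the
// SuperLU_dist code path that calls exit() on a failed factorization.
int main(int argc, char **argv)
{
  PetscCall(PetscInitialize(&argc, &argv, nullptr, nullptr));

  PetscCall(PetscOptionsSetValue(nullptr, "-pc_type", "lu"));
  PetscCall(PetscOptionsSetValue(nullptr, "-pc_factor_mat_solver_type", "superlu"));

  // ... create the Mat/KSP and solve as usual; KSPSetFromOptions() will
  // pick up the options set above ...

  PetscCall(PetscFinalize());
  return 0;
}
```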