Closed mattsignorelli closed 7 months ago
I have updated my comment on mul with the more specific finding. Still tracking down the seg fault...
About the seg fault, have you checked the mad_[c]tpsa_nam function? setnam has been replaced by the new nam, which does both.
Yes, I have checked that. Something seems to be happening silently, because the last two bugs on the checklist only occur sometimes, not every time.
Never mind, those seem to be happening every time; I mixed myself up.
One point that has changed in the internal semantics is that the lo bound no longer includes a non-zero scalar part, because when manipulating high-order specific maps around an orbit, all the intermediate orders were processed even though they were filled with zeros. The drawback of this speedup is that coef[0] must always be treated separately (internally).
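To make the lo/coef[0] point concrete, here is a minimal sketch on a toy univariate series. It is not the GTPSA implementation: toy_series, toy_scl and MAXORD are hypothetical names, and the only thing it assumes from the comment above is that lo and hi bound the non-zero orders of order 1 and higher while coef[0] is handled on its own.

```c
#include <assert.h>

#define MAXORD 4   /* hypothetical maximum order of this univariate toy */

/* Toy univariate series: coef[o] multiplies x^o.
   lo..hi bound the non-zero orders >= 1; coef[0] is NOT covered by lo. */
typedef struct {
  int    lo, hi;
  double coef[MAXORD + 1];
} toy_series;

/* r = v * a, touching only coef[0] and the orders lo..hi. */
void toy_scl(const toy_series *a, double v, toy_series *r) {
  for (int o = 0; o <= MAXORD; o++) r->coef[o] = 0.0;
  r->lo = a->lo;
  r->hi = a->hi;
  r->coef[0] = v * a->coef[0];            /* scalar part treated separately */
  for (int o = a->lo; o <= a->hi; o++)    /* skips zero-filled orders below lo */
    r->coef[o] = v * a->coef[o];
}
```

For a series such as 1 + 2x^3 (lo = hi = 3), the loop visits a single order instead of orders 0 to 3. If the separate coef[0] line were forgotten, the scalar part of the result would silently be zero.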
I see, that makes sense.
I'm finding the seg fault to appear in cot; however, this one seems to occur only when I run all tests before it. When running only the lone cot test, everything seems okay.
Here is the specific output:
[43550] signal (11.1): Segmentation fault
in expression starting at REPL[1]:1
mad_tpsa_inv at /home/matt/.julia/dev/GTPSA_jll/override/lib/libgtpsa.so (unknown line)
mad_tpsa_div at /home/matt/.julia/dev/GTPSA_jll/override/lib/libgtpsa.so (unknown line)
mad_tpsa_cot at /home/matt/.julia/dev/GTPSA_jll/override/lib/libgtpsa.so (unknown line)
libgtpsa.so is the compiled C library
mad_tpsa_inv is calling mad_tpsa_scl if v != 1, so before the previous fix, the scalar part was removed. However, I don't see how this would trigger a seg fault...
Getting closer... something about calls to polar beforehand is related; other functions may be related too, still investigating.
I don't think the problem is coming from a high-level function (mad_tpsa_fun), where I didn't change anything because these functions rely on lower-level functions. I suspect instead some corruption of lo, hi or nz beforehand. This is what I observed with DEBUG=2: one of the mul calls is corrupted during tracking, e.g. through the LHC.
[...]
-> mad_tpsa_mul:442:
-> mad_tpsa_update:281:
mad_tpsa_update:287: 't' { lo=64 hi=255 mo=4 uid=0, did=1 nz=00000 ** bug @ o=0 i=-1 }
adjust0 was broken for nz==0 and coef[0]!=0, fixed in 231a8521
Ok, perhaps it is one of the low-level functions that both polar and cot call within inv.
If polar is called three times with tpsa t = 1+x, I get this:
[75324] signal (11.1): Segmentation fault
in expression starting at REPL[6]:1
mad_tpsa_inv at /home/matt/.julia/dev/GTPSA_jll/override/lib/libgtpsa.so (unknown line)
mad_tpsa_div at /home/matt/.julia/dev/GTPSA_jll/override/lib/libgtpsa.so (unknown line)
mad_tpsa_atan2 at /home/matt/.julia/dev/GTPSA_jll/override/lib/libgtpsa.so (unknown line)
mad_ctpsa_polar at /home/matt/.julia/dev/GTPSA_jll/override/lib/libgtpsa.so (unknown line)
This is the minimal working example I found. It has to be done with the same tpsa on each call.
Moving to the next debug step: the cdamap used in normal forms after tracking through the HL-LHC somehow gets incorrect input values for minv (with DEBUG=2, no corrupted TPSA is detected):
-> mad_ctpsa_minv:112:
error: mad_tpsa_minv.c:118: : invalid rank-deficient map (1st order has zero row)
../mad:
stack traceback:
[C]: in function 'mad_ctpsa_minv'
madl_damap.mad:594: in function '__pow'
madl_gphys.mad:1322: in function 'normal'
madl_twiss.mad:413: in function 'twiss_nform'
madl_twiss.mad:563: in function 'make_mflow'
madl_twiss.mad:585: in function 'twiss'
I cannot reproduce the problem, even with 20 calls... Here is the sequence of calls for a single polar, which works only with ctpsa, not tpsa.
> t:print()
-> mad_ctpsa_print:391:
-UNNAMED-: C, NV = 6, MO = 1
******************************************************************************
I COEFFICIENT ORDER EXPONENTS
1 1.0000000000000000E+00 +0.0000000000000000E+00i 0 0 0 0 0 0 0
<- mad_ctpsa_print:436:
> MAD.gmath.polar(t,t)
-> mad_ctpsa_polar:139:
-> mad_tpsa_new:191:
-> mad_tpsa_init:169:
<- mad_tpsa_init:172:
<- mad_tpsa_new:197:
-> mad_tpsa_new:191:
-> mad_tpsa_init:169:
<- mad_tpsa_init:172:
<- mad_tpsa_new:197:
-> mad_tpsa_new:191:
-> mad_tpsa_init:169:
<- mad_tpsa_init:172:
<- mad_tpsa_new:197:
-> mad_ctpsa_real:27:
-> mad_tpsa_setval:273:
<- mad_tpsa_setval:275:
<- mad_ctpsa_real:35:
-> mad_ctpsa_imag:50:
-> mad_tpsa_setval:273:
<- mad_tpsa_setval:275:
<- mad_ctpsa_imag:58:
-> mad_tpsa_hypot:786:
-> mad_tpsa_axypbvwpc:878:
-> mad_tpsa_new:191:
-> mad_tpsa_init:169:
<- mad_tpsa_init:172:
<- mad_tpsa_new:197:
-> mad_tpsa_mul:442:
-> mad_tpsa_setval:273:
<- mad_tpsa_setval:275:
<- mad_tpsa_mul:505:
-> mad_tpsa_mul:442:
-> mad_tpsa_setval:273:
<- mad_tpsa_setval:275:
<- mad_tpsa_mul:505:
-> mad_tpsa_axpbypc:822:
-> mad_tpsa_setval:273:
<- mad_tpsa_setval:275:
<- mad_tpsa_axpbypc:830:
-> mad_tpsa_del:203:
<- mad_tpsa_del:205:
<- mad_tpsa_axypbvwpc:886:
-> mad_tpsa_sqrt:212:
-> mad_tpsa_setval:273:
<- mad_tpsa_setval:275:
<- mad_tpsa_sqrt:219:
<- mad_tpsa_hypot:792:
-> mad_tpsa_atan2:592:
-> mad_tpsa_div:511:
-> mad_tpsa_scl:267:
-> mad_tpsa_setval:273:
<- mad_tpsa_setval:275:
<- mad_tpsa_scl:277:
<- mad_tpsa_div:523:
-> mad_tpsa_atan:776:
-> mad_tpsa_setval:273:
<- mad_tpsa_setval:275:
<- mad_tpsa_atan:781:
<- mad_tpsa_atan2:609:
-> mad_ctpsa_cplx:73:
-> mad_ctpsa_setval:273:
<- mad_ctpsa_setval:275:
<- mad_ctpsa_cplx:82:
-> mad_tpsa_del:203:
<- mad_tpsa_del:205:
-> mad_tpsa_del:203:
<- mad_tpsa_del:205:
<- mad_ctpsa_polar:147:
> t:print()
-> mad_ctpsa_print:391:
-UNNAMED-: C, NV = 6, MO = 1
******************************************************************************
I COEFFICIENT ORDER EXPONENTS
1 1.0000000000000000E+00 +0.0000000000000000E+00i 0 0 0 0 0 0 0
<- mad_ctpsa_print:436:
Ok I found one problem: I am compiling with DESC_USE_TMP = 1, and I see that after a call to polar, ti in the descriptor is 1. After another call to polar it is 2. Then it seg faults on the third call. So one of the temporaries used internally by polar (or by one of its internal calls) is not being released, at least with DESC_USE_TMP=1.
If the same ctpsa is used consecutively, then it seg faults on the third call. If other ctpsas are used in between, it seg faults on the 6th call. This happens in tests where I am not using temporaries, so the handling is internal to the polar call.
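The symptom described above (ti growing by one per call, then a crash once the pool runs out) can be reproduced with a small sketch of a LIFO pool of temporaries. This is only an illustration under stated assumptions: toy_desc, toy_gettmp, toy_reltmp and the pool size MAX_TMP = 2 are hypothetical and are not the library's DESC_MAX_TMP or its actual mad_tpsa_reltmp. The toy returns an error code where the real library would fail an assertion or seg fault.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TMP 2   /* hypothetical pool size, picked so a leak fails on the 3rd call */

typedef struct { double coef[4]; } toy_tpsa;   /* stand-in for a real TPSA */

typedef struct {
  toy_tpsa slot[MAX_TMP];
  int      ti;                /* index of the next free temporary */
} toy_desc;

/* Hand out the next temporary, or NULL once the pool is exhausted. */
toy_tpsa *toy_gettmp(toy_desc *d) {
  if (d->ti >= MAX_TMP) return NULL;
  return &d->slot[d->ti++];
}

/* Release in LIFO order: tmp must be the most recently borrowed slot.
   Returns 0 on success, -1 on an out-of-order or spurious release. */
int toy_reltmp(toy_desc *d, toy_tpsa *tmp) {
  if (d->ti <= 0 || &d->slot[d->ti - 1] != tmp) return -1;
  d->ti--;
  return 0;
}

/* Borrows one temporary and forgets to release it. */
int leaky_op(toy_desc *d) {
  toy_tpsa *t = toy_gettmp(d);
  if (t == NULL) return -1;   /* real code would use the bad pointer here */
  t->coef[0] = 1.0;
  return 0;                   /* missing: toy_reltmp(d, t) */
}

/* Same operation with the temporary released before returning. */
int fixed_op(toy_desc *d) {
  toy_tpsa *t = toy_gettmp(d);
  if (t == NULL) return -1;
  t->coef[0] = 1.0;
  return toy_reltmp(d, t);
}
```

With a zero-initialised toy_desc, two calls to leaky_op leave ti at 1 and then 2, and the third call fails, while fixed_op can be called any number of times and leaves ti at 0.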
I will try again now compiling with DESC_USE_TMP=0
All my tests with allocated tpsas pass with the latest dev-tpsa-new for DESC_USE_TMP = 0
With DEBUG=2 and DESC_USE_TMP = 1, calling polar just once aborts with the following output:
-> mad_ctpsa_polar:139:
-> mad_ctpsa_real:27:
<- mad_ctpsa_real:44:
-> mad_ctpsa_imag:50:
<- mad_ctpsa_imag:67:
-> mad_tpsa_hypot:786:
-> mad_tpsa_axypbvwpc:878:
-> mad_tpsa_mul:442:
-> mad_tpsa_setval:277:
<- mad_tpsa_setval:279:
<- mad_tpsa_mul:505:
-> mad_tpsa_mul:442:
-> mad_tpsa_update:285:
<- mad_tpsa_update:291:
<- mad_tpsa_mul:505:
-> mad_tpsa_axpbypc:822:
<- mad_tpsa_axpbypc:844:
<- mad_tpsa_axypbvwpc:886:
-> mad_tpsa_sqrt:212:
-> mad_tpsa_copy:299:
<- mad_tpsa_copy:316:
-> mad_tpsa_scl:267:
<- mad_tpsa_scl:282:
-> mad_tpsa_mul:442:
-> mad_tpsa_update:285:
<- mad_tpsa_update:291:
<- mad_tpsa_mul:505:
-> mad_tpsa_acc:288:
<- mad_tpsa_acc:311:
-> mad_tpsa_mul:442:
-> mad_tpsa_update:285:
<- mad_tpsa_update:291:
<- mad_tpsa_mul:505:
-> mad_tpsa_acc:288:
<- mad_tpsa_acc:311:
-> mad_tpsa_mul:442:
-> mad_tpsa_update:285:
<- mad_tpsa_update:291:
<- mad_tpsa_mul:505:
-> mad_tpsa_acc:288:
<- mad_tpsa_acc:311:
-> mad_tpsa_mul:442:
-> mad_tpsa_update:285:
<- mad_tpsa_update:291:
<- mad_tpsa_mul:505:
-> mad_tpsa_acc:288:
<- mad_tpsa_acc:311:
<- mad_tpsa_sqrt:233:
<- mad_tpsa_hypot:792:
-> mad_tpsa_atan2:592:
-> mad_tpsa_div:511:
-> mad_tpsa_inv:158:
-> mad_tpsa_copy:299:
<- mad_tpsa_copy:316:
-> mad_tpsa_scl:267:
<- mad_tpsa_scl:282:
-> mad_tpsa_mul:442:
-> mad_tpsa_update:285:
<- mad_tpsa_update:291:
<- mad_tpsa_mul:505:
-> mad_tpsa_acc:288:
<- mad_tpsa_acc:311:
-> mad_tpsa_mul:442:
-> mad_tpsa_update:285:
<- mad_tpsa_update:291:
<- mad_tpsa_mul:505:
-> mad_tpsa_acc:288:
<- mad_tpsa_acc:311:
-> mad_tpsa_mul:442:
-> mad_tpsa_update:285:
<- mad_tpsa_update:291:
<- mad_tpsa_mul:505:
-> mad_tpsa_acc:288:
<- mad_tpsa_acc:311:
-> mad_tpsa_mul:442:
-> mad_tpsa_update:285:
<- mad_tpsa_update:291:
<- mad_tpsa_mul:505:
-> mad_tpsa_acc:288:
<- mad_tpsa_acc:311:
<- mad_tpsa_inv:178:
-> mad_tpsa_mul:442:
-> mad_tpsa_setval:277:
<- mad_tpsa_setval:279:
-> mad_tpsa_copy:299:
<- mad_tpsa_copy:316:
<- mad_tpsa_mul:505:
<- mad_tpsa_div:531:
-> mad_tpsa_atan:776:
-> mad_tpsa_setval:277:
<- mad_tpsa_setval:279:
<- mad_tpsa_atan:781:
<- mad_tpsa_atan2:609:
-> mad_ctpsa_cplx:73:
<- mad_ctpsa_cplx:95:
julia: /home/matt/tpsa/gtpsa/code/mad_tpsa_impl.h:202: mad_tpsa_reltmp: Assertion `d->t[ tid*DESC_MAX_TMP + d->ti[tid]-1 ] == tmp' failed.
[159510] signal (6.-6): Aborted
in expression starting at REPL[5]:1
pthread_kill at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
raise at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7f3414c1871a)
__assert_fail at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
mad_tpsa_reltmp at /home/matt/.julia/dev/GTPSA_jll/override/lib/libgtpsa.so (unknown line)
mad_ctpsa_polar at /home/matt/.julia/dev/GTPSA_jll/override/lib/libgtpsa.so (unknown line)
mad_ctpsa_polar! at /home/matt/.julia/dev/GTPSA/src/low_level/ctpsa.jl:556 [inlined]
Ok I found the problem and submitted a PR. One temporary in polar wasn't being released
All my tests pass now, I do not have tests yet for any of the map methods.
I close this issue for now.
Running list of issues I find with dev-tpsa-new:
mul prints a new line to stdout for each call when only the scalar part is set or is 0 (no 1st, 2nd, 3rd, etc. order terms)
div removes the scalar part (e.g. for tpsa a = 1+x and tpsa b = 2, a/b gives 0.5x)
With DESC_USE_TMP=0, all my tests with allocated tpsas pass. With the fix in #434, all of my tests with DESC_USE_TMP=1, using both the temporaries and allocated tpsas, pass.