MethodicalAcceleratorDesign / MAD-X

MAD-X repository
Other
49 stars 39 forks source link

Possible to cause SEGFAULT in SEQUENCE #992

Open kyrsjo opened 3 years ago

kyrsjo commented 3 years ago

A student ( @jonashei ) in our general accelerator physics course found a way to crash MAD-X with segfault when defining a SEQUENCE. This is a shortened version of the script:

test1 : SEQUENCE, L=1;
Q1 : MULTIPOLE, at=0;
ENDSEQUENCE;

test2comb : SEQUENCE, L=2;
test1, at=1, from=Q1;
ENDSEQUENCE;

BEAM, PARTICLE=proton, ENERGY=7000;
USE, SEQUENCE=test2comb;

The crash happened on version 5.06.01; the original version of the script crashed MAD-X on both MacOS and Linux, and the shortened version crashed it on Linux (MacOS was not tried).

The expected outcome would be to have an error message when encountering the bad input, and possibly a "clean" crash instead of a segfault.

tpersson commented 3 years ago

I agree that a segmentation fault is not the best. It is not the most common error I think people cause when using MAD-X but I agree that it could happen so there should be an error forbidding this referencing. I will fix it but not sure if it will be for this release..

ldeniau commented 3 years ago

I would like to add that there are many ways to crash MAD-X because it was developed with the philosophy "work as expected", meaning no guards for the wrong usages. MAD-X is definitely not fool-proof. With time we tried to enforce more checks but it's very difficult when it was not planned from the beginning. Hence crashing MAD-X is not necessarily an issue when the script is doing something explicitly weird. I agree that it's not the best approach for complex code and require a lot of discipline from users to avoid "weird" usages...

kyrsjo commented 3 years ago

I had a quick look at it, and it seems that it fails when dealing with the namelist node_list, where name[1] gets switched from test1:1 to q1:1 and back again in an infinite recursion. A backtrace in the function where the crash finally occurs (at least on the newest master, on my machine) confirms this:

(gdb) bt
#0  name_list_pos (name=name@entry=0x7fffffffcd50 "q1:1", vlist=0x7ffff46493c0) at /home/kyrsjo/code/MAD-X/src/mad_name.c:179
#1  0x000000000041909b in hidden_node_pos (name=0x7ffff45531b0 "q1", sequ=sequ@entry=0x7ffff46ccb40) at /home/kyrsjo/code/MAD-X/src/mad_node.c:445
#2  0x0000000000419206 in get_node_pos (node=node@entry=0x7ffff4652540, sequ=sequ@entry=0x7ffff46ccb40) at /home/kyrsjo/code/MAD-X/src/mad_node.c:274
#3  0x0000000000419149 in hidden_node_pos (name=0x7ffff45531b0 "q1", sequ=sequ@entry=0x7ffff46ccb40) at /home/kyrsjo/code/MAD-X/src/mad_node.c:453
#4  0x0000000000419206 in get_node_pos (node=node@entry=0x7ffff4652540, sequ=sequ@entry=0x7ffff46ccb40) at /home/kyrsjo/code/MAD-X/src/mad_node.c:274
#5  0x0000000000419149 in hidden_node_pos (name=0x7ffff45531b0 "q1", sequ=sequ@entry=0x7ffff46ccb40) at /home/kyrsjo/code/MAD-X/src/mad_node.c:453
#6  0x0000000000419206 in get_node_pos (node=node@entry=0x7ffff4652540, sequ=sequ@entry=0x7ffff46ccb40) at /home/kyrsjo/code/MAD-X/src/mad_node.c:274
#7  0x0000000000419149 in hidden_node_pos (name=0x7ffff45531b0 "q1", sequ=sequ@entry=0x7ffff46ccb40) at /home/kyrsjo/code/MAD-X/src/mad_node.c:453
#8  0x0000000000419206 in get_node_pos (node=node@entry=0x7ffff4652540, sequ=sequ@entry=0x7ffff46ccb40) at /home/kyrsjo/code/MAD-X/src/mad_node.c:274
#9  0x0000000000419149 in hidden_node_pos (name=0x7ffff45531b0 "q1", sequ=sequ@entry=0x7ffff46ccb40) at /home/kyrsjo/code/MAD-X/src/mad_node.c:453
#10 0x0000000000419206 in get_node_pos (node=node@entry=0x7ffff4652540, sequ=sequ@entry=0x7ffff46ccb40) at /home/kyrsjo/code/MAD-X/src/mad_node.c:274
#11 0x0000000000420cd7 in all_node_pos (sequ=0x7ffff46ccb40) at /home/kyrsjo/code/MAD-X/src/mad_seq.c:148
#12 0x00000000004218dd in expand_curr_sequ (flag=flag@entry=0) at /home/kyrsjo/code/MAD-X/src/mad_seq.c:2303
#13 0x0000000000424714 in use_sequ (cmd=cmd@entry=0x7ffff47b1230) at /home/kyrsjo/code/MAD-X/src/mad_seq.c:1763
#14 0x00000000004b29a0 in control (cmd=0x7ffff47b1230) at /home/kyrsjo/code/MAD-X/src/mad_cmd.c:114
#15 exec_command () at /home/kyrsjo/code/MAD-X/src/mad_cmd.c:185
#16 0x000000000040e4f5 in process () at /home/kyrsjo/code/MAD-X/src/mad_eval.c:385
#17 0x0000000000410372 in pro_input_ (statement=0x7ffff4565000 "use, sequence=test2comb") at /home/kyrsjo/code/MAD-X/src/mad_eval.c:358
#18 0x0000000000409cc8 in madx_input (top=top@entry=0) at /home/kyrsjo/code/MAD-X/src/mad_core.c:201
#19 0x0000000000408b90 in mad_run () at /home/kyrsjo/code/MAD-X/src/mad_main.c:85
#20 main (argc=<optimized out>, argv=<optimized out>) at /home/kyrsjo/code/MAD-X/src/mad_main.c:52

Eventually the recursion gets too deep, and it fails with a stack overflow; for me this happens in name_list_pos (mad_name.c:187), which reads int mid = (low + high) / 2;. A backtrace after failure reveals a very long cycle of get_node_pos -> hidden_node_pos. This basically means that the maximum level of recursion is undefined (but in the many 10000s, judging from the depth of the backtrace, which is 80'593 calls deep).

A solution could be something like counting the level of recursion (with the counter set to 0 from the explicitly USE'd sequence), and exit with an error if the depth of sub-sequencing becomes greater than some arbitrary assumed-safe limit.

In general I try to always report/fix segfaults, even when caused by "malicious" user input, since they can often also be triggered in other ways that are not as easy to debug, and since they may not even cause the program to crash but can cause slightly but significantly incorrect output. It's not so easy to know if a given sefault is one that could cause something like this without actually diagnosing it - and the diagnosis is usually most of the work of the fix so even if it's a "safe" segfault (like this one seems to be) its better to just fix it than to leave it to be discovered and re-diagnosed at a later point in time.