VeriFIT / mata

A fast and simple automata library
MIT License

Segmentation fault when parsing regexes #437

Open adastepkova opened 2 months ago

adastepkova commented 2 months ago

I am using the Python binding, libmata version 1.6.9.

When I run the following code, the program terminates with a segmentation fault.

from libmata import parser
nfa = parser.from_regex("^.*[sS][yY][sS][tT][eE][mM][pP][aA][tT][hH]\\=([hH][tT]{2}[pP][sS]?)|([fF][tT][pP])")
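Because the crash happens inside the native extension, it kills the whole Python interpreter instead of raising an exception. A minimal sketch for checking whether a snippet segfaults, assuming a POSIX system, is to run it in a child interpreter and inspect the exit status: a child killed by signal N has a `returncode` of -N. The helper name `segfaults` is hypothetical and not part of libmata.

```python
import signal
import subprocess
import sys


def segfaults(snippet: str, timeout: float = 60.0) -> bool:
    """Run `snippet` in a fresh Python interpreter and report whether the
    child was killed by SIGSEGV. Hypothetical helper, POSIX only."""
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        timeout=timeout,
    )
    # On POSIX, a child terminated by signal N has returncode == -N.
    return result.returncode == -signal.SIGSEGV


if __name__ == "__main__":
    # The reproducer from this report, run in isolation so a crash
    # does not take down the calling process.
    reproducer = r'''
from libmata import parser
parser.from_regex(r"^.*[sS][yY][sS][tT][eE][mM][pP][aA][tT][hH]\=([hH][tT]{2}[pP][sS]?)|([fF][tT][pP])")
'''
    print("segfault" if segfaults(reproducer) else "no segfault")
```

Running each regex this way lets a script continue past a crashing pattern and report every one that fails, rather than stopping at the first.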

The same happens with all of the following regexes:

^.*[sS][yY][sS][tT][eE][mM][pP][aA][tT][hH]\=([hH][tT]{2}[pP][sS]?)|([fF][tT][pP])
^[^\f\n\r\t\v]{65}|[^\f\n\r\t\v]+[\f\n\r\t\v]+[^\f\n\r\t\v]{65}|[^\f\n\r\t\v]+[\f\n\r\t\v]+[^\f\n\r\t\v]+[\f\n\r\t\v]+[^\f\n\r\t\v]{65}
^get (X.downloadX[ -~]*|X.supernode[ -~]|X.status[ -~]|X.network[ -~]*|X.files|X.hash\=[0-9a-f]*X[ -~]*) httpX1.1|user-agent: kazaa|x-kazaa(-username|-network|-ip|-supernodeip|-xferid|-xferuid|tag)|^give [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]?[0-9]?[0-9]?
^(\*[\x01\x02].*\x03\x0b|\*\x01.?.?.?.?\x01)|flapon|toc_signon.*0x
^(\x11\x20\x01...?\x11|\xfe\xfd.?.?.?.?.?.?(\x14\x01\x06|\xff\xff\xff))|[\]\x01].?battlefield2
^(\x13bittorrent protocol|azver\x01$|get Xscrape\?info_hash\=get Xannounce\?info_hash\=|get XclientXbitcometX|GET Xdata\?fid\=)|d1:ad2:id20:|\x08'7P\)[RP]
^[a-z][a-z0-9\-_]+|login: [\x09-\x0d -~]* name: [\x09-\x0d -~]* Directory:
^get X.*icy-metadata:1|icy [1-5][0-9][0-9] [\x09-\x0d -~]*(content-type:audio|icy-)
^([()]|get)(...?.?.?(reg|get|query)|.+User-Agent: (MozillaX4\.0 \(compatible; (MSIE 6\.0; Windows NT 5\.1;? ?\)|MSIE 5\.00; Windows 98\))))|Keep-Alive\x0d\x0a\x0d\x0a[26]
^[\f\n\r\t\v]*Accept-Language[\f\n\r\t\v]*|3a|[\f\n\r\t\v]*([^\r\n]*?\x2c){20}

Adda0 commented 2 months ago

Hello. Thank you for the report and the reproducible examples. I will investigate all of these. Hopefully the fixes will be easy, as I remember these cases used to work.

koniksedy commented 3 weeks ago

Interestingly, all of these regular expressions result in an automaton with an empty language.

This problem is discussed in issue #450.
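One quick way to see that an empty language is not the expected result is to find a witness string for some of the patterns. The sketch below uses Python's standard `re` module purely as a stand-in reference matcher; its semantics (for example, how `^` interacts with top-level `|`, and search versus whole-string matching) may differ from mata's regex parser, so this only suggests what the automaton should accept. The witness strings and the `has_witness` helper are hand-picked for illustration.

```python
import re

# Patterns copied from the report; witnesses are hand-picked strings in which
# Python's `re` (used here only as a reference matcher) finds a match.
CASES = {
    r"^.*[sS][yY][sS][tT][eE][mM][pP][aA][tT][hH]\=([hH][tT]{2}[pP][sS]?)|([fF][tT][pP])": "SystemPath=https",
    r"^(\*[\x01\x02].*\x03\x0b|\*\x01.?.?.?.?\x01)|flapon|toc_signon.*0x": "flapon",
    r"^[a-z][a-z0-9\-_]+|login: [\x09-\x0d -~]* name: [\x09-\x0d -~]* Directory:": "ab",
    r"^[\f\n\r\t\v]*Accept-Language[\f\n\r\t\v]*|3a|[\f\n\r\t\v]*([^\r\n]*?\x2c){20}": "3a",
}


def has_witness(pattern: str, witness: str) -> bool:
    """True if Python's `re` finds a match for `pattern` somewhere in `witness`."""
    return re.search(pattern, witness) is not None


for pattern, witness in CASES.items():
    print(f"{witness!r}: {has_witness(pattern, witness)}")
```

If a reference matcher accepts these strings while the automaton built by `parser.from_regex` accepts nothing, the discrepancy points at the parser rather than at the patterns.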

Adda0 commented 2 weeks ago

The segfault has been fixed in #451, but the empty-language problem discussed in #450 remains. This issue will be closed once #450 is resolved.