Several .mata format files in automata-benchmark not successfully parsing

jn1z commented 1 month ago

Various files from https://github.com/VeriFIT/automata-bench have lines like: "q46 false true". It's not clear to me what this is supposed to represent, but it seems to break the MATA parser:

./build/examples/example05-parsing automata-bench/email_filter/1-19-34-37-67/gen_aut/aut34.mata libMATA error: Unknown token false

Adda0 commented 1 month ago

Hello. Thank you for the reporting the issue. This is a known bug to us that we have yet to fix. We changed the supported .mata format for false and true: from false and true to \false and \true, representing the true and false values in Boolean formulae.

The following script fix_nfa.py replaces the invalid tokens:

import re
import sys

out = []

in_parens = re.compile(r"^(.*)\(([^\(]*)\)$")
is_q = re.compile(r"^q[0-9a-zA-Z]+$")
ltrue = re.compile(r"\btrue\b")
lfalse = re.compile(r"\bfalse\b")

with open(sys.argv[1], "r") as f:
    for line in f:
        line = line.strip()
        modified = False

        line = ltrue.sub("\\\\true", line)
        line = lfalse.sub("\\\\false", line)

        if line.startswith("q") and line.endswith(")"):
            line_match = in_parens.match(line)
            if line_match:
                qs = line_match[2].split(" | ")
                if all(is_q.match(q) for q in qs):
                    modified = True
                    for q in qs:
                        out.append(line_match[1] + q)

        if not modified:
            out.append(line)

with open(sys.argv[1], "w") as f:
    for line in out:
        print(line, file=f)

And can be run like this:

for f in $(find . -type f | grep 'gen_aut.*\.mata$'); do python3 fix_nfa.py $f; done

This should hopefully fix the issue if you need this right now. Sorry for the inconvenience.

jn1z commented 1 month ago

Thank you for the quick response! That does fix the issue for me.

VeriFIT / mata

Several .mata format files in automata-benchmark not successfully parsing #441