Aunsiels / pyformlang

A python library to manipulate formal languages and various automata
https://pypi.org/project/pyformlang/
MIT License
43 stars 10 forks source link

pyformlang.regular_expression.regex_objects.MisformedRegexError: Wrong parenthesis regex Regex #12

Closed vadyushkins closed 2 years ago

vadyushkins commented 3 years ago

Hi, @Aunsiels!

Thank you for the awesome library! :clap:

Unfortunately, while using your library, I got the error :bug: mentioned in the title. :disappointed:

Environment

How to reproduce

from pyformlang.cfg import CFG
from pyformlang.rsa import RecursiveAutomaton as RSA

rsa = RSA.from_cfg(CFG.from_text("S -> ((a|((b|c))))"))

Possible reason

I think this is due to the fact that instead of a regular expression on the right side, CFG breaks it down into products(using |) of the form like this:

S -> c))))
S -> ((a
S -> ((b

Which will be converted to text like this

S -> c))))|((a|((b

Possible solution

I can suggest replacing the from_cfg() function with the from_text() function, since regex in production body is not supported in CFG in the expected way.

But these will be backward-incompatible changes and I want to know what you think about this?

Aunsiels commented 3 years ago

I think @IlyaEp will know better what was the purpose of these methods and how to use them.

The CFG.from_text method is not designed to receive a regex. In the documentation:

The structure of a production is: head -> body1 | body2 | ... | bodyn where | separates the bodies.

This is just a shortcut for: head -> body1 head -> body2 ...

The "|" symbol is not related to the regex "|" symbol. Each body is composed of terminals and variables, as a concatenation. That's it.

This representation is similar to the one used in NLTK.

I am not sure if RSAs are supposed to support these kinds of operations. Maybe both "from_text" and "from_cfg" methods are required? @IlyaEp was the one who created RSA, he might have an opinion on this.