Open arsenius7 opened 5 years ago
@kaeldai , @hernando , what's your opinion?
I'm in favor of this proposal. I'd assume that everything is a regular expression by default, I did this for specifying targets in RTNeuron and it has never been problematic, but rather useful instead.
I would be cautious about using regex by default.
Even if we use "full match", and not "contains" match criteria, special symbols like .
or [
will be a trap for less regex-savvy users.
I would also be in favor of having regular expressions by default (maybe only for a certain sub-set of attributes). Having a regular expression or a list that has to be combined adds unnecessarily complex parsing, IMO.
OK, let us collect opinions here.
I would be in favor of something more explicit, if a bit verbose, for instance Mongo-like syntax.
"Layer23": {"layer": {"$regex": "L(2|3)"}}
The reason I give it such importance is that the convention chosen here will leak from node sets file to the code (I'd like to pass around node sets specified "on-the-fly"). It would be quite annoying to change it afterwards.
upd. Just a clarification: by "code" above I meant user code, not SONATA parsers.
I'm now thinking that you could go further and allow arithmetic expressions as well, but that starts to get hairy to implement in C++ and even in interpreted languages there's the question of how to do it efficiently.
Regarding your concrete suggestion, I'm not against using a different syntax for regular expressions, but I wouldn't necessarily chose what you suggested. Other alternatives are:
"Layer23": {"layer": {"regex": "L(2|3)"}}
"Layer23": {"layer": "regex:L(2|3)"}
I'm now thinking that you could go further and allow arithmetic expressions as well, but that starts to get hairy to implement in C++ and even in interpreted languages there's the question of how to do it efficiently.
I'd rather stay out of implementing arithmetic expressions, and keep things simple.
Regarding your concrete suggestion, I'm not against using a different syntax for regular expressions, but I wouldn't necessarily chose what you suggested. Other alternatives are:
"Layer23": {"layer": {"regex": "L(2|3)"}} "Layer23": {"layer": "regex:L(2|3)"}
I like the latter for conciseness, I would like to avoid the multiple possible types assigned to one key of "layer": <string>
and "layer": <dictionary>
.
Should we consider using regular expressions for matching node attributes in Node sets? That will allow some queries that can not be expressed with the current syntax (for instance, NOT queries).
These can be used with lists for OR queries or instead of them (i.e.,
"layer(4|5)"
instead of["layer4", "layer5"]
).We can use some markers to distinguish regex literals from exact value match, or treat all string values in Node sets as regular expressions.