AllenInstitute / sonata

Collaboration between BBP and AIBS
BSD 3-Clause "New" or "Revised" License
54 stars 33 forks source link

Support regular expressions in Node sets files #82

Open arsenius7 opened 5 years ago

arsenius7 commented 5 years ago

Should we consider using regular expressions for matching node attributes in Node sets? That will allow some queries that can not be expressed with the current syntax (for instance, NOT queries).

These can be used with lists for OR queries or instead of them (i.e., "layer(4|5)" instead of ["layer4", "layer5"]).

We can use some markers to distinguish regex literals from exact value match, or treat all string values in Node sets as regular expressions.

arsenius7 commented 5 years ago

@kaeldai , @hernando , what's your opinion?

hernando commented 5 years ago

I'm in favor of this proposal. I'd assume that everything is a regular expression by default, I did this for specifying targets in RTNeuron and it has never been problematic, but rather useful instead.

arsenius7 commented 5 years ago

I would be cautious about using regex by default. Even if we use "full match", and not "contains" match criteria, special symbols like . or [ will be a trap for less regex-savvy users.

matz-e commented 5 years ago

I would also be in favor of having regular expressions by default (maybe only for a certain sub-set of attributes). Having a regular expression or a list that has to be combined adds unnecessarily complex parsing, IMO.

arsenius7 commented 5 years ago

OK, let us collect opinions here.

I would be in favor of something more explicit, if a bit verbose, for instance Mongo-like syntax.

"Layer23": {"layer": {"$regex": "L(2|3)"}}

The reason I give it such importance is that the convention chosen here will leak from node sets file to the code (I'd like to pass around node sets specified "on-the-fly"). It would be quite annoying to change it afterwards.

upd. Just a clarification: by "code" above I meant user code, not SONATA parsers.

hernando commented 5 years ago

I'm now thinking that you could go further and allow arithmetic expressions as well, but that starts to get hairy to implement in C++ and even in interpreted languages there's the question of how to do it efficiently.

Regarding your concrete suggestion, I'm not against using a different syntax for regular expressions, but I wouldn't necessarily chose what you suggested. Other alternatives are:

 "Layer23": {"layer": {"regex": "L(2|3)"}}
 "Layer23": {"layer": "regex:L(2|3)"}
matz-e commented 5 years ago

I'm now thinking that you could go further and allow arithmetic expressions as well, but that starts to get hairy to implement in C++ and even in interpreted languages there's the question of how to do it efficiently.

I'd rather stay out of implementing arithmetic expressions, and keep things simple.

Regarding your concrete suggestion, I'm not against using a different syntax for regular expressions, but I wouldn't necessarily chose what you suggested. Other alternatives are:

 "Layer23": {"layer": {"regex": "L(2|3)"}}
 "Layer23": {"layer": "regex:L(2|3)"}

I like the latter for conciseness, I would like to avoid the multiple possible types assigned to one key of "layer": <string> and "layer": <dictionary>.