drolbr / Overpass-API

A database engine to query the OpenStreetMap data.
http://overpass-api.de
GNU Affero General Public License v3.0

Could you support PCRE regular expressions? #146

Open pyrog opened 9 years ago

pyrog commented 9 years ago

Hi,

I would like to use "advanced" regex features such as word boundaries or lookahead.

regex word boundary

"Perl Compatible Regular Expressions" seem easy to use. See: http://www.regular-expressions.info/pcre.html

Best regards,

Yves
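To make the word-boundary request concrete, here is a minimal sketch. It uses Python's `re` module only as a stand-in (it is not PCRE, but it accepts the same `\b` syntax). The helper names `has_word` and `has_word_posix_style` are hypothetical. The second function shows the longer workaround needed in a regex flavour without `\b`, such as POSIX ERE, where the non-word class would be written `[^[:alnum:]_]`. The ASCII class used here differs from `\b` for non-ASCII letters.

```python
import re

def has_word(value: str, word: str) -> bool:
    """True if `word` occurs as a whole word, using a PCRE-style \\b boundary."""
    return re.search(rf"\b{re.escape(word)}\b", value) is not None

def has_word_posix_style(value: str, word: str) -> bool:
    """Same test without \\b: the word must be preceded by start-of-string or a
    non-word character, and followed by a non-word character or end-of-string.
    (ASCII-only class here; POSIX ERE would spell it [^[:alnum:]_].)"""
    pattern = rf"(^|[^A-Za-z0-9_]){re.escape(word)}([^A-Za-z0-9_]|$)"
    return re.search(pattern, value) is not None

print(has_word("Main Street", "Main"))   # True
print(has_word("Mainstreet", "Main"))    # False
```

Both versions agree on ASCII input. The `\b` form is simply shorter and easier to get right.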

pyrog commented 9 years ago

Does it work? How can I test it?

Could you make a pull request?

Thanks

mmd-osm commented 9 years ago

Well, in the prototype, all regexes are now handled by PCRE (there's no way to switch between PCRE and POSIX regex yet). It sort of works on my local machine, but for now you'd need to set up your own instance to test it.

The big question, however, is whether Roland (@drolbr) wants to introduce an additional dependency on PCRE. Right now, there are very few dependencies on other libraries.

pyrog commented 9 years ago

there's no way to switch between pcre and posix regex yet

Is that a problem? I think (but I could be wrong) that PCRE can do the same searches as POSIX regex, and more?

@drolbr What is your position on PCRE? :smile:
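One caveat to "PCRE can do everything POSIX regex can" is that the same pattern text does not always mean the same thing. In POSIX ERE, a backslash inside a bracket expression is an ordinary character, so `[\d]` matches a backslash or the letter `d`. PCRE treats it as the digit class. The sketch below uses Python's `re` as a stand-in for PCRE-style rules. The POSIX behaviour is described in comments only, since it is not executed here.

```python
import re

# PCRE-style (and Python re): "\d" inside [...] is still the digit class.
# POSIX ERE: "[\d]" is a bracket expression with the literal characters "\" and "d".
pcre_like = re.compile(r"[\d]")

def pcre_style_match(value: str) -> bool:
    """Yes/no test, like a tag filter: does the value contain a digit?"""
    return pcre_like.search(value) is not None

print(pcre_style_match("Road"))  # False (no digit); POSIX ERE would match the "d"
print(pcre_style_match("A7"))    # True
```

Switching the default engine could therefore silently change the results of existing queries. This is one argument for keeping the two engines selectable.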

mmd-osm commented 9 years ago

@pyrog: In the meantime, you could do a few tests with PCRE enabled on the test instance: http://overpass-turbo.eu/s/b1e

Here's another example that returns only ways whose sole tag is building=*: http://overpass-turbo.eu/s/b0B

Disclaimer: there's no guarantee that this will ever make it into the official branch and the link will be discontinued after some time.
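The linked query itself is not reproduced here. As a hypothetical illustration of how a negative lookahead helps with this kind of filter, the pattern `^(?!building$)` matches any key other than exactly `building`. A way then qualifies if it has a `building` key and no key matches that pattern. The sketch uses Python's `re` as a stand-in, and `only_building` is a made-up helper.

```python
import re

# Matches any key EXCEPT exactly "building" (e.g. "building:levels", "name").
OTHER_KEY = re.compile(r"^(?!building$)")

def only_building(tags: dict[str, str]) -> bool:
    """Hypothetical check: building=* is the way's only tag."""
    return "building" in tags and not any(OTHER_KEY.match(k) for k in tags)

print(only_building({"building": "yes"}))                 # True
print(only_building({"building": "yes", "name": "Town Hall"}))  # False
```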

mmd-osm commented 8 years ago

During performance testing, PCRE showed performance regressions with certain UTF-8 characters; see http://wiki.openstreetmap.org/wiki/User:Mmd/Overpass_API/Performance_Project_2016.

Example:

node["name"~"[قق][اا][لل]"]

I would recommend keeping POSIX as the default and perhaps enabling PCRE only via an explicit query setting.
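This sketch does not reproduce or explain the regression. It only shows why a regex engine has to be told that tag values are UTF-8 for patterns like the one above. Each Arabic letter is two bytes in UTF-8. When the same pattern is applied byte by byte, each bracket expression becomes a set of single bytes, and the letters no longer line up. Python's `re` is used as a stand-in, and its `str` and `bytes` modes model character-level and byte-level matching.

```python
import re

pattern = "[قق][اا][لل]"   # pattern from the query above
subject = "قال"

# Character-level matching: each bracket expression is one Arabic letter.
print(re.search(pattern, subject) is not None)  # True

# Byte-level matching of the same UTF-8 text: the two-byte letters are split
# into separate bytes inside each [...] set, so the match is lost.
print(re.search(pattern.encode("utf-8"), subject.encode("utf-8")) is not None)  # False
```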

mmd-osm commented 7 years ago

This issue should be closed; the follow-up is in #332.

pyrog commented 4 years ago

Hi again,

I want to use positive or negative lookahead.

For example, I want to find invalid wikimedia_commons values (those that do not start with File: or Category:):

wikimedia_commons~/^(?!(Category|File):).*/i

Result: static error: Invalid regular expression: "^(?!(Category|File):).*"

I could use wikimedia_commons~/http/, but then I would miss values like 1524488623511.jpg.
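The sketch below shows what the lookahead pattern does, using Python's `re` as a PCRE-like stand-in with case-insensitive matching. It also shows an equivalent that needs no lookahead: match the valid prefix and negate the result. In Overpass QL that should correspond to a negated regex filter, for example `["wikimedia_commons"]["wikimedia_commons"!~"^(Category|File):",i]`. That query is untested here. The extra `["wikimedia_commons"]` filter is there so that only elements that actually have the tag are selected.

```python
import re

# Lookahead version from the comment: any value NOT starting with Category: or File:
bad_value = re.compile(r"^(?!(Category|File):).*", re.IGNORECASE)

def is_suspicious(value: str) -> bool:
    return bad_value.match(value) is not None

# Lookahead-free version: match the valid prefix, then negate the result.
valid_prefix = re.compile(r"^(Category|File):", re.IGNORECASE)

def is_suspicious_no_lookahead(value: str) -> bool:
    return valid_prefix.match(value) is None

for v in ["File:Foo.jpg", "category:Bridges", "1524488623511.jpg",
          "https://commons.wikimedia.org/wiki/File:X.jpg"]:
    print(v, is_suspicious(v), is_suspicious_no_lookahead(v))
```

Unlike the `~/http/` workaround, both versions flag bare file names such as `1524488623511.jpg`.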

gy-mate commented 6 months ago

I would recommend keeping POSIX as the default and perhaps enabling PCRE only via an explicit query setting.

@mmd-osm Is UTF-8 handling still slow in PCRE? If not, could you please replace POSIX Extended with it? If so, could you please add a query setting for PCRE?

Lookaheads and lookbehinds would be really useful, for example, to filter tags with multiple values separated by semicolons.
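The sketch below separates two cases, assuming no spaces around the semicolons. Checking for a single whole item in a semicolon-separated value needs no lookahead, so the same pattern would also work in POSIX ERE. Requiring several items in any order is where lookahead helps: each item gets its own `(?=...)`, whereas a single lookahead-free pattern would have to spell out every ordering. Python's `re` stands in for PCRE, and the helper names are hypothetical.

```python
import re

def has_item(value: str, item: str) -> bool:
    """Whole-item test on a ';'-separated value; no lookahead needed."""
    return re.search(rf"(^|;){re.escape(item)}(;|$)", value) is not None

def has_all_items(value: str, *items: str) -> bool:
    """All items present, in any order: one lookahead per item."""
    lookaheads = "".join(rf"(?=(.*;)?{re.escape(it)}(;|$))" for it in items)
    return re.match("^" + lookaheads, value) is not None

print(has_item("bakery;cafe", "cafe"))                      # True
print(has_item("cafeteria", "cafe"))                        # False
print(has_all_items("bakery;cafe;shop", "cafe", "bakery"))  # True
print(has_all_items("bakery;cafe", "cafe", "shop"))         # False
```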