drolbr / Overpass-API

A database engine to query the OpenStreetMap data.
http://overpass-api.de
GNU Affero General Public License v3.0

Invalid regular expression: "^[А-ЯЁ ]+$" #705

Open Zaczero opened 10 months ago

Zaczero commented 10 months ago
[out:json][timeout:60][bbox:{{bbox}}];
nwr[name~"^[А-ЯЁ ]+$"];
out body qt;
>;
out skel qt;
mmd-osm commented 10 months ago

While your query works on the overpass-api.de instance, some other instances such as kumi.systems fail with the error message above. Some versions of the C POSIX regular expression functions don't seem to handle ranges with Cyrillic characters properly.

As a quick workaround, you might try some other Overpass instance, or maybe avoid the range altogether by explicitly specifying all characters (not properly tested):

[out:json][timeout:60][bbox:{{bbox}}];
nwr[name~"^[АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ ]+$"];
out body qt;
>;
out skel qt;

The minimal example at https://cpp.godbolt.org/z/qz6Tn56j9 fails on some systems.
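For readers who cannot open the link, here is a rough sketch of how such a check could look. It is not the code behind the godbolt link and not necessarily how Overpass compiles its patterns. `regcomp_error` is a hypothetical helper name, and the `REG_EXTENDED | REG_NOSUB` flags are an assumption.

```cpp
#include <regex.h>   // POSIX regcomp/regerror/regfree (not the C++ <regex> header)
#include <string>

// Hypothetical helper: returns an empty string if `pattern` compiles as a
// POSIX extended regular expression under the current locale, otherwise the
// error text reported by regerror().
std::string regcomp_error(const std::string& pattern)
{
    regex_t re;
    // Flags are an assumption for this sketch.
    const int rc = regcomp(&re, pattern.c_str(), REG_EXTENDED | REG_NOSUB);
    if (rc == 0) {
        regfree(&re);  // release the compiled pattern on success
        return "";
    }
    char buf[256] = {0};
    regerror(rc, &re, buf, sizeof buf);  // translate the error code to text
    return buf[0] != '\0' ? std::string(buf) : std::string("unknown regcomp error");
}
```

Calling `regcomp_error("^[А-ЯЁ ]+$")` after `std::setlocale(LC_ALL, "")` should return an empty string on a system that handles the range and an error text on an affected one. The result depends on the C library and on which locales are installed.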

Zaczero commented 10 months ago

Interesting! This problem affects my Overpass instance, which I set up using the instructions from https://overpass-api.de/full_installation.html on a Debian Docker image. I'm wondering if I've overlooked something.

FROM debian:bookworm-slim

# Install dependencies
RUN apt-get update && apt-get install -y \
    wget \
    g++ \
    make \
    expat \
    libexpat1-dev \
    zlib1g-dev \
    liblz4-dev \
    lighttpd \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Download, extract and compile Overpass
RUN wget https://dev.overpass-api.de/releases/osm-3s_latest.tar.gz -O osm-3s_latest.tar.gz && \
    mkdir ./src && \
    tar -xzf osm-3s_latest.tar.gz -C ./src --strip-components=1 && \
    rm osm-3s_latest.tar.gz && \
    cd src && \
    ./configure --prefix="/app" --enable-lz4 && \
    make dist install clean && \
    cp -r rules .. && \
    cd .. && \
    rm -r ./src
...
mmd-osm commented 10 months ago

By the way, I'm getting the same issue on Ubuntu 22.04, which is also based on Debian bookworm. For some reason, the previous Debian version bullseye seems to work ok.

You could try replacing the first line in your Dockerfile with FROM debian:bullseye-slim to see if it helps. We still need to figure out what exactly is causing this issue on the newer Debian version.

Zaczero commented 10 months ago

I think I found the cause of that. To check the currently applied locale:

std::cout << "Current Locale: " << setlocale(LC_ALL, NULL) << std::endl;

But maybe there is a better way to set the UTF-8 locale in the first place.
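A slightly fuller sketch of that check, in case it helps others reproduce it. The helper names `current_locale` and `try_set_locale` are made up for illustration and are not part of Overpass. The point is that `setlocale` returns a null pointer when the requested locale is not available, and the process then keeps whatever locale it had before (the "C" locale at program start).

```cpp
#include <clocale>
#include <string>

// Hypothetical helper: name of the locale currently in effect.
std::string current_locale()
{
    const char* name = std::setlocale(LC_ALL, nullptr);  // query only, no change
    return name ? name : "";
}

// Hypothetical helper: tries to switch to `name` and returns false when the
// C library rejects it, typically because that locale is not installed.
// Passing "" selects the locale from the environment (LANG, LC_* variables).
bool try_set_locale(const char* name)
{
    return std::setlocale(LC_ALL, name) != nullptr;
}
```

For example, if `try_set_locale("")` returns false inside a slim container, the requested locale is missing and `current_locale()` keeps reporting the previous one.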

Zaczero commented 10 months ago

I have read that Python officially supports systems that have at least one of the following installed:

Maybe the same could be done in the overpass-api case.

...btw, I do confirm that switching to FROM debian:bullseye-slim fixed the issue.
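A minimal sketch of what such a fallback could look like, assuming the application is free to pick any installed UTF-8 locale. `set_first_available_locale` is a hypothetical name, and the candidate names in the usage note are examples I chose, not a list taken from Overpass or from this thread.

```cpp
#include <clocale>
#include <string>
#include <vector>

// Hypothetical helper: tries each candidate in order and returns the first
// locale name the C library accepts, or an empty string if none is available.
std::string set_first_available_locale(const std::vector<std::string>& candidates)
{
    for (const std::string& name : candidates) {
        // setlocale returns a null pointer and leaves the locale unchanged
        // when `name` is not installed.
        if (std::setlocale(LC_ALL, name.c_str()) != nullptr)
            return name;
    }
    return "";
}
```

It could be called at startup as `set_first_available_locale({"C.UTF-8", "C.utf8", "en_US.UTF-8"})`, logging a warning when the result is empty. Which names actually exist depends on the distribution and on the installed locale packages.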

drolbr commented 7 months ago

It looks like buggy regex engines in the base system are a real problem. The final solution, even if it is a workaround, should be to open an avenue to use the regex engine of choice. I don't know yet whether the final solution will do so at install time or at runtime.

Zaczero commented 7 months ago

If the app uses the C locale (since the requested locale is not installed), I don't see it as much of a regex engine issue. The app should simply support a wider range of UTF-8 locales, as other apps do.