DinoTools / python-overpy

Python Wrapper to access the Overpass API
https://python-overpy.readthedocs.io/
MIT License
243 stars 58 forks source link

Timeout returns empty result instead of error #62

Closed erabrahams closed 2 years ago

erabrahams commented 7 years ago
Issue type
OverPy version
0.4
OS

Python version

Summary

When setting a short timeout (1-30s), such that the timeout is reached, an empty result is returned instead of a timeout error. Looking into the query() function code determines that there is a \<remark> describing the timeout in the xml result, and that the error code '200' is raised, but the parse_xml() function is being used to resolve it to an empty result. A similar issue is discussed in the Overpass API #94.

Steps to reproduce

Run a short timeout query like

[out:xml][timeout:10];
area["name"="France"]->.searchArea;
(
node["amenity"="embassy"](area.searchArea);
way["amenity"="embassy"](area.searchArea);
relation["amenity"="embassy"](area.searchArea);
);
out center;

via result = overpy.Overpass().query(query))

Expected results

raise overpy.exception.OverpassGatewayTimeout or some relevant error

Actual results

returns a result for which

print(result.get_nodes())
print(result.get_ways())
print(result.get_relations())

returns an empty list [] for each.

XML result looked like:

b'<?xml version="1.0" encoding="UTF-8"?>\n<osm version="0.6" generator="Overpass API">\n<note>The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.</note>\n<meta osm_base="2017-03-17T15:40:02Z" areas="2017-03-17T06:44:02Z"/>\n\n<remark> runtime error: Query timed out in "query" at line 1 after 2 seconds. </remark>\n\n</osm>\n'

f.code: 200
phibos commented 7 years ago

Thanks for reporting the issue. Looks like there is a technical reason for reporting errors this way(see drolbr/Overpass-API#94). So we should try to detect these errors and raise an exception.

Sadly the documentation is a bit misleading.

See http://overpass-api.de/command_line.html

200 OK is sent when the query has been successfully answered. The payload of the response is the result data. The possible formats are described on the output formats page.

See http://overpass-api.de/output_formats.html

By contrast, if you find an error tag, something seriously has gone wrong,

But I was unable to find any information about the remark tag except for forum posts and github issues

phibos commented 7 years ago

I have started to implement a handler to find remark tags and elements and raise an exception. But I'm not sure if we should try to parse the raw message and try to detect an timeout error(OverpassGatewayTimeout). I think we should detect 'runtime errors' and unknown errors.

Feel free to have a look at the pull request.

What do you think?

mmd-osm commented 7 years ago

fyi: here's a link to the respective logic in overpass turbo: https://github.com/tyrasd/overpass-turbo/blob/81a6eaa4faa2a05078d8cede82727f8235207963/js/overpass.js#L117-L131

I'm not sure remark is documented anywhere.

As for XML responses, see https://github.com/drolbr/Overpass-API/blob/350cf560956c669171b949b162f239b55936e1a8/src/overpass_api/output_formats/output_xml.cc#L36-L42 and https://github.com/drolbr/Overpass-API/blob/350cf560956c669171b949b162f239b55936e1a8/src/overpass_api/frontend/web_output.cc#L41-L117

-> different runtime errors / warnings are indicated via remark, so it's probably very difficult to map the error text to a certain error condition. I would in fact recommend to just return the error text 'as is' and indicate some general error condition, regardless of what text is returned in remark.

erabrahams commented 7 years ago

Thanks for your quick response on this. So as it stands in the pull request, a runtime error remark will raise a generic error. What other situations might this arise in?

It seems like it would probably not be valid to assume that it arose from a timeout, which is the result I am trying to respond to in my code.

mmd-osm commented 7 years ago

<remark> could show up in quite different situations. Here's a few I could identiy

erabrahams commented 7 years ago

I think the resource one at least is covered by the 504 error code, but could we maybe pass out a message with the return of OverpassRuntimeError that could be used to suss out the specifics?

erabrahams commented 7 years ago

Mmm okay. I was thinking of the hard timeout, i.e. one that fails before the timeout in the query is even reached...thought that was related to server resources.

mmd-osm commented 7 years ago

The behavior may also differ, if the http header was already been written as http 200. In that case the only way to report errors is via <remark>. If no http header hasn't been written yet, errors may also be indicated via proper http status codes.

erabrahams commented 7 years ago

Fair point. I think as long as the <remark>-type error passes out the contents of the <remark>, it should be enough for the user to handle on their end. Unless overpass changes the error messages down the line...

erabrahams commented 7 years ago

Do you have a sense for when this change will go live? I temporarily solved the issue by overloading your query function with direct checks for the timeout result, but it would be nice to not have to resort to that.

phibos commented 7 years ago

@erabrahams initial support has been added in the remark branch 20 days ago. Free free to have a look at the PR #63

phibos commented 7 years ago

@erabrahams The feature has been merged into the master branch. Can you please give it a try.

erabrahams commented 7 years ago

Hmm... I tried both updating overpy, then uninstalling and reinstalling (with pip), and it still says 'overpy.exception' has no attribute 'OverpassRuntimeRemark'.

Update - this only seems to happen when the query is run as json. Queries run as xml return an empty result and raise no error.

Looking at overpy.exception: OverpassRuntimeError and OverpassRuntimeRemark have no init() function, they just call pass.

erabrahams commented 7 years ago

Apologies for the flood of messages. The problems I was experiencing before were a result of my IDE caching the older version of overpy. I can now catch an OverpassRuntimeError.

That said, the way you are returning the remark, I don't think OverpassRuntimeRemark will ever be raised, because the timeout text within the remark does not contain the word "remark." When catching a remark error (as I would expect to output) with a timeout of 1 like so:

try:
        print("Querying Overpass...")
        return OP.query(query)

...

except overpy.exception.OverpassRuntimeRemark as remark:
        if "timed out" in str(remark):
            print(" Timeout occurred. Splitting query and trying again. ")
            raise overpy.exception.OverpassGatewayTimeout
        raise

This error is returned: OverpassRuntimeError: runtime error: Query timed out in "id-query" at line 2 after 2 seconds.

phibos commented 7 years ago

Please have a look at the exception section in documentation.

exception overpy.exception.OverpassRuntimeError(msg=None)

Raised if the server returns a remark-tag(xml) or remark element(json) with a message starting with ‘runtime error:’.

exception overpy.exception.OverpassRuntimeRemark

Raised if the server returns a remark-tag(xml) or remark element(json) with a message starting with ‘runtime remark:’.

The message returned by the server is:

runtime error: Query timed out in "id-query" at line 2 after 2 seconds.

Means it is an runtime error not an runtime remark. So you have to use the OverpassRuntimeError exception or you can use overpy.exception.OverpassError to handle all exceptions related to remark tags or elements.

OverpassError is the base exception of OverpassRuntimeError and OverpassRuntimeRemark

Just change your code to

try:
        print("Querying Overpass...")
        return OP.query(query)
...

except overpy.exception.OverpassError as remark:
        if "timed out" in remark.msg:
            print(" Timeout occurred. Splitting query and trying again. ")
            raise overpy.exception.OverpassGatewayTimeout
        raise
erabrahams commented 7 years ago

Yeah, that will work just fine. I think what I was getting at is that "runtime error: whatever" is the remark. From overpass-turbo, the raw json output is "remark": "runtime error: Query timed out in \"id-query\" at line 2 after 2 seconds."

Maybe I'm missing something, but I'm not sure there is a situation the error message would start with "runtime remark". If you have an example, that would be great.