CXuesong / WikiClientLibrary

/*🌻*/ Wiki Client Library is an asynchronous MediaWiki API client library targeting modern .NET platforms
https://github.com/CXuesong/WikiClientLibrary/wiki
Apache License 2.0

Add support for bounding box in GeoSearchGenerator #57

Closed: zstadler closed this issue 4 years ago

zstadler commented 4 years ago

The Wikimedia geosearch API supports a bounding box as an alternative to coordinates + radius as a geographic selector:

gsbbox: Bounding box to search in: pipe (|) separated coordinates of top left and bottom right corners.

and provides an example:

api.php?action=query&list=geosearch&gsbbox=37.8|-122.3|37.7|-122.4

Since the coordinates + radius approach is limited to a 10000 meter radius, combining multiple requests to cover a larger area is a challenge. A bounding box search, on the other hand, is easier to aggregate and to integrate with other geographic systems.

Please consider adding support for search based on a bounding box.
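
For illustration, here is a minimal sketch of how the gsbbox value quoted above could be assembled from four coordinates. The class and method names (GeoSearchBbox, Format) are hypothetical and are not part of WikiClientLibrary; the only thing taken from the API documentation is the pipe-separated order: top-left corner first, then bottom-right corner.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of WikiClientLibrary.
public static class GeoSearchBbox
{
    // Builds the gsbbox value as top|left|bottom|right, i.e.
    // latitude and longitude of the top-left corner, then of the bottom-right corner.
    public static string Format(double top, double left, double bottom, double right)
    {
        if (top < bottom)
            throw new ArgumentException("top must not be smaller than bottom.");

        // The invariant culture keeps '.' as the decimal separator on every locale.
        return string.Join("|",
            top.ToString(CultureInfo.InvariantCulture),
            left.ToString(CultureInfo.InvariantCulture),
            bottom.ToString(CultureInfo.InvariantCulture),
            right.ToString(CultureInfo.InvariantCulture));
    }
}
```

The result still has to be URL-encoded before it goes into a query string, for example with Uri.EscapeDataString, which turns each pipe into %7C.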

zstadler commented 4 years ago

See also this Wikimedia API bug report related to the use of gsbbox for geosearch.

CXuesong commented 4 years ago

Thanks for your links, @zstadler! I will check on this and work on the implementation after the holiday, which is tomorrow :smile:

CXuesong commented 4 years ago

Published v0.7.0-int.6. You may now use GeoSearchGenerator.BoundingRectangle to specify a small rectangle by its left (longitude), top (latitude), width, and height, and search for the pages inside it.

I'm planning to refactor the GeoSearch, GeoCoordinate and GeoCoordinateRectangle APIs. I'm going to extract Dimension and Global from the GeoCoordinate structure, and GeoCoordinateRectangle may need some polishing. If you have any more suggestions / feature requests regarding these APIs, feel free to open another issue and let me know 😉

HarelM commented 4 years ago

Thanks for this! :-) What's a small rectangle? I'm getting the following error: OperationFailedException: toobig: Bounding box is too big. I think the exception should indicate which bbox size I should be using... Also, toobig is missing a space :-)

CXuesong commented 4 years ago

I've tried this roughly, and ranges of less than 0.2 degrees in longitude and latitude seem okay.

https://github.com/CXuesong/WikiClientLibrary/blob/cafda1f144d825cc6151eb1eb55e62b5322bcbf5/UnitTestProject1/Tests/GeneratorTests.cs#L413-L422

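My hypothesis is that on the MW API server, you eventually cannot bypass the radius limitation of GeoSearch. 10 km is roughly 0.09 degrees of latitude, so a box spanning the 20 km diameter is roughly 0.18 degrees across, which is in line with the limit of about 0.2 degrees observed above.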
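
To sanity-check conversions between kilometres and degrees such as the one in this comment, a spherical approximation is usually enough. The sketch below assumes about 111.32 km per degree of latitude, which is a rounded figure, and scales longitude by the cosine of the latitude. The names (GeoDegrees, KmToLatitudeDegrees, KmToLongitudeDegrees) are made up for this example.

```csharp
using System;

// Hypothetical helper for rough estimates only; it treats the earth as a sphere.
public static class GeoDegrees
{
    // Approximate length of one degree of latitude, in kilometres (assumed constant).
    private const double KmPerDegree = 111.32;

    // Converts a north-south distance in kilometres to degrees of latitude.
    public static double KmToLatitudeDegrees(double km) => km / KmPerDegree;

    // Converts an east-west distance in kilometres to degrees of longitude
    // at the given latitude; a degree of longitude shrinks towards the poles.
    public static double KmToLongitudeDegrees(double km, double latitudeDegrees)
    {
        double latitudeRadians = latitudeDegrees * Math.PI / 180.0;
        return km / (KmPerDegree * Math.Cos(latitudeRadians));
    }
}
```

Under this approximation, the number of longitude degrees covered by a given distance grows with latitude, so a fixed limit in degrees and a fixed limit in kilometres do not describe the same area everywhere.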

So if you are planning to scan a larger area of the earth, you may need to split your range into a grid and request the smaller tiles one by one from the client.
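
A minimal sketch of such a grid split follows. It is not part of WikiClientLibrary: BoundingBoxTiler and Split are hypothetical names, and the default tile span of 0.2 degrees is only the rough figure observed above, not a documented server limit.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper that cuts a large area into tiles no larger than maxSpan degrees.
public static class BoundingBoxTiler
{
    // Yields tiles as (Left, Bottom, Right, Top) in degrees, row by row.
    // Being an iterator, it validates its arguments when enumeration starts.
    public static IEnumerable<(double Left, double Bottom, double Right, double Top)> Split(
        double left, double bottom, double right, double top, double maxSpan = 0.2)
    {
        if (right <= left || top <= bottom)
            throw new ArgumentException("The area is empty or inverted.");
        if (maxSpan <= 0)
            throw new ArgumentOutOfRangeException(nameof(maxSpan));

        // Use as many equal rows/columns as needed to keep each tile within maxSpan.
        int columns = (int)Math.Ceiling((right - left) / maxSpan);
        int rows = (int)Math.Ceiling((top - bottom) / maxSpan);
        double width = (right - left) / columns;
        double height = (top - bottom) / rows;

        for (int row = 0; row < rows; row++)
        {
            for (int col = 0; col < columns; col++)
            {
                double tileLeft = left + col * width;
                double tileBottom = bottom + row * height;
                // Snap the last row/column to the outer edge to avoid floating-point drift.
                double tileRight = col == columns - 1 ? right : tileLeft + width;
                double tileTop = row == rows - 1 ? top : tileBottom + height;
                yield return (tileLeft, tileBottom, tileRight, tileTop);
            }
        }
    }
}
```

Each tile could then be passed to a separate geosearch request. Since a single request may still be capped at the result limit, dense areas may need a smaller maxSpan.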

And toobig is actually the error code from MW API response, like permissiondenied or badtoken.

HarelM commented 4 years ago

Thanks for the quick response! This is what I do right now with the 10 km radius search, only the circles are overlapping, and I thought I'd be able to do it in one bbox call instead of around 1000. Here's the relevant code I was hoping to simplify... :-/ https://github.com/IsraelHikingMap/Site/blob/5bf63fc2a0e2c1a22bf82d3f1175141b45c25356/IsraelHiking.API/Services/Poi/WikipediaPointsOfInterestAdapter.cs#L77

HarelM commented 4 years ago

When using the GeoSearchGenerator, it seems that I can't get past the pagination size of 500 results. The following returns 500 items, but I don't know how to continue to the next page:

    var geoSearchGenerator = new GeoSearchGenerator(new WikiSite(wikiClient, new SiteOptions("https://he.wikipedia.org/w/api.php")))
    {
        BoundingRectangle = GeoCoordinateRectangle.FromBoundingCoordinates(34.75, 32, 34.9, 32.15),
        PaginationSize = 1000 // this is ignored
    };
    var results = await geoSearchGenerator.EnumItemsAsync().ToListAsync(); // this returns only 500...

Let me know if you want me to open a new issue on this, or if I'm missing something.

HarelM commented 4 years ago

Same request from the browser: https://he.wikipedia.org/w/api.php?action=query&maxlag=5&list=geosearch&gsradius=10&gsprimary=primary&gslimit=500&gsbbox=32.15%7C34.75%7C32%7C34.9 It seems like the response doesn't have a continuation parameter? Not sure...

CXuesong commented 4 years ago

It seems so. GeoSearch does not support pagination for now. Example response of https://en.wikipedia.org/w/api.php?action=query&maxlag=5&list=geosearch&gsradius=10&gsprimary=primary&gslimit=2&gsbbox=32.15%7C34.75%7C32%7C34.9

{
    "batchcomplete": "",
    "query": {
        "geosearch": [
            {
                "pageid": 18328987,
                "ns": 0,
                "title": "Beit Zvi",
                "lat": 32.078408333333336,
                "lon": 34.821713888888894,
                "dist": 489.4,
                "primary": ""
            },
            {
                "pageid": 46324352,
                "ns": 0,
                "title": "HaAliya HaShniya Garden",
                "lat": 32.0697,
                "lon": 34.8148,
                "dist": 1127.4,
                "primary": ""
            }
        ]
    }
}
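
For API modules that do support continuation, the response carries a top-level "continue" object, which is absent from the response above. A minimal sketch of checking for it with System.Text.Json follows; ContinuationCheck and HasContinuation are hypothetical names for this example and not part of WikiClientLibrary.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper, independent of WikiClientLibrary.
public static class ContinuationCheck
{
    // Returns true when the JSON response has a top-level "continue" property,
    // which is how the MediaWiki API signals that more results are available.
    public static bool HasContinuation(string responseJson)
    {
        using (JsonDocument document = JsonDocument.Parse(responseJson))
        {
            JsonElement root = document.RootElement;
            return root.ValueKind == JsonValueKind.Object
                && root.TryGetProperty("continue", out _);
        }
    }
}
```

Applied to the geosearch response shown here, this check would return false, since the response only has "batchcomplete" and "query" at the top level.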

CXuesong commented 4 years ago

I think the continuation problem was originally tracked in phab:T95241, which was closed as a duplicate of phab:T78703.

Unfortunately, I don't think T78703 is going to be resolved soon...

CXuesong commented 4 years ago

Let's use #64 to track this.