JustinCanton / Geo.NET

A lightweight method for communicating with the multiple online geocoding APIs.
MIT License
13 stars 8 forks source link

MapBox Query with pound sign # has Unexpected Results #43

Closed dernewt closed 2 years ago

dernewt commented 2 years ago

When calling .BuildGeocodingRequest or .BuildReverseGeocodingRequest with a query that contains a pound sign (#) the resulting URI is truncated at the location of the # and characters after are not included. This is related to how UriBuilder and UriBuilder .Query interact, specifically the constructed URI is fine (line 125) but when UriBuilder.Query is assigned to at 158 it truncates everything after #. There may be other problematic characters related to the construction of URIs.

According to the MapBox api any such punctuation can be passed but it's treated as a word separator: https://docs.mapbox.com/help/troubleshooting/address-geocoding-format-guide/

One possible fix is to replace all such punctuation with spaces.

A more ambitious fix would be to implement some sort of GeocodingParameters.Query builder pattern which could handle this and also enforce other MapBox query limitations such as a max "word" length of 20 and a max character length of 256. This pattern could also implement the "batch" feature where multiple queries are separated via semicolons (though batch queries also require mapbox.places-permanent)

JustinCanton commented 2 years ago

@dernewt, according to the documentation

A query that uses this secondary address information, as well as any associated special characters like commas (,) and pound symbols (#), will return the same coordinates as a query with this information stripped out.

So this information really has no meaning when it is passed. Could you give me a bit of background on your use case for this?

dernewt commented 2 years ago

The background/use case is nothing special, I'm using your library to abstract myself from the specifics of how MapBox works as much as possible. I'm passing through addresses that are variable in their format but are generally "human readable" addresses thus a # didn't really stand out.

Of course now that I read the MapBox docs I'm stripping all non-alpha-numeric as an extra step. I do agree that there must be a line where it's out of scope of the library and I defer that to you. I really think handling # (somehow) is required as it breaks your library. After that if you're filtering some characters it's easy to argue you should filter all of them considering that MapBox doesn't consider them.

Even if you did a full on query builder (which is out of scope of this bug) you'd still need to handle a raw query.

JustinCanton commented 2 years ago

I have done some playing around, and I don't really trust MapBox's statement that it doesn't consider them. Using the 4 examples they have on their website:

I get different results. Some of the queries give the same results, but others do not. Which calls into question whether they don't actually consider them and what that means.

The problem with stripping non alpha-numeric characters is that isn't really correct. You can have valid addresses with alpha-numeric characters, such as addresses in Hawaii.

Obviously the pound symbol has a special meaning within uri. Its the fragment identifier. Which is why everything after the pound symbol is moved to the end of the uri after formatting.

The way I see it, there are a couple of options I have:

  1. Strip the pound symbol from requests (as well as any other characters necessary)
  2. Encode the characters to respect uri requirements
  3. Do nothing and leave the requirements of handling and sanitizing data to the consumer

Of these, I think I am leaning towards option 2. I will think about it for a little while and get back to this.

JustinCanton commented 2 years ago

Digging into this some more, I have good news and bad news. The good news is that only the query in the url isn't being properly encoded. Meaning as long as I encode that, everything will work fine. The even better news is that none of the other libs aside from MapBox have this issue.

The bad news is about the query string building. Right now, it works correctly. The query string is properly encoded when used, and doesn't need to change. The problem is, the way I am currently building the query string isn't recommended. It has the possibility of an exploit. See the remarks under here.

So the fix you want is quick, but I need to invest a bit of time fixing up the query string building. I am hoping to have this completed soon.

JustinCanton commented 2 years ago

@dernewt. An alpha package has been created with the fix for this issue. Geo.MapBox.1.2.0-alpha.4.nupkg. Feel free to grab this and test it out to see if it fixes your issue.

I will be creating a full release version (1.2.0) at some point this weekend.