DJTommek / better-location

Telegram bot for parsing and generating various location formats.
https://t.me/BetterLocationBot
MIT License

Bad detection of multiple coordinates in text #62

Open DJTommek opened 3 years ago

DJTommek commented 3 years ago

If coordinates are too close together in the text (in terms of distance within the string), some of them might not be detected. Example:

50.087451,14.420671 50.087451,-14.420671 -50.087451,14.420671 -50.087451,-14.420671

Expected results:

50.087451,14.420671
50.087451,-14.420671
-50.087451,14.420671
-50.087451,-14.420671

Detected results:

50.087451,14.420671
50.087451,-14.420671 (2x)

A workaround that makes the detected results match the expected ones is to add an arbitrary character between the coordinates, e.g.:

50.087451,14.420671 a 50.087451,-14.420671 ! -50.087451,14.420671 . -50.087451,-14.420671

This bug can be reproduced in both inline and text messages.

DJTommek commented 2 years ago

This could be solved by searching for one coordinate at a time instead of matching them all at once:

  1. Load the string to run the regex search on.
  2. Search for a single location (preg_match() instead of preg_match_all()).
  3. Extract, parse and clean up the match to get the real location string (e.g. do not count extra characters before and after the location).
  4. In the original string, replace this clean location part with some character (e.g. !).
  5. Repeat steps 2, 3 and 4 until no more locations are found.
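
The steps above can be sketched roughly as follows. This is only an illustration in Python, where `re.search` stands in for `preg_match()` and `finditer` for `preg_match_all()`. The pattern, the function names and the `placeholder` parameter are all assumptions made for this example. The pattern is not the bot's real regex: it is deliberately written to consume one delimiter character on each side of a coordinate pair, which is one way neighbouring matches can get lost. The actual cause in the bot may be different.

```python
import re

# Hypothetical pattern (NOT the bot's real regex): it consumes one delimiter
# character before and after the "lat,lon" pair.
PATTERN = re.compile(r"(?:^|[^\d.-])(-?\d{1,2}\.\d+),(-?\d{1,3}\.\d+)(?:$|[^\d.])")


def find_all_at_once(text):
    """Match everything in one pass (analogous to preg_match_all())."""
    return [(float(m.group(1)), float(m.group(2))) for m in PATTERN.finditer(text)]


def find_one_at_a_time(text, placeholder="!"):
    """Repeat: find one location, record it, blank it out, search again."""
    found = []
    while True:
        match = PATTERN.search(text)  # step 2: a single match only
        if match is None:
            break  # step 5: stop once nothing is left
        # Step 3: keep only the clean location, without the delimiters.
        found.append((float(match.group(1)), float(match.group(2))))
        start, end = match.start(1), match.end(2)
        # Step 4: replace the clean location with a placeholder character,
        # leaving the surrounding delimiters available for the next search.
        text = text[:start] + placeholder + text[end:]
    return found


text = ("50.087451,14.420671 50.087451,-14.420671 "
        "-50.087451,14.420671 -50.087451,-14.420671")
print(find_all_at_once(text))
print(find_one_at_a_time(text))
```

With this toy pattern, the single-pass search skips some of the pairs because each match swallows the space its neighbour needs, while the loop returns all four. The placeholder has to be a character the pattern can never read as part of a coordinate. Otherwise the loop could match the same spot again or never terminate.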