ConnectedHumber / Air-Quality-Web

The web interface and JSON api for the ConnectedHumber Air Quality Monitoring Project.
https://sensors.connectedhumber.org/
Mozilla Public License 2.0
9 stars 4 forks source link

Retrieve recent device records & devices closest to a given point #33

Closed bsimmo closed 5 years ago

bsimmo commented 5 years ago

I'm looking at the documentation under docs for api.

I cannot see a way to ask for a specific reading from a specific device (id) for time now (or past 10 mins, or past hour) or for all sensor data for 'now' for that device; I can either get all PM25 data for all sensors 'now' or the history of one sensor type for a device

e.g. I would like (I think) _https://aq.connectedhumber.org/api.php?action=fetch-data&device-id=38&datetime=now&reading_type=PM25_ to return just the current PM25 reading for device 38

_https://aq.connectedhumber.org/api.php?action=fetch-data&device-id=38&datetime=now&reading_type=PM25&average-seconds=600_ same but returning the average over that 600 seconds

reading_type=ALL (return all available sensor reading) or reading_type=PM25,PM10 reading_type=Temperature,Humidity,Pressure etc to return the specified readings.

At the moment it seems I needs multiple requests to cover all the sensors types, that's a lot of data to shift through and must be more sever load for each request ?

sbrl commented 5 years ago

Hey, @bsimmo!

As I understand it, you want to get multiple reading types at a time for a single device?

You can already get data for a time period a single reading type for a single device at a time by using the device-data. Example:

https://aq.connectedhumber.org/api.php?action=device-data&device-id=18&reading-type=PM25&start=2019-01-19T18:14:59.992Z&end=2019-01-20T18:14:59.992Z

This gets all the PM25 data for the device with the id 18 between 2019-01-19T18:14:59.992Z and 2019-01-20T18:14:59.992Z.

fetch-data, on the other hand, works differently. It fetches an average reading for a single point in time for all devices. It is actually the action that powers the heatmap.

It isn't currently possible to get all reading types for a specific time for a single device, but this is certainly an action that we can consider adding. It's a bit problematic though, because readings come in at random times - and simply asking for readings now doesn't really work. Which reading do you want? Perhaps an average-seconds like device-data has would help.

It might help here if you explained your use-case. What are you trying to achieve?

bsimmo commented 5 years ago

I tried getting the reading using the between times, but I don't want the long list, it also seems expensive in data transfer. I just want the current readings.

For me, 'now' should just give the last reported reading, probably withing the last 18 mins (3 readings period) otherwise return an error. (No recent data?)

My program (proof of concept) looks up the nearest device from my gps location. I want to then ask that device what its current readings are, I don't care about the history. I might not care for all the sensors, and just want the temp or the PM count. It's a display system, like a weather app.

We have the option already for an average reading over a given time, so though that might be useful to keep that as an option as well. I could then ask for current reading and the average reading, compare the two and give a trend.

(Talk is in Mattermost, but I'll move to the forum. Both are blocked at work though)

On Mon, 3 Jun 2019, 9:27 pm Starbeamrainbowlabs, notifications@github.com wrote:

Hey, @bsimmo https://github.com/bsimmo!

As I understand it, you want to get multiple reading types at a time for a single device?

You can already get data for a time period a single reading type for a single device at a time by using the device-data. Example:

https://aq.connectedhumber.org/api.php?action=device-data&device-id=18&reading-type=PM25&start=2019-01-19T18:14:59.992Z&end=2019-01-20T18:14:59.992Z

This gets all the PM25 data for the device with the id 18 between 2019-01-19T18:14:59.992Z and 2019-01-20T18:14:59.992Z.

fetch-data, on the other hand, works differently. It fetches an average reading for a single point in time for all devices. It is actually the action that powers the heatmap.

It isn't currently possible to get all reading types for a specific time for a single device, but this is certainly an action that we can consider adding. It's a bit problematic though, because readings come in at random times - and simply asking for readings now doesn't really work. Which reading do you want? Perhaps an average-seconds like device-data has would help.

It might help here if you explained your use-case. What are you trying to achieve?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ConnectedHumber/Air-Quality-Web/issues/33?email_source=notifications&email_token=ACYAXNZG3UVPDO5HY3JXFQTPYV5CBA5CNFSM4HSDRGY2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODW2S7KI#issuecomment-498413481, or mute the thread https://github.com/notifications/unsubscribe-auth/ACYAXN7ZHFMA5WN4VPGIIBDPYV5CBANCNFSM4HSDRGYQ .

sbrl commented 5 years ago

Thanks for the reply, @bsimmo.

(Talk is in Mattermost, but I'll move to the forum. Both are blocked at work though)

I'd appreciate it if you could keep all the discussion relating to this feature request here on GitHub, so that it's all in one place :-)

That's..... odd that it's blocked. I'm really curious as to what filtering system you have in place, because a blacklist-based system couldn't know about the forum that we've literally just setup. It sounds more like a whitelist than a blacklist system to me.

It's a display system, like a weather app.

Anyway, I digress. I see - that use-case isn't very well covered by the API at the moment.

I tried getting the reading using the between times, but I don't want the long list, it also seems expensive in data transfer.

It's actually not as expensive as you think. That query in particular is really rather efficient. Data transfer is not the big bottleneck in the system at the moment - that title goes to both the overhead of making a request in the first place (thanks, Apache - it's really inefficient at the moment - I'm hoping to enter discussions about using PHP-FPM as the PHP backend to improve performance - even if we don't use Nginx too), and the SQL queries - so what you might first think might cause exessive load can actually turn out to be a red herring, so to speak.

fetch-data is in fact the one that causes the highest server load, as the SQL query behind is both complex and hard to optimise (the query itself can take 150ms+ at times O.o).


The problem with having a now action is that it's awkward to figure out which reading you want. The device-data action's purpose is to retrieve readings between 2 different points in time, so reflecting on your use-case I'm not sure that augmenting the device-data action is the right solution.

Alternatively, let me propose a different action to solve the problem. What do you think of this?

device-recent-data

Fetches the last X readings reported by a device.

Parameter Type Meaning
device-id int The id of the device to get data for.
reading-type string The type of reading to obtain data for.
count int The number of readings to return.

https://example.com/api.php?action=device-recent-data&device-id=24&count=5&reading-type=PM25

Returning a variable number of readings at once is actually a bit of a pain, due to the structure of the database - though it's possible I may be able to use json_extract to pull out the readings I want from readings.raw_json - though I'll have to check with @robinharris as to whether the JSON stored there has a standardised format that I can take advantage of. If so, I might be able to enable you to do something like this:

https://example.com/api.php?action=device-recent-data&device-id=24&count=2&reading-type=humidity,temperature

This would return something like this:

[
    {
        "id": 154728,
        "value": [ 55.6, 15.8 ],
        "datetime": "2019-06-01 10:32:46"
    },
    {
        "id": 154714,
        "value": [ 54.2, 16.3 ],
        "datetime": "2019-06-01 10:25:28"
    }
]
bsimmo commented 5 years ago

That all sounds good to me from a first read. Robin's now tagged in and he had a similar use case that would need it. So it would be nice to here from him too.

My aim is to keep the enduser code side simple as the target may be children and schools.

( I posted here after the problem came up, by move to the forum I mean the chat on the display info and code, not this specific problem.

It's a school net nanny, it has a mind of its own. It will use black and white lists, age rankings, categories, word catching and probably page scanning Or more than likely it throws a dice and takes a guess ;-) and works out what will annoy you the most. )

Replying by email to Github doesn't seem too good. Sorry for the blocks of text.

On Tue, 4 Jun 2019, 3:39 pm Starbeamrainbowlabs, notifications@github.com wrote:

Thanks for the reply, @bsimmo https://github.com/bsimmo.

(Talk is in Mattermost, but I'll move to the forum. Both are blocked at work though)

I'd appreciate it if you could keep all the discussion relating to this feature request here on GitHub, so that it's all in one place :-)

That's..... odd that it's blocked. I'm really curious as to what filtering system you have in place, because a blacklist-based system couldn't know about the forum that we've literally just setup. It sounds more like a whitelist than a blacklist system to me.

It's a display system, like a weather app.

Anyway, I digress. I see - that use-case isn't very well covered by the API at the moment.

I tried getting the reading using the between times, but I don't want the long list, it also seems expensive in data transfer.

It's actually not as expensive as you think. That query in particular is really rather efficient. Data transfer is not the big bottleneck in the system at the moment - that title goes to both the overhead of making a request in the first place (thanks, Apache - it's really inefficient at the moment - I'm hoping to enter discussions about using PHP-FPM as the PHP backend to improve performance - even if we don't use Nginx too), and the SQL queries - so what you might first think might cause exessive load can actually turn out to be a red herring, so to speak.

fetch-data is in fact the one that causes the highest server load, as the SQL query behind is both complex and hard to optimise (the query itself can take 150ms+ at times O.o).

The problem with having a now action is that it's awkward to figure out which reading you want. The device-data action's purpose is to retrieve readings between 2 different points in time, so reflecting on your use-case I'm not sure that augmenting the device-data action is the right solution.

Alternatively, let me propose a different action to solve the problem. What do you think of this? device-recent-data

Fetches the last X readings reported by a device.

Parameter Type Meaning device-id int The id of the device to get data for. reading-type string The type of reading to obtain data for. count int The number of readings to return.

  • Return the 5 most recent PM2.5 measurements for the device with the id 24:

https://example.com/api.php?action=device-recent-data&device-id=24&count=5&reading-type=PM25

Returning a variable number of readings at once is actually a bit of a pain, due to the structure of the database - though it's possible I may be able to use json_extract https://mariadb.com/kb/en/library/json_extract/ to pull out the readings I want from readings.raw_json - though I'll have to check with @robinharris https://github.com/robinharris as to whether the JSON stored there has a standardised format that I can take advantage of. If so, I might be able to enable you to do something like this:

  • Return the last 2 humidity and temperature readings in a single request:

https://example.com/api.php?action=device-recent-data&device-id=24&count=2&reading-type=humidity,temperature

This would return something like this:

[ { "id": 154728, "value": [ 55.6, 15.8 ], "datetime": "2019-06-01 10:32:46" }, { "id": 154714, "value": [ 54.2, 16.3 ], "datetime": "2019-06-01 10:25:28" } ]

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ConnectedHumber/Air-Quality-Web/issues/33?email_source=notifications&email_token=ACYAXNZR7ASZELCG4HIEN4TPYZ5ANA5CNFSM4HSDRGY2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODW4ZBVQ#issuecomment-498700502, or mute the thread https://github.com/notifications/unsubscribe-auth/ACYAXN2P3WGC3DHCFWZFQQ3PYZ5ANANCNFSM4HSDRGYQ .

sbrl commented 5 years ago

Cool, I'll see what I can do about implementing that then.

( I posted here after the problem came up, by move to the forum I mean the chat on the display info and code, not this specific problem.

It's a school net nanny, it has a mind of its own. It will use black and white lists, age rankings, categories, word catching and probably page scanning Or more than likely it throws a dice and takes a guess ;-) and works out what will annoy you the most. )

Not sure I quite follow.

Replying by email to Github doesn't seem too good. Sorry for the blocks of text.

No problem! I can recommend the web interface if you're having trouble replying directly via email.

bsimmo commented 5 years ago

I do use the web interface or app normally. (like now)

But if you reply by email you loose most paragraph formatting etc and just get one long text block. No idea why it strips returns out.

BNNorman commented 5 years ago

As you know, I've been looking at this from a python point of view. Below is the code I used and it works - it returns a tuple (timestamp,value) for the chosen reading_value_type. 5 calls can be made and the results amalgamated and returned as a JSON or , if just interested in PM2.5 then only one call is needed

I can attach the full python code if you wish but you need access to a database to run it - like I have at home, running on a Pi.

Anyway, the key is in the SQL which I constructed from two SQL statements (for clarity) - one to select the last timestamp within the last 18 minutes which is combined with the one to return the value for that timestamp. The combined SQL is then executed as a single SQL statement.

def latest_value(device_id,reading_type_id):
    #
    #
    # choose a time period 18 minutes earlier than now
    NOW=datetime.datetime.now()-datetime.timedelta(minutes=18)
    TS=NOW.timestamp()

    cursor=db.cursor()

    # latest timestamp selector SQL
    TIME_SQL="select max(storedon) from readings join reading_values on readings.id=reading_values.reading_id where " \
        "device_id={} and storedon>\"{}\" and reading_value_types_id={}".format(device_id,TS,reading_type_id)

    # combine the SQL for time selection with data selection
    SQL="Select storedon,value from reading_values join readings on readings.id=reading_values.reading_id where " \
        "device_id={} and storedon=({}) and reading_value_types_id={}".format(device_id,TIME_SQL,reading_type_id)

    cursor.execute(SQL)
    row = cursor.fetchone()

    if row is None:
        return (None, None)

    # return actual
    return (row[0].strftime("%Y/%m/%d %H:%M:%S"),row[1])

Hope that helps

sbrl commented 5 years ago

Thanks, @BNNorman!

Also, it sounds like an action to fetch the closest device to a location would be helpful, @bsimmo - after talking to @BNNorman at the hardware meetup last night. I'll see what I can do about that too.

BNNorman commented 5 years ago

@sbrl in case it helps, here's my complete python code. A quadtree is created centred on lat/lon 0,0. The database devices table is read in and each device added to the quadtree. The quadtree is masked by a rectangular area + and - 200m E/W/N and S. The coords I used showed no devices within 100m.

After finding the device calls are made to return the sensor readings (one at a time here because they could be at slightly different times.)

Now, it might be worth adding an index to the storedon field to speed the search though it is fast enough as it is - returning values in one or two seconds.

#
# Find nearest device to given coords and 
# return latest readings from a given device
#
from smartquadtree import Quadtree
import math
import mysql.connector
import datetime

# database
dbHost="192.168.1.81" # change to your host name
dbUser="your_DbUserName"
dbPassword="your_dbUserPassword"
dbName="aq_db"

# GPS distances (in decimal degree fractions)
# lat and lon assumed to be the same
GPS_1MM=0.00000001
GPS_1M=1000*GPS_1MM
GPS_100M=100*GPS_1M
GPS_1KM=10*GPS_100M
GPS_10KM=10*GPS_1KM

# device info for adding to quad tree for searching

class device():
    def __init__(self,id,name,lon,lat):
        self.id=id
        self.lon=lon
        self.lat=lat
        self.name=name

    def get_pos(self):
        return (self.lon,self.lat)

    def get_id(self):
        return self.id

    def get_x(self):
        return self.lon

    def get_y(self):
        return self.lat

    def get_name(self):
        return self.name

# create a quadtree to store the devices into
# width=lon ranges -180 to +180
# height=lat ranges -90 to +90
devices=Quadtree(0,0,360,180);

# establish database connection
db = mysql.connector.connect(
    host=dbHost,
    user=dbUser,
    passwd=dbPassword,
    database=dbName
)

if db is None:
    print("Unable to connect to database")
    exit()

# load all known devices from the database
def load_devices():
    cursor=db.cursor()

    cursor.execute("Select device_id,device_name,device_longitude,device_latitude from devices")

    for row in cursor.fetchall():
        #print("got device",row)
        d=device(row[0],row[1],row[2],row[3])
        devices.insert(d)

# last_value()
# return the last value (record with highest id) for a device
#
# type IDs are
# 1 Humidity
# 2 PM10
# 3 PM25
# 4 Pressure
# 5 Temperature

def last_value(device_id,reading_type_id):
    #
    cursor=db.cursor()

    SQL="select value from reading_values where " \
        "reading_values.id=(select max(reading_values.id) from reading_values join readings on reading_values.reading_id=readings.id and reading_value_types_id={} and " \
        "device_id={})".format(reading_type_id,device_id)

    cursor.execute(SQL)
    row=cursor.fetchone()
    if row is None: return None
    return row[0]

# latest_value()
# return the latest value (record with highest storedon) for a device
# and given reading_type_Id within 18 minutes of 'now'
# returns (storedon,value)
#
# type IDs are
# 1 Humidity
# 2 PM10
# 3 PM25
# 4 Pressure
# 5 Temperature

def latest_value(device_id,reading_type_id):
    #
    #
    # choose a time period 18 minutes earlier than now
    # if a device is live there should be data in this time period
    NOW=datetime.datetime.now()-datetime.timedelta(minutes=18)
    TS=NOW.timestamp()

    cursor=db.cursor()

    # sql to select the latest storedon time
    TIME_SQL="select max(storedon) from readings join reading_values on readings.id=reading_values.reading_id where " \
        "device_id={} and storedon>\"{}\" and reading_value_types_id={}".format(device_id,TS,reading_type_id)

    # combine the SQL for time selection with data selection (just makes for easier reading)
    SQL="Select storedon,value from reading_values join readings on readings.id=reading_values.reading_id where " \
        "device_id={} and storedon=({}) and reading_value_types_id={}".format(device_id,TIME_SQL,reading_type_id)

    cursor.execute(SQL)
    row = cursor.fetchone()

    if row is None:
        return (None, None)

    # return storedon and value
    return (row[0].strftime("%Y/%m/%d %H:%M:%S"),row[1])

# set_mask()
# limits the quadtree search area - could be any closed polygon
# filter the devices based on a rectangular area centred on one location
# within decimal angle dist

def set_mask(lon,lat,dist):
    mask=[(lon-dist,lat+dist),(lon+dist,lat+dist),(lon+dist,lat-dist),(lon-dist,lat-dist)]
    devices.set_mask(mask)

# get_dist()
# return the decimal degree distance between two points A and B
def get_dist(Ax,Ay,Bx,By):
    # calc length of hypotenuse
    dx=Ax-Bx
    dy=Ay-By
    return math.sqrt(dx*dx+dy*dy)

# find_nearest_device()
# scan the devices quadtree looking for nearest to this_lon,this_lat
# within decimal angle RANGE

def find_nearest_device(this_lat,this_lon,RANGE):

    set_mask(this_lon,this_lat,RANGE)

    # scan for nearest
    min_dist=None
    dev_found=None

    # find the closest
    for dev in devices.elements():
        lon,lat=dev.get_pos()
        dist=get_dist(this_lon,this_lat,lon,lat)
        if min_dist is None: min_dist=dist
        if dist<min_dist:
            min_dist=dist
            dev_found=dev

    return dev_found

# load the devices from the devices table
load_devices()

#########################################################################
#
# find nearest device. If RANGE=GPS_100M it returns None
# which shows the code works at range 2*GSP_100M it finds a device

RANGE=2*GPS_100M # area within 200m in any direction

nearest=find_nearest_device(53.796,-0.344,RANGE)

if nearest is None:
    print("No devices found in this list within +/- given range")

else:

    print("Nearest device is",nearest.get_id(),"name=",nearest.get_name(),"pos=",nearest.get_pos())

    id=nearest.get_id()
    print("Latest_humidity=",latest_value(id,1))
    print("Latest PM10=", latest_value(id, 2))
    print("Latest PM25=", latest_value(id, 3))
    print("Latest Pressure=", latest_value(id, 4))
    print("Latest Temperature=", latest_value(id, 5))

    print("Last Humidity=",last_value(id,1))
    print("Last PM10=",last_value(id,2))
    print("Last PM25=",last_value(id,3))
    print("Last Pressure=",last_value(id,4))
    print("Last Temperature=",last_value(id,5))
sbrl commented 5 years ago

Hey, thanks @BNNorman! Now that I've finished dealingw ith #30, I'll take a look at this one next when I've got some more time.

sbrl commented 5 years ago

Ok @bsimmo, I've implement a new device-data-recent action to cover part of your use-case. The next half I guess is determining the closest device to a given location.

This is an interesting challenge. While @BNNorman has used a QuadTree and done it that way, I'd really like to do it in pure SQL. To that end the ideal solution is the inbuilt geometry and geographic types - specifically the POINT column type, but since the database wasn't designed making use of those types, it makes it somewhat more challenging. I'd like to do some more research into this to see if it's possible to do a dynamic conversion without too much of a performance hit.

On the other hand, we could just roll our own pythag, but that isn't really designed for geographical co-ordinates (not that this is a particular issue for us at the moment).

BNNorman commented 5 years ago

We could ,with care, modify the database. The changes would affect the existing api.

Been looking at the SQL. Presumably you would return a list of points within a polygon? Presumably this is the same as a quadtree mask operation. I think I'll play with this at home to see if there's a significant speed issue.

Personally I dont see this being a big issue unless our database contains millions of devices and there are thousands of queries being made in a short space of time. But, for scalability yes we should implement an optimal solution.

bsimmo commented 5 years ago

See what you come up with, I assume we're not the first as there are a lot of web pages and databases with location data with maps and also apps getting local information. At the moment it is very quick to just get all locations and find the nearest station, even in Python3. The slowest parts are collecting the data from the server. The less calls and data transfer the better. But this has to be taken against server load and delay on the server. While I know the low number of devices we have isn't a problem, 300 odd may cause increase load?

I assume you quite enjoy finding out and working out a solution.

On Thu, 13 Jun 2019, 10:21 pm Starbeamrainbowlabs, notifications@github.com wrote:

Ok @bsimmo https://github.com/bsimmo, I've implement a new device-data-recent action to cover part of your use-case. The next half I guess is determining the closest device to a given location.

This is an interesting challenge. While @BNNorman https://github.com/BNNorman has used a QuadTree and done it that way, I'd really like to do it in pure SQL. To that end the ideal solution is the inbuilt geometry and geographic types https://mariadb.com/kb/en/library/geographic-geometric-features - specifically the POINT column type, but since the database wasn't designed making use of those types, it makes it somewhat more challenging. I'd like to do some more research into this to see if it's possible to do a dynamic conversion without too much of a performance hit.

On the other hand, we could just roll our own pythag, but that isn't really designed for geographical co-ordinates (not that this is a particular issue for us at the moment).

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ConnectedHumber/Air-Quality-Web/issues/33?email_source=notifications&email_token=ACYAXN4DUIVVKSKDFFSHU73P2K25LA5CNFSM4HSDRGY2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODXVCETI#issuecomment-501883469, or mute the thread https://github.com/notifications/unsubscribe-auth/ACYAXN6C6UKYHOV6ANKAWNLP2K25LANCNFSM4HSDRGYQ .

BNNorman commented 5 years ago

I have seen some of those which use open source api's but search the world not our database.

Anyway, SBRL's suggestion to use Point() and ST_distance() in an SQL call works but requires us to create a table (or column) of device points.

This query returns the st_distance from 53.7,-.34 of our devices but I'm not sure what the units are - not km surely, too small for meters?

select st_distance(Point(53.7,-.34),loc) as dist from dev_pts;
+-------------------+
| dist              |
+-------------------+
| 76.54165248200184 |
| 76.54165248200184 |
| 76.43971287282548 |
| 76.51544343342525 |
| 76.49586160546113 |
| 76.49586302087422 |
| 76.49586089775458 |
| 76.45946505214322 |
| 76.51651486514794 |
| 76.49133779299537 |
| 76.52983777478579 |
| 76.53301999781156 |
| 76.48411447724907 |
| 76.49586019124807 |
+-------------------+

Using min(st_distance(...)) will return the nearest.

Time to go swimming on a golf course. Will look at this later.

bsimmo commented 5 years ago

Enjoy the swimming, you'll need to walk around with a few sensors strapped to you.

I've altered my code to this new API call and it works for what I need.

I'll post the simple python code up on the forum, I'm going to expand it. But is just proof of concept.

On Fri, 14 Jun 2019, 7:30 am Brian, notifications@github.com wrote:

I have seen some of those which use open source api's but search the world not our database.

Anyway, SBRL's suggestion to use Point() and ST_distance() in an SQL call works but requires us to create a table (or column) of device points.

This query returns the st_distance from 53.7,-.34 of our devices but I'm not sure what the units are - not km surely, too small for meters?

select st_distance(Point(53.7,-.34),loc) as dist from dev_pts; +-------------------+ | dist | +-------------------+ | 76.54165248200184 | | 76.54165248200184 | | 76.43971287282548 | | 76.51544343342525 | | 76.49586160546113 | | 76.49586302087422 | | 76.49586089775458 | | 76.45946505214322 | | 76.51651486514794 | | 76.49133779299537 | | 76.52983777478579 | | 76.53301999781156 | | 76.48411447724907 | | 76.49586019124807 | +-------------------+

Using min(st_distance(...)) will return the nearest.

Time to go swimming on a golf course. Will look at this later.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ConnectedHumber/Air-Quality-Web/issues/33?email_source=notifications&email_token=ACYAXNYB6QZ3LS6LG2TWZ7DP2M3JNA5CNFSM4HSDRGY2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODXV3IAA#issuecomment-501986304, or mute the thread https://github.com/notifications/unsubscribe-auth/ACYAXN6PBWTCHBVHOT75KULP2M3JNANCNFSM4HSDRGYQ .

BNNorman commented 5 years ago

SBRL - Pure SQL search for nearest device within a given range of the search co-ordinates - appears to work ok.

If you remove the min() part of the SQL it lists all devices in range - for that nice warm glow. With min() it lists the nearest device only.

This solution doesn't require any changes to the database but I recognise that converting lat/lon to Point() will speed up the query but that causes problems ensuring Point() and device_longitude/latitude are always in sync (though an update query can be run whenever the lat/lon changes). Also, the query speed may be improved by adding extra indexes to the devices table - probably best done on a Point() column if MySQL allows that.

I think you, Robin and I should come to some concensus at the meetup next week.

import mysql.connector

# database
dbHost="192.168.1.81" # change to your host name
dbUser="Brian"
dbPassword="Retired16"
dbName="aq_db"

# GPS distances (in decimal degree fractions)
# lat and lon assumed to be the same
GPS_1MM=0.00000001
GPS_1M=1000*GPS_1MM
GPS_100M=100*GPS_1M
GPS_1KM=10*GPS_100M
GPS_10KM=10*GPS_1KM

# establish database connection
db = mysql.connector.connect(
    host=dbHost,
    user=dbUser,
    passwd=dbPassword,
    database=dbName
)

if db is None:
    print("Unable to connect to database")
    exit()

# area of interest is a closed polygon (rectangle)

def get_polygon(lon,lat,dist):
    poly="Polygon(("
    poly=poly+"{} {},".format(lon+dist, lat-dist)
    poly=poly+"{} {},".format(lon+dist, lat+dist)
    poly=poly+"{} {},".format(lon-dist, lat+dist)
    poly=poly+"{} {},".format(lon-dist, lat-dist)
    # close the shape
    poly=poly+"{} {}".format(lon+dist, lat-dist)
    poly=poly+"))"
    return poly

# find the nearest device to given co-ords
def nearest_device(lon,lat,max_dist):
    #
    try:
        cursor = db.cursor()

        poly = get_polygon(lon, lat, max_dist)

        print("Checking if lon=", lon, "lat=", lat, " is within", poly)

        thisPoint="Point({},{})".format(lon,lat)

        NearestSQL = "select device_id,min(ST_DISTANCE(Point(device_longitude,device_latitude),"+thisPoint+")) " \
                     "from devices " \
                     "where ST_WITHIN(Point(device_longitude,device_latitude),ST_GEOMFROMTEXT('" + poly + "'));"

        # print("NearestSQL=",NearestSQL)

        cursor.execute(NearestSQL)
        rows = cursor.fetchall()

        if rows is None: return None

        # there could be several devices at same distance so list them all
        devices = []
        for row in rows:
            devices.append(row)

        # return device Id and distance
        return devices
    except Exception as e:
        print("Exception",e)
        return []

#############################################################################
#
RANGE=5*GPS_1KM # added to current pos to form a rectangular area of interest

# lat/long is in middle of range
# params lon/lat
nearest=nearest_device(-0.3644125,53.761374755,RANGE)

if nearest is None or len(nearest)==0:
    print("No devices found in the given range")
else:
    for dev in nearest:
        print("Found",dev)

print("Finished")
sbrl commented 5 years ago

@BNNorman @bsimmo: Hey, great discussion! I've been otherwise occupied with my summer project......

Indeed, some indexes would definitely speed up some of the queries the web interface runs.

I think that what we're talking about here is storing the device location as a POINT(lat, long) instead of in 2 separate columns.

I can definitely update the web interface to handle this.

The problem with using a QuadTree in the web interface is that for each request it would need to load the entire list of devices from the database, file them into a quad tree, and then use the QuadTree - which will probably negate any speed boosts we're likely to get from using it.

Hence, I'd very much like to solve the problem in pure SQL if possible. Converting dynamically fromt eh existing structure into a POINT(lat, long) is a good intermediate option, yeah.

As for the unit, I'm not sure what that is. It could be the distance in km squared.

Having a discussion about this on Thursday sounds like a great idea - I'll definitely be there.

BNNorman commented 5 years ago

A point column could be added. I think the sep columns are still needed unless there's a way to ' unpoint ' a point . The only hit on a new column is making sure we update it should the locations change.

BNNorman commented 5 years ago

It seems I need to learn about adding triggers to the devices & readings tables which will automatically update a Point column if the co-ords change.

sbrl commented 5 years ago

Yep, looks like there is a way to unpoint a point: https://mariadb.com/kb/en/library/point-properties/

SELECT X(location) as latitude, Y(location) as longitude FROM devices;

Rather than a trigger, a persistent computed column might be worth investigating.

BNNorman commented 5 years ago

Learning something new every day. PERSISTENT columns seems to fit the bill - the link you provided says they are always up to date. Nice! I'll have a play with that.

Oh, and I think I'd prefer to keep the columns devices_lon/lat so you don't need to change your existing API code which lists lat/lon on the device info page. It would also mean not changing dbLoader if we leave those alone. And the coords would be hand edited anyway whereas a blob (geometry) is not easy.

BNNorman commented 5 years ago

After some research the following will add a persistent Point() and index to the devices table.

alter table devices add column loc Point as (Point(device_latitude,device_longitude)) persistent;
alter table devices add spatial index loc_idx (loc(30))

I got the length of the field by doing this

select length(loc) from devices

which returned a value of 25 for all the existing devices. I don't know if this can be longer, so I opted for 30. I could experiment with lat/lon values to see if it ever changes - could be dependent on number of decimal places. It's giving me brain ache finding the size of a Point column - it's a blob and blobs can be 64k but surely a point would never be that long. Hey ho.

The same needs to be done to the readings table (which also has lat/lon to handle mobile device readings. (Future)

I'll discuss with Robin and SBRL then plan the addition.

sbrl commented 5 years ago

Looks great! It might be worth profiling some other queries with EXPLAIN and adding some other indexes in too - though I'd like to optimise the fetch-data HTTP API action's query a bit by refactoring it.

Why do you need the length of a point though? According the help article here all you need to do is provide the column name and it'll create the index automagically:

ALTER TABLE devices ADD SPATIAL INDEX(location);

Also, since the server is using InnoDB apparently spatial indexes are only supported in MariaDB 10.2+, and we're running 10.1. I can probably update it using an apt repository, but I don't have shell access to the database server.

Note that you can tell what storage engine we're using like this:

SELECT TABLE_NAME, ENGINE FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = "aq_db";
BNNorman commented 5 years ago

The point geometry is stored as a blob which can be upto 64k. When i tried to create an index without a size mariadb complained. I think theres a max length of index keys. I experimented with

Select length(point(x,y));

And they all came out as 25.

We could add an ordinary index for now.

BNNorman commented 5 years ago

If we do update mariaDb it should be 10.2.9 or later because of the bug reported in this: https://jira.mariadb.org/browse/MDEV-12462 (see last comment)

However, the date on that is 2017, yet my recent addition of mariaDb, this year, to my Pi is 10.1.37. I'll look into upgrading my Pi later.

Spatial indexes are 2D where as regular indexes are 1D. The spatial indexes should be faster when searching a geometry for a point. Currently we only have 14 devices so the improvement would be minimal IMHO.

sbrl commented 5 years ago

Ah, I see. Sounds like a plan then!

bsimmo commented 5 years ago

10.1.38 will be your update if you used apt. the shortly 'any month now'... release of Buster update will bring 10.3.15+

On Wed, 19 Jun 2019 at 08:06, Brian notifications@github.com wrote:

If we do update mariaDb it should be 10.2.9 or later because of the bug reported in this: https://jira.mariadb.org/browse/MDEV-12462 (see last comment)

However, the date on that is 2017, yet my recent addition of mariaDb, this year, to my Pi is 10.1.37. I'll look into upgrading my Pi later.

Spatial indexes are 2D where as regular indexes are 1D. The spatial indexes should be faster when searching a geometry for a point. Currently we only have 14 devices so the improvement would be minimal IMHO.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ConnectedHumber/Air-Quality-Web/issues/33?email_source=notifications&email_token=ACYAXN2AFVIVWMI4W7EGUNLP3HLF3A5CNFSM4HSDRGY2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODYA5BEA#issuecomment-503435408, or mute the thread https://github.com/notifications/unsubscribe-auth/ACYAXN6V7FWGECPCJ5S7EHTP3HLF3ANCNFSM4HSDRGYQ .

BNNorman commented 5 years ago

Thanks for that info.

sbrl commented 5 years ago

Actually, I was thinking of the official MariaDB apt repository, which provides the latest stable version of MariaDB which is 10.4+ @bsimmo

sbrl commented 5 years ago

All done - we now have list-devices-near.

I'm going to close this issue now, but if there are any further things, please do open a new one.