ConnectedHumber / Air-Quality-Web

The web interface and JSON api for the ConnectedHumber Air Quality Monitoring Project.
https://sensors.connectedhumber.org/
Mozilla Public License 2.0
11 stars 4 forks source link

Improving device-data-recent #37

Closed sbrl closed 5 years ago

sbrl commented 5 years ago

In #33, we implemented a new action, device-data-recent. While good, using ST_DISTANCE() results in an inaccurate sort - even over the small distances that we're handling here.

The solution to this is to take into account the curvature of the earth when sorting the devices by distance.

Doing so isn't trivial, but thankfully [this StackOverflow answer]() provides an implementation that does broadly what we're after:

CREATE FUNCTION `ST_DISTANCE_SPHERE`(`pt1` POINT, `pt2` POINT) RETURNS decimal(10,2)
BEGIN
    return 6371000 * 2 * ASIN(SQRT(
    POWER(SIN((ST_Y(pt2) - ST_Y(pt1)) * pi()/180 / 2),
    2) + COS(ST_Y(pt1) * pi()/180 ) * COS(ST_Y(pt2) *
    pi()/180) * POWER(SIN((ST_X(pt2) - ST_X(pt1)) *
    pi()/180 / 2), 2) ));
END
//
DELIMITER ;

Unfortunately, this is a stored function. While this will keep our code neater, we don't currently have permission to add it to the database (stored functions are stored at database-level), so we'll need @BNNorman's assistance here.

If we can add the above stored function to the database, we'll be able to swap out the ST_DISTANCE() call for ST_DISTANCE_SPHERE() instead.

CC @bsimmo.

BNNorman commented 5 years ago

Noted. Will look into setting this up.

bsimmo commented 5 years ago

That's Haversine formula.

You may want to simplify the maths (if php doesn't compile) to loose a few instructions. Also look at setting up pi()/180 as a variable so it not calculated every time (unless PHP sorts that out). Or just convert everything to radians at the start. I don't know what best for this.

On Sun, 23 Jun 2019, 9:33 pm Brian, notifications@github.com wrote:

Noted. Will look into setting this up.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ConnectedHumber/Air-Quality-Web/issues/37?email_source=notifications&email_token=ACYAXN4XQYWNTBGOZKOCMMLP37M2RA5CNFSM4H2ZWKTKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODYLGOQA#issuecomment-504784704, or mute the thread https://github.com/notifications/unsubscribe-auth/ACYAXN5DFG43ELP45VBCPGLP37M2RANCNFSM4H2ZWKTA .

sbrl commented 5 years ago

Thanks, @BNNorman!

@bsimmo: Good idea - we can pre-calculate some bits here to reduce some computational load. I'd be curious to know how much that'll actually help though, because surely the SQL query optimiser can figure that out?

Probably best to check?

BNNorman commented 5 years ago

I will rewrite this function with pi()/2 as a variable. It should , i believe, be declared as deterministic. I'm playing golf in the rain again tomorrow. Might have time to post the rewrite for your perusal before I leave. Will then look to implement it when i get back.

BNNorman commented 5 years ago

Does this float your boat?

DELIMETER //
CREATE FUNCTION `ST_DISTANCE_SPHERE`(`pt1` POINT, `pt2` POINT) RETURNS decimal(10,2) DETERMINISTIC
BEGIN
    DECLARE rad180 double;
    DECLARE rad360 double;

    SET rad180=pi()/180;
    SET rad360=rad180/2;

    return 12742000 * ASIN(SQRT(POWER(SIN((ST_Y(pt2) - ST_Y(pt1)) * rad360),2) + COS(ST_Y(pt1) * rad180) * COS(ST_Y(pt2) * rad180) * POWER(SIN((ST_X(pt2) - ST_X(pt1)) * rad360),2)));
END
//
DELIMETER ;

I have done this on the Pi and selected #37 as the current location - one of mine - you'll see #22 #24 and #26 are very close - because they are also mine.

image

It throws 13 data truncation warnings...do we care? Would it be better to return a double datatype?

image

If this gives you a warm glow I'll implement it on the server - this morning if you can respond before 8:30 otherwise when I get back from being drenched on a golf course (damn these pre-paid competitions)

sbrl commented 5 years ago

Hey, thanks @BNNorman! The only thing I'd be concerned about is another performance issue because of the warnings, but looking at the query time there I'm not sure that we do particularly care about such a warning (though it would be nice to squash it - especially since that will save the log files from filling up).

It looks fine to me - with or without returning a double. Thanks @BNNorman!

BNNorman commented 5 years ago

Would there be a problem returning a double? Otherwise could wrap a truncate() around the returned value.

sbrl commented 5 years ago

No problem at all, @BNNorman! So long as I can use it in an ORDER BY (which I'm pretty sure you can with a DOUBLE, it shouldn't matter.

I think returning a double probably the preferable option.

bsimmo commented 5 years ago

If you need a test location for a double check, use 53.750, -0.395 It the place I used as it near the middle of a lot of sensors (Derringham Bank roundabout/Calvert Lane iirc). And you get both sensors above and some to either side, which is were the problem came from.

For reference, Old order they came out at and the distance it should be,. 32 haversine 1997.86 m 36 haversine 1886.59 m 34 haversine 3823.82 m 35 haversine 2847.05 m 10 haversine 4206.41 m

BNNorman commented 5 years ago

Well this is what I get using Bens coords image

Wondering if the lat/lon are the correct way around as per:-

https://stackoverflow.com/questions/45216121/why-is-this-python-haversine-formula-producing-incorrect-answers

This isn't my field so experts please advise.

sbrl commented 5 years ago

Hrm. Not sure. That's rather odd that it works for your location but gives strange outputs for @bsimmo's - though we do only have ~2-3dp for those co-ordinates - maybe that's the issue?

I think you've got the ordering right - it should be (latitude, longitude).

Are those warnings still the truncation ones?

BNNorman commented 5 years ago

Warnings, yes. Double would have more dp. What i need is a pair or coords with a known distance between to prove the formula. Where did you get the original formula? The stack exchange solution uses atan, yours uses asin

Also here:- https://www.movable-type.co.uk/scripts/latlong.html

BNNorman commented 5 years ago

Some tests. First using the online calculator above image

Now using the ST_DISTANCE_FORMULA you gave me - big difference image

Now reversing the lat/lon values image

Conclusion - the ST_X and ST_Y values in your formula need reversing because we elected to store coords in the persistent field as lat-lon not lon-lat.

I will change the formula and retatest it later today.

bsimmo commented 5 years ago

For reference, mine comes from Python calculations, they are on the Forum. One is the standard Haversine formula These give the same as the Vincenty (using the result from the database that made me notice the problem) and also the more accurate again Karney formula (last one using a well worked on python module geopy). pip install geopy

In case you need it, this should get the list from the database via the api and compare to others.

import json
from urllib import request, parse
from math import radians, sin, cos, asin, sqrt
from geopy import distance

api_url = "https://aq.connectedhumber.org/api.php?action="

no_of_locations = 3
my_location = ( 53.750, -0.395 )

def get_nearest_sensors(my_location, no_of_locations):
    url = str(api_url) + "list-devices-near&count=" + str(no_of_locations)
    data = parse.urlencode({'latitude': my_location[0], 'longitude': my_location[1]}).encode()
    req = request.Request(url, data=data)
    response = request.urlopen(req)
    return response

def haversine(my_location , sensor_location ):
    lon1, lat1, lon2, lat2 = map(radians, [my_location[1], my_location[0], sensor_location[1], sensor_location[0]])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2000 * 6371 * asin(sqrt(a))

for x in range(0,6):
    nearest_sensors = json.loads(get_nearest_sensors(my_location, x).read().decode('utf-8'))

    # print(nearest_sensors)

    for line in nearest_sensors:
        distance_away = round(line['distance_actual'],2)
        cdistance = round(line['distance_calc'],4)
        hdistance = round(haversine(my_location, ( line['latitude'],line['longitude'])),2)
        geodistance = distance.distance(my_location, ( line['latitude'], line['longitude']))

        print( line['id'], line['name'], "is", distance_away, "miles away", "cal dist = ",line['distance_calc'], " -- haversine", hdistance, "--geodesic", geodistance)# nearest_sensor_check = ('id','name', 10000.0)
        #print( line['id'], "haversine", hdistance, "m")
    print("----------")
BNNorman commented 5 years ago

Thanks Ben. I have created the stored function SBRL asked for - just swapped the ST_X,ST_Y and it delivers the correct value as per my earlier posting.

delimiter //
CREATE FUNCTION `ST_DISTANCE_SPHERE`(`pt1` POINT, `pt2` POINT) RETURNS double DETERMINISTIC
BEGIN
    DECLARE rad180 double;
    DECLARE rad360 double;

    SET rad180=pi()/180;
    SET rad360=rad180/2;

    return 12742000 * ASIN(SQRT(POWER(SIN((ST_X(pt2) - ST_X(pt1)) * rad360),2) + COS(ST_X(pt1) * rad180) * COS(ST_X(pt2) * rad180) * POWER(SIN((ST_Y(pt2) - ST_Y(pt1)) * rad360),2)));
END
//
delimiter ;
bsimmo commented 5 years ago

Just a reminder, You should be able to remove all the distance_actual calculation and field as you are calculating it directly so it's not needed.

Ben

On Wed, 26 Jun 2019, 2:26 pm Brian, notifications@github.com wrote:

Thanks Ben. I have created the stored function SBRL asked for - just swapped the ST_X,ST_Y and it delivers the correct value as per my earlier posting.

delimiter // CREATE FUNCTION ST_DISTANCE_SPHERE(pt1 POINT, pt2 POINT) RETURNS double DETERMINISTIC BEGIN DECLARE rad180 double; DECLARE rad360 double;

SET rad180=pi()/180; SET rad360=rad180/2;

return 12742000 ASIN(SQRT(POWER(SIN((ST_X(pt2) - ST_X(pt1)) rad360),2) + COS(ST_X(pt1) rad180) COS(ST_X(pt2) rad180) POWER(SIN((ST_Y(pt2) - ST_Y(pt1)) * rad360),2))); END // delimiter ;

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ConnectedHumber/Air-Quality-Web/issues/37?email_source=notifications&email_token=ACYAXN4536ELVYXI4NGRDE3P4NVBPA5CNFSM4H2ZWKTKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODYTQLXA#issuecomment-505873884, or mute the thread https://github.com/notifications/unsubscribe-auth/ACYAXN6NLMMJMCUW3SPZYBLP4NVBPANCNFSM4H2ZWKTA .

BNNorman commented 5 years ago

Not sure what you are referring to there Ben - what/where is this distance_actual?

bsimmo commented 5 years ago

No idea, is it reported as a field from the API, for get nearest_device. It uses a more compute intensive calculation to calculate distance.

Sbrl will know.

(I could look it up but that will be later)

On Wed, 26 Jun 2019, 6:34 pm Brian, notifications@github.com wrote:

Not sure what you are referring to there Ben - what/where is this distance_actual?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ConnectedHumber/Air-Quality-Web/issues/37?email_source=notifications&email_token=ACYAXNZ2JZOOEUKMQYDLCNDP4OSBXA5CNFSM4H2ZWKTKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODYUIRDY#issuecomment-505972879, or mute the thread https://github.com/notifications/unsubscribe-auth/ACYAXNZZI5GR7S33DAKU373P4OSBXANCNFSM4H2ZWKTA .

BNNorman commented 5 years ago

I see. Yes he'll be using something in SQL (ST_DISTANCE()??) but now I have uploaded ST_DISTANCE_SPHERE as a stored function he can use that.

sbrl commented 5 years ago

Hey, thanks @BNNorman! When I have a moment I'll implement that change.

Yeah, if there's a suitably small distance between distance_actual (calculated in PHP with a library) and distance_calc (calculated via SQL), I'll remove distance_actual.

sbrl commented 5 years ago

Just looked into it, and it would work.... except that my MariaDB user account (that's also used for the web interface) doesn't have permission:

PDOException: SQLSTATE[42000]: Syntax error or access violation: 1370 execute command denied to user 'sbrl'@'%' for routine 'aq_db.ST_DISTANCE_SPHERE'

I've committed the update in e86b0b4f97fcc21d0831ee4b865b953503c79ea9, so it'll start working on the dev branch as soon as the permissions issue is fixed.

This time I'm going to roll up a bunch of features over ~1-2 weeks before pushing them all out at once - with a beta testing period before release should beta.aq.connectedhumber.org be a thing.

@BNNorman

bsimmo commented 5 years ago

10meters max, probaby, worst case over Hull. Not plucked out of the air but from my tests. That is suitably small.

On Wed, 26 Jun 2019, 8:46 pm Starbeamrainbowlabs, notifications@github.com wrote:

Hey, thanks @BNNorman https://github.com/BNNorman! When I have a moment I'll implement that change.

Yeah, if there's a suitably small distance between distance_actual (calculated in PHP with a library) and distance_calc (calculated via SQL), I'll remove distance_actual.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ConnectedHumber/Air-Quality-Web/issues/37?email_source=notifications&email_token=ACYAXNZ724PLL7D6XPIZAXDP4PBS7A5CNFSM4H2ZWKTKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODYUTY2I#issuecomment-506018921, or mute the thread https://github.com/notifications/unsubscribe-auth/ACYAXN6KQENXIOSOCNHW47LP4PBS7ANCNFSM4H2ZWKTA .

BNNorman commented 5 years ago

Original code was 2dp - that's 10m

BNNorman commented 5 years ago

@sbrl - I'll look into the execute permissions thing tomorrow.

BNNorman commented 5 years ago

Ok added execute permission for you. Please try it.

sbrl commented 5 years ago

Ah, ok. Sounds like we'll be able to ditch distance_actual after all! I'll probably rename distance_calc to just simply distance then.

@BNNorman: Thanks so much! I'll try it when I get a moment.

sbrl commented 5 years ago

Looks like it works fabulously, @BNNorman :smiley_cat:

In the interests of stability though, I'm not going to immediately release this. As I've mentioned before, I'll roll up a few other changes, have a beta testing period, and then release to stable.

Releases won't be regular (i.e. on a timed basis), but I do hope to have a steady stream of updates coming in batches.

BNNorman commented 5 years ago

Any timing improvement?

sbrl commented 5 years ago

Not sure there's a noticeable difference between ST_DISTANCE() and ST_DISTANCE_SPHERE() at the moment.

BNNorman commented 5 years ago

ST_DISTANCE() returns the angle between the two points so you'd need to factor in the rad/dia of the Earth which might add more micros.

But ST_DISTANCE_SPHERE() is more correct.

sbrl commented 5 years ago

Right. At the moment it shouldn't be an issue. If there's a performance issue later, I'll tackle it then.

I'm closing this issue for now since we've implemented the fix. Feel free to open a new issue if problems continue to occur.