Nuanda / smogmapper

Smog Mapper
http://smogmapper.smogathon.pl/
1 stars 0 forks source link

Token and locations #34

Closed Nuanda closed 8 years ago

Nuanda commented 8 years ago

Work in progress.

Nuanda commented 8 years ago

@nowakowski Piotr, I need your help with that one. See measurements_controller#readings_json - you'll find a query there.

Since this PR introduces sensor location history (sensor has_many locations), the query is no longer valid. Please note this is THE Query ;) - the one that gets a single (?) reading across all sensors, within a given interval of time.

One problem here is that it may get more than 1 value, but I want to ask you to mend something else - to return the latitude and longitude of the reading.

So, a reading belongs_to a sensor, and the reading has 'time', and the sensor has_many locations, and a location has longitude and latitude.

Ideally, the query should return, together with 'reading.value', both 'longitude' and 'latitude' of the location of that sensor, that was "current" for this 'reading.time'. By "current" I mean its 'location.registration_time' is lower than 'reading.time', and no other location of this sensor has higher 'registration_time' that is still lower than 'reading.time'.

If such a subquery is not possible (or difficult), perhaps we may cheat a little bit and simply use 'to' time (the 1st method parameter) for location search as well.

I know that reading belongs_to location would solve that problem, but that relation is unfortunately not possible (for other reasons).

Nuanda commented 8 years ago

BTW I blocked the cache so the testing is simpler. We need to make it right first, then optimize.

nowakowski commented 8 years ago

@Nuanda What you're asking for would be quite simple if you replaced locations.registration_time with a pair of columns: locations.valid_from and locations.valid_until (both of type TIMESTAMP). Then you could simply write (note that we're skipping the sensors table altogether - you can join readings directly with locations! :)

SELECT r.*, l.longitude, l.latitude FROM readings r JOIN locations l ON r.sensor_id = l.sensor_id WHERE r.time BETWEEN l.valid_from and l.valid_until ORDER BY r.time DESC

This information can also be extracted from the existing data structure but it would require a more complicated query. If you're unwilling to introduce the abovementioned change -- let me know and I'll draft a smarter query for you. :)

Nuanda commented 8 years ago

This is, however, impossible. I mean, there is always the current location (the one with the latest registration_time for a given sensor) and it couldn't have valid_until (as it is not known, yet). I guess it won't work if valid_until is NULL.

Apart from that, this change would be acceptable, though it breaks a normal form (data redundancy - former location valid_until equals newer location valid_from).

Nuanda commented 8 years ago

By the way - it is possible that sensor has no locations, or there is no location for the given sensor that would suit the r.time. In this case, that particular reading should not be included in the result (as we don't know where to put it on the map, anyway) .

nowakowski commented 8 years ago

@Nuanda The NULL valid_until problem is easily circumvented - instead of r.time BETWEEN l.valid_from and l.valid_until use r.time BETWEEN l.valid_from and COALESCE(l.valid_until, '2999-12-31') (COALESCE accepts any number of arguments and returns the first argument which does not evaluate to NULL).

nowakowski commented 8 years ago

@Nuanda The inner join will automatically discard sensors for which no location can be retrieved (for the specified date range) so your second request is satisfied.

Nuanda commented 8 years ago

I understand, that keeping the current single date model would require comparing by that date only, then ordering and limiting number to 1, right?

nowakowski commented 8 years ago

@Nuanda Aye, you would need a subquery which, for each sensor_id, selects all locations with registration_time not later than reading.time, then sorts that in reverse order and picks the most recent record to join with readings.

Nuanda commented 8 years ago

Assuming we'll have only one location most of the time, per sensor, do you think this is going to be a big performance issue? As I told you, we are talking about getting 1 reading, out of (roughly) several thousands, for each sensor (out of several hundreds).

And yes, for this particular query, we don't need to fetch any data from the sensors table.

nowakowski commented 8 years ago

@Nuanda With 100K-1M readings - probably not a performance problem.

Nuanda commented 8 years ago

Then, are you able to write me that query in the measurements_controller#readings_json method? ;)

nowakowski commented 8 years ago

@Nuanda Sure, but I'm sort of busy right now - I'll get it done by Sunday.

Nuanda commented 8 years ago

@mkasztelnik Marek, can I ask you to look at smog_map coffee script and fix the ['locations'][0] shortcut there? This is related to #29. What we want is to show the current (i.e. the last when ordered by registation_time) location for each sensor. No-location sensors should be ignored and not shown on the map at all (which is obvious :)). Requires both server-side and client-side changes.

Would you be willing to do that?

mkasztelnik commented 8 years ago

Sure I can handle it. Should I wait for @nowakowski query or try to solve it (even using naive approach) by myself (then @nowakowski can improve query in next PR).

Nuanda commented 8 years ago

@mkasztelnik This is not the same query, so you can go ahead right away.

Nuanda commented 8 years ago

@nowakowski Piotr, will your query also solve issue #36 ?

nowakowski commented 8 years ago

@Nuanda Yeah, it should. I should have some time off tomorrow around noon - I'll work on the problem then.

nowakowski commented 8 years ago

@Nuanda Take a look at my latest commit. I wrote an efficient SQL query which retrieves the required data and wrote a spec for it (note, however, that it is a bit awkward to write specs for a private controller method - perhaps @mkasztelnik could offer advise on best practices in this regard, or perhaps - alternatively - the logic currently contained in measurements_controller_spec.rb could be moved to spec/requests). Also I'm not 100% sure if my JSON serialization is consistent with your expectations (though it's easy to fix if you need the data presented in a different manner).

mkasztelnik commented 8 years ago

@nowakowski you are right: private methods should never be tested, since then you are testing implementation not behavior. You should create integration spec similar to the one created already for heatmap: https://github.com/Nuanda/smogmapper/blob/master/spec/requests/heatmap_readings_spec.rb. What is more it would be nice to move complex query logic from the controller into e.g. model class similar like here: https://github.com/Nuanda/smogmapper/commit/8d0216ef26ae43b238062b5f4ea788cd4519ae65

nowakowski commented 8 years ago

@mkasztelnik But what if I want to test the implementation (in this case I kind of do...)?

mkasztelnik commented 8 years ago

@nowakowski implementation is an implementation, which can change over time (that is why you should not test it), but behavior should stay the same. You asked me about the advice - that is my advice :smile:. We can discuss this deeply f2f tomorrow not in this thread :)

nowakowski commented 8 years ago

@mkasztelnik Can do.

nowakowski commented 8 years ago

As per @mkasztelnik 's advice I removed the controller spec and instead wrote an API spec which tests the GET /measurements/show method. I tried to include all cases enumerated by @Nuanda but I am somewhat puzzled by lines 5-13 in measurements_controller.rb - perhaps we could discuss the intended function of this code fragment tomorrow?

mkasztelnik commented 8 years ago

@nowakowski I'm "guilty" of creating these lines. Basically there are 2 modes we have: current heatmap (where last readings should be returned - see #25 - click: "Pył - najnowsze pomiary") and heatmap from last 24 hours. The second case assume 15 minutes intervals (called "iterations") showed to the user in dynamic manner (click "Pył - animacja ostatnich 24 godzin"). The lines 5-13 handle these 2 cases.

mkasztelnik commented 8 years ago

@nowakowski in your specs you are doing following requests:

get "/measurements/show"

so params[:id] is set to show. There is no measurement with id equals to show. Introduced query does not take measurement type into account. It need to be corrected.

nowakowski commented 8 years ago

@mkasztelnik I assume that, when invoked with iteration=<some_numeric_value> the method will return a set of "historical" readings appropriate for the timestamp which is calculated as Time.now - (iteration*15.minutes). What I don't really understand is how to properly invoke the method in the first of the two "modes" you are describing. Why do you require a measurement_id as an argument at all? Wouldn't it be easier to just accept an optional Time object (with default == Time.now)?

nowakowski commented 8 years ago

@mkasztelnik Btw.: "Introduced query does not take measurement type into account." What is a "measurement type" and in what manner should I "take it into account"?

mkasztelnik commented 8 years ago

Currently we have 3 measurements: temperature, humidity and pm. When we are showing heatmap we are showing it for concrete measurement. That is why method: GET /measurements/:id should return 1 reading (if exists) for every sensor with measurement_id equals to params[:id]. Off course sensors can be moved between locations, that is why 1:n relation between sensor and location.

measurment_id was always used in the previous query, see: https://github.com/Nuanda/smogmapper/blob/master/app/controllers/measurements_controller.rb#L25. The 15 minutes interval is the time which will be set in physical sensor telling how often readings should be made. So when query /measurements/:id?interval=0 is made on 12:07 query should return last reading for all sensors recorded between 11:45-12:00, interval=1 11:30-11:45, etc. (see https://github.com/Nuanda/smogmapper/blob/master/app/controllers/measurements_controller.rb#L31-L42).

When no interval query params is set last reading for every sensor should be returned (if exist).

All tests (except cache tests right now) from https://github.com/Nuanda/smogmapper/blob/master/spec/requests/heatmap_readings_spec.rb should pass.

mkasztelnik commented 8 years ago

Edited previous comment because intervals was not correct.

nowakowski commented 8 years ago

@mkasztelnik OK, I get it. Btw., I suggest we rename measurements to measurement_types then (that's what threw me off - I thought that each measurement was a "package" of readings; the current name does not accurately reflect the intended purpose of the class).

Nuanda commented 8 years ago

The above is now corrected in #40. Please move all further work there.