Closed Nuanda closed 8 years ago
@nowakowski Piotr, I need your help with that one. See measurements_controller#readings_json - you'll find a query there.
Since this PR introduces sensor location history (sensor has_many locations), the query is no longer valid. Please note this is THE Query ;) - the one that gets a single (?) reading across all sensors, within a given interval of time.
One problem here is that it may get more than 1 value, but I want to ask you to mend something else - to return the latitude and longitude of the reading.
So, a reading belongs_to a sensor, and the reading has 'time', and the sensor has_many locations, and a location has longitude and latitude.
Ideally, the query should return, together with 'reading.value', both 'longitude' and 'latitude' of the location of that sensor, that was "current" for this 'reading.time'. By "current" I mean its 'location.registration_time' is lower than 'reading.time', and no other location of this sensor has higher 'registration_time' that is still lower than 'reading.time'.
If such a subquery is not possible (or difficult), perhaps we may cheat a little bit and simply use 'to' time (the 1st method parameter) for location search as well.
I know that reading belongs_to location would solve that problem, but that relation is unfortunately not possible (for other reasons).
BTW I blocked the cache so the testing is simpler. We need to make it right first, then optimize.
@Nuanda What you're asking for would be quite simple if you replaced locations.registration_time
with a pair of columns: locations.valid_from
and locations.valid_until
(both of type TIMESTAMP
). Then you could simply write (note that we're skipping the sensors table altogether - you can join readings directly with locations! :)
SELECT r.*, l.longitude, l.latitude FROM readings r JOIN locations l ON r.sensor_id = l.sensor_id WHERE r.time BETWEEN l.valid_from and l.valid_until ORDER BY r.time DESC
This information can also be extracted from the existing data structure but it would require a more complicated query. If you're unwilling to introduce the abovementioned change -- let me know and I'll draft a smarter query for you. :)
This is, however, impossible. I mean, there is always the current location (the one with the latest registration_time for a given sensor) and it couldn't have valid_until (as it is not known, yet). I guess it won't work if valid_until is NULL.
Apart from that, this change would be acceptable, though it breaks a normal form (data redundancy - former location valid_until equals newer location valid_from).
By the way - it is possible that sensor has no locations, or there is no location for the given sensor that would suit the r.time. In this case, that particular reading should not be included in the result (as we don't know where to put it on the map, anyway) .
@Nuanda The NULL valid_until
problem is easily circumvented - instead of r.time BETWEEN l.valid_from and l.valid_until
use r.time BETWEEN l.valid_from and COALESCE(l.valid_until, '2999-12-31')
(COALESCE accepts any number of arguments and returns the first argument which does not evaluate to NULL).
@Nuanda The inner join will automatically discard sensors for which no location can be retrieved (for the specified date range) so your second request is satisfied.
I understand, that keeping the current single date model would require comparing by that date only, then ordering and limiting number to 1, right?
@Nuanda Aye, you would need a subquery which, for each sensor_id
, selects all locations with registration_time
not later than reading.time
, then sorts that in reverse order and picks the most recent record to join with readings
.
Assuming we'll have only one location most of the time, per sensor, do you think this is going to be a big performance issue? As I told you, we are talking about getting 1 reading, out of (roughly) several thousands, for each sensor (out of several hundreds).
And yes, for this particular query, we don't need to fetch any data from the sensors table.
@Nuanda With 100K-1M readings - probably not a performance problem.
Then, are you able to write me that query in the measurements_controller#readings_json method? ;)
@Nuanda Sure, but I'm sort of busy right now - I'll get it done by Sunday.
@mkasztelnik Marek, can I ask you to look at smog_map coffee script and fix the ['locations'][0] shortcut there? This is related to #29. What we want is to show the current (i.e. the last when ordered by registation_time) location for each sensor. No-location sensors should be ignored and not shown on the map at all (which is obvious :)). Requires both server-side and client-side changes.
Would you be willing to do that?
Sure I can handle it. Should I wait for @nowakowski query or try to solve it (even using naive approach) by myself (then @nowakowski can improve query in next PR).
@mkasztelnik This is not the same query, so you can go ahead right away.
@nowakowski Piotr, will your query also solve issue #36 ?
@Nuanda Yeah, it should. I should have some time off tomorrow around noon - I'll work on the problem then.
@Nuanda Take a look at my latest commit. I wrote an efficient SQL query which retrieves the required data and wrote a spec for it (note, however, that it is a bit awkward to write specs for a private controller method - perhaps @mkasztelnik could offer advise on best practices in this regard, or perhaps - alternatively - the logic currently contained in measurements_controller_spec.rb
could be moved to spec/requests
). Also I'm not 100% sure if my JSON serialization is consistent with your expectations (though it's easy to fix if you need the data presented in a different manner).
@nowakowski you are right: private methods should never be tested, since then you are testing implementation not behavior. You should create integration spec similar to the one created already for heatmap: https://github.com/Nuanda/smogmapper/blob/master/spec/requests/heatmap_readings_spec.rb. What is more it would be nice to move complex query logic from the controller into e.g. model class similar like here: https://github.com/Nuanda/smogmapper/commit/8d0216ef26ae43b238062b5f4ea788cd4519ae65
@mkasztelnik But what if I want to test the implementation (in this case I kind of do...)?
@nowakowski implementation is an implementation, which can change over time (that is why you should not test it), but behavior should stay the same. You asked me about the advice - that is my advice :smile:. We can discuss this deeply f2f tomorrow not in this thread :)
@mkasztelnik Can do.
As per @mkasztelnik 's advice I removed the controller spec and instead wrote an API spec which tests the GET /measurements/show
method. I tried to include all cases enumerated by @Nuanda but I am somewhat puzzled by lines 5-13 in measurements_controller.rb
- perhaps we could discuss the intended function of this code fragment tomorrow?
@nowakowski I'm "guilty" of creating these lines. Basically there are 2 modes we have: current heatmap (where last readings should be returned - see #25 - click: "Pył - najnowsze pomiary") and heatmap from last 24 hours. The second case assume 15 minutes intervals (called "iterations") showed to the user in dynamic manner (click "Pył - animacja ostatnich 24 godzin"). The lines 5-13 handle these 2 cases.
@nowakowski in your specs you are doing following requests:
get "/measurements/show"
so params[:id]
is set to show
. There is no measurement with id
equals to show
. Introduced query does not take measurement type into account. It need to be corrected.
@mkasztelnik I assume that, when invoked with iteration=<some_numeric_value>
the method will return a set of "historical" readings appropriate for the timestamp which is calculated as Time.now - (iteration*15.minutes)
. What I don't really understand is how to properly invoke the method in the first of the two "modes" you are describing. Why do you require a measurement_id
as an argument at all? Wouldn't it be easier to just accept an optional Time
object (with default == Time.now
)?
@mkasztelnik Btw.: "Introduced query does not take measurement type into account." What is a "measurement type" and in what manner should I "take it into account"?
Currently we have 3 measurements: temperature
, humidity
and pm
. When we are showing heatmap we are showing it for concrete measurement. That is why method: GET /measurements/:id
should return 1 reading (if exists) for every sensor with measurement_id
equals to params[:id]
. Off course sensors can be moved between locations, that is why 1:n relation between sensor and location.
measurment_id
was always used in the previous query, see: https://github.com/Nuanda/smogmapper/blob/master/app/controllers/measurements_controller.rb#L25. The 15 minutes interval is the time which will be set in physical sensor telling how often readings should be made. So when query /measurements/:id?interval=0
is made on 12:07 query should return last reading for all sensors recorded between 11:45-12:00, interval=1 11:30-11:45, etc. (see https://github.com/Nuanda/smogmapper/blob/master/app/controllers/measurements_controller.rb#L31-L42).
When no interval query params is set last reading for every sensor should be returned (if exist).
All tests (except cache tests right now) from https://github.com/Nuanda/smogmapper/blob/master/spec/requests/heatmap_readings_spec.rb should pass.
Edited previous comment because intervals was not correct.
@mkasztelnik OK, I get it. Btw., I suggest we rename measurements
to measurement_types
then (that's what threw me off - I thought that each measurement
was a "package" of readings; the current name does not accurately reflect the intended purpose of the class).
The above is now corrected in #40. Please move all further work there.
Work in progress.