CartoDB / dataservices-api

The CARTO Data Services API
https://carto.com/docs/carto-engine/dataservices-api/
BSD 3-Clause "New" or "Revised" License
22 stars 14 forks source link

OBS_GetAvailableGeometries doesn't return results including normalize value #370

Closed nobuti closed 7 years ago

nobuti commented 7 years ago

First of all, not sure if this is a bug, but it's something I don't get it. During https://github.com/CartoDB/cartodb/issues/11619 implementation, and following the docs and this, we end having this query to get boundaries:

SELECT 
    * 
FROM OBS_GetAvailableGeometries(
    (SELECT ST_SetSRID(ST_Extent(the_geom), 4326) FROM (SELECT * FROM cdb.walmart_copy) q), 
    NULL,
    'us.census.acs.B01001026',
    'us.census.acs.B01003001',
    '2010 - 2014') denoms 
WHERE valid_numer IS TRUE 
    AND valid_denom IS TRUE 
    AND valid_timespan IS TRUE 
ORDER BY numgeoms DESC

The response to this query is something like this:

{"rows":[],"time":1.372,"fields":{"geom_id":{"type":"string"},"geom_name":{"type":"string"},"geom_description":{"type":"string"},"geom_weight":{"type":"string"},"geom_aggregate":{"type":"string"},"geom_license":{"type":"string"},"geom_source":{"type":"string"},"valid_numer":{"type":"boolean"},"valid_denom":{"type":"boolean"},"valid_timespan":{"type":"boolean"},"score":{"type":"number"},"numtiles":{"type":"number"},"notnull_percent":{"type":"number"},"numgeoms":{"type":"number"},"percentfill":{"type":"number"},"estnumgeoms":{"type":"number"},"meanmediansize":{"type":"number"},"geom_type":{"type":"string"},"geom_extra":{"type":"jsonb"},"geom_tags":{"type":"jsonb"}},"total_rows":0}

But taking a deep I discovered I removed a condition that apparently it doesn't make any sense:

SELECT 
    * 
FROM OBS_GetAvailableGeometries(
    (SELECT ST_SetSRID(ST_Extent(the_geom), 4326) FROM (SELECT * FROM cdb.walmart_copy) q), 
    NULL,
    'us.census.acs.B01001026',
    'us.census.acs.B01003001',
    '2010 - 2014') denoms 
WHERE valid_numer IS TRUE 
    AND valid_denom IS TRUE OR 'us.census.acs.B01003001' IS NULL
    AND valid_timespan IS TRUE 
ORDER BY numgeoms DESC

The part missing in the first request is OR 'us.census.acs.B01003001' IS NULL. And this query, in fact, returns some rows.

rafatower commented 7 years ago

@dgaubert could you please take a look?

rafatower commented 7 years ago

Excuse me the delay in the response. The first time I read it I did not quite understand what you meant.

The example query comes from instantiating this template (see the original boundaries section) "as is":

SELECT * FROM OBS_GetAvailableGeometries(
  ST_MakeEnvelope(-74, 41, -73, 40, 4326),
  NULL,
  <numer_id>,
  <denom_id>, 
  <timespan_id>
) denoms
WHERE valid_numer IS TRUE
 AND valid_denom IS TRUE OR <denom_id> IS NULL
 AND valid_timespan IS TRUE

So the reason of the apparently useless clause AND valid_denom IS TRUE OR <denom_id> IS NULL is to ignore validity of the denominator when there's no denominator involved (that is, when it is set to NULL).

In that example, the query from the template when denom_id = NULL would be something like this:

SELECT *
FROM OBS_GetAvailableGeometries(
    (SELECT ST_SetSRID(ST_Extent(the_geom), 4326) FROM (SELECT * FROM cdb.walmart_copy) q), 
    NULL,
    'us.census.acs.B01001026',
    NULL,
    '2010 - 2014') denoms 
WHERE valid_numer IS TRUE 
    AND valid_denom IS TRUE -- this should evaluate to FALSE
    AND valid_timespan IS TRUE 
ORDER BY numgeoms DESC

and that can yield no results, as opposed to the correctly instantiated:

SELECT *
FROM OBS_GetAvailableGeometries(
    (SELECT ST_SetSRID(ST_Extent(the_geom), 4326) FROM (SELECT * FROM cdb.walmart_copy) q), 
    NULL,
    'us.census.acs.B01001026',
    NULL,
    '2010 - 2014') denoms 
WHERE valid_numer IS TRUE 
    AND valid_denom IS TRUE OR NULL IS NULL -- this line evaluates to TRUE
    AND valid_timespan IS TRUE 
ORDER BY numgeoms DESC

To sum up: you shall stick to the original template with the OR <denom_id> IS NULL clause.

Side note: arguably OBS_GetAvailableGeometries() should be responsible for managing validity of the data returned, instead of delegating that responsibility into the caller. That would also reduce the amount of data transferred across User_DB <-> DS_API <-> OBS. I'll take that into consideration for future versions of https://github.com/CartoDB/observatory-extension

nobuti commented 7 years ago

Oh, it makes sense now!

rafatower commented 7 years ago

Sorry, @nobuti , I reopened it because I noticed there's a bug in the template: because of the precedence of the operators, parenthesis are mandatory.

Here's the correct template for that query:

SELECT * FROM OBS_GetAvailableGeometries(
  ST_MakeEnvelope(-74, 41, -73, 40, 4326),
  NULL,
  <numer_id>,
  <denom_id>, 
  <timespan_id>
) denoms
WHERE valid_numer IS TRUE
 (AND valid_denom IS TRUE OR <denom_id> IS NULL)
 AND valid_timespan IS TRUE

Otherwise it evaluates it like this:

valud_numer := FALSE
valid_denom := FALSE
denom_id := NULL
valid_timespan := TRUE

valid_numer IS TRUE AND valid_denom IS TRUE OR denom_id IS NULL AND valid_timespan IS TRUE
  -> (valid_numer IS TRUE AND valid_denom IS TRUE) OR (denom_id IS NULL AND valid_timespan IS TRUE)
  -> (FALSE AND FALSE) OR (TRUE AND TRUE)
  -> FALSE OR TRUE
  -> TRUE

which is incorrect.

rafatower commented 7 years ago

One question: what's the repo with the template?

nobuti commented 7 years ago

Do you mean this? https://github.com/CartoDB/cartodb/blob/d8614f137cab6980a9fe8fd18f5d6a833ec98a66/lib/assets/core/javascripts/cartodb3/data/data-observatory/boundaries-collection.js#L5-L11

rafatower commented 7 years ago

exaclty, thanks!

I created an issue to track the small change, hopefully to your benefit: https://github.com/CartoDB/cartodb/issues/12206

nobuti commented 7 years ago

Thanks @rafatower!