CartoDB / observatory-extension

BSD 3-Clause "New" or "Revised" License

REMOTE ERROR: Exception: Error trying to OBS_GetMeasure #189

Closed rochoa closed 7 years ago

rochoa commented 8 years ago

Batch Query result:

{
    "query": "BEGIN;DELETE FROM analysis_a08f3b6124_71b4ac83fbb6f0253c3cdb16a3f5a811bc778100;INSERT INTO analysis_a08f3b6124_71b4ac83fbb6f0253c3cdb16a3f5a811bc778100 SELECT\n  cartodb_id, the_geom, the_geom_webmercator, featurecla, adm1_code, diss_me, iso_3166_2, name, name_alt, code_local, code_hasc, region, region_big, gadm_level, abbrev, postal, labelrank, created_at, updated_at,\n  cdb_dataservices_client.OBS_GetMeasure(    the_geom,    'us.census.acs.B03002012'\n  ) AS hispanic_pop\nFROM (SELECT * FROM ne_50m_admin_1_states) AS _camshaft_do_measure_analysis;COMMIT;",
    "status": "failed",
    "started_at": "2016-08-25T10:54:24.462Z",
    "ended_at": "2016-08-25T11:00:08.693Z",
    "failed_reason": "cdb_dataservices_client._obs_getmeasure(7): [cartodb_user_a6f0d0fe-4f4f-4217-8150-22b0010fe409_db] REMOTE ERROR: Exception: Error trying to OBS_GetMeasure"
}

Table ne_50m_admin_1_states is from data library.

cc @rafatower @noguerol

rafatower commented 8 years ago

The interesting bits from the logs:

traceback removed

noguerol commented 8 years ago

@talos as far as I've checked, this happens only when augmenting polygon tables. I've tried ign_spanish_adm2_provinces and ne_50m_admin_1_states with different dimensions, and none of them work. Those same dimensions do work, though, when augmenting points.

talos commented 8 years ago

Interesting. We had seen some issues with bad polygons before, and we do a little bit of adjustment and buffering internally to avoid problems. However, it seems remarkable that all those tables should fail. I'll look into it.

talos commented 8 years ago

The prior related issue and its resolution: https://github.com/CartoDB/observatory-extension/issues/160

talos commented 8 years ago

@noguerol Could you please give a few example measures that you were unable to use with polygons?

noguerol commented 8 years ago

For the US: race and ethnicity > Hispanic population. For Spain: nationality > Persons who were born in Africa, and Commerce and Economy > Supermarkets.

talos commented 8 years ago

For both measures, the root of the problem is that we don't expose an option to choose a different boundary geometry for the aggregation.

That means we're using block groups to aggregate state populations in the US... this is very wasteful and, as you can see, also error-prone. We have larger geometries available, and using the API directly you could specify them. However, we should do this transparently (select the "best" geometry, which is not always the smallest). This is a known issue I've been working on, but it was not yet reflected in this repo. I've opened a ticket: https://github.com/CartoDB/observatory-extension/issues/190

In Spain, we're using seccion censal, and again we should be using a larger geometry for adm2 level data. However, there the issue is compounded with us not having all measures at larger geometries. I've made an issue in the ETL to deal with this: https://github.com/CartoDB/bigmetadata/issues/91

In the interim, you would need to use the API functions to grab Observatory data at this level, when it is available. For the US:

UPDATE ne_50m_admin_1_states
SET hispanic_pop = OBS_GetMeasure(the_geom,
  'us.census.acs.B03002012',
  null,
  'us.census.tiger.state',  -- aggregate over states instead of block groups
  '2010 - 2014');

noguerol commented 8 years ago

Indeed, I tried with smaller polys (ign_spanish_adm3_municipalities) and it worked.

Thanks for the explanation @talos

talos commented 8 years ago

No problem! I'll update this ticket once we've got improvements in to get these working -- these are great use cases and we should support them easily.

saleiva commented 8 years ago

Happening again when augmenting a census tract dataset with unemployment data.

ethervoid commented 8 years ago

We keep having this issue:

Error performing intersection: TopologyException: found non-noded intersection between LINESTRING (-80.4685 25.2301, -80.4686 25.2301) and LINESTRING (-80.4684 25.2302, -80.4686 25.2301) at -80.4685585652405 25.23009916119825

Dataset can be downloaded from https://team.carto.com/u/saleiva/dataset/fha_insurance_in_force_by_tract_complete_1
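
For anyone hitting the same TopologyException, a minimal sketch for ruling out the input side, assuming the dataset above is imported as a table named fha_insurance_in_force_by_tract_complete_1 (the table name is an assumption taken from the URL). It only uses standard PostGIS functions and does not touch the Observatory's own boundary geometries, which may equally be the source of the non-noded intersection:

```sql
-- List rows whose geometry PostGIS considers invalid, with the reason.
SELECT cartodb_id, ST_IsValidReason(the_geom) AS reason
FROM fha_insurance_in_force_by_tract_complete_1  -- assumed table name
WHERE NOT ST_IsValid(the_geom);

-- Repair only the invalid rows. ST_MakeValid can return a mixed
-- collection, so keep just the polygonal part (type 3 = polygon).
UPDATE fha_insurance_in_force_by_tract_complete_1
SET the_geom = ST_CollectionExtract(ST_MakeValid(the_geom), 3)
WHERE NOT ST_IsValid(the_geom);
```

If the first query returns no rows, the input polygons are valid and the error most likely comes from the intersection with the boundary geometries chosen for aggregation.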

// @talos

talos commented 8 years ago

Thanks for the further report! We're working now on some major changes to default to the "best" geometry, but it requires quite a few changes under the hood. Once that's done, many of these geometry errors should be resolved: https://github.com/CartoDB/observatory-extension/issues/190

rafatower commented 7 years ago

Is this solved?

talos commented 7 years ago

I'll look into this today. We do use the "best geometry" function now, and that may solve this case.

talos commented 7 years ago

Testing from the above, the Natural Earth states work -- slowly (80s), but they work -- without causing any exceptions. Using the trick with OBS_GetMeasure to specify the aggregation level (state) improves performance considerably.

I'm going to close this. Please let me know if you have examples still not working.

rochoa commented 7 years ago

Does that mean we need to change the usage of the function in order to specify the aggregation level?

talos commented 7 years ago

The function should be better now than it was before, in that it will choose an aggregation level for you that's bigger than the smallest one. This means the function is more reliable, should break less, and will generally be faster.

But if you're bringing in something through the API at a known geometry level (say, counties), it's still to your advantage to specify the aggregation level by hand -- that way you know for sure the correct one is being used.
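
A sketch of what that could look like for a county-level table, following the same argument order as the state example earlier in this thread. The table name my_counties is hypothetical, and the boundary id 'us.census.tiger.county' is an assumption modelled on 'us.census.tiger.state', so check it against the Observatory catalog before relying on it:

```sql
UPDATE my_counties  -- hypothetical table of county polygons
SET hispanic_pop = OBS_GetMeasure(
  the_geom,
  'us.census.acs.B03002012',  -- measure id used earlier in this thread
  null,
  'us.census.tiger.county',   -- assumed boundary id for counties
  '2010 - 2014');
```

Matching the boundary to the level of the input polygons avoids aggregating many small geometries per row, which is what made the state-level case both slow and fragile.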