Closed rochoa closed 7 years ago
The interesting bits from the logs:
traceback removed
@talos as far as I've checked, this only happens when augmenting polygon tables. I've tried ign_spanish_adm2_provinces and ne_50m_admin_1_states with different dimensions and none of them seem to work. Those dimensions do work, though, when augmenting points.
Interesting. We had seen some issues with bad polygons before, and we do a little bit of adjustment and buffering internally to avoid problems. However, it seems remarkable that all those tables should fail. I'll look into it.
The prior related issue and its resolution: https://github.com/CartoDB/observatory-extension/issues/160
@noguerol Could you please give a few example measures that you were unable to use with polygons?
For US: race and ethnicity > Hispanic population. For Spain: nationality > Persons who were born in Africa, and Commerce and Economy > Supermarkets.
For both measures, the root of the problem is that we don't expose an option to choose a different boundary geometry for the aggregation.
That means we're using block groups to aggregate state populations in the US... this is very wasteful and, as you can see, also error-prone. We have larger geometries available, and using the API directly you could specify them. However, we should do this transparently (select the "best" geometry, which is not always the smallest). This is a known issue I've been working on, but it was not yet reflected in this repo. I've opened a ticket: https://github.com/CartoDB/observatory-extension/issues/190
In Spain, we're using seccion censal, and again we should be using a larger geometry for adm2 level data. However, there the issue is compounded by our not having all measures at larger geometries. I've made an issue in the ETL to deal with this: https://github.com/CartoDB/bigmetadata/issues/91
In the interim, you would need to use the API functions to grab Observatory data at this level, when it is available. For the US:
UPDATE ne_50m_admin_1_states
SET hispanic_pop = OBS_GetMeasure(the_geom,
                                  'us.census.acs.B03002012',
                                  null,
                                  'us.census.tiger.state',
                                  '2010 - 2014')
Indeed, I tried with smaller polys (ign_spanish_adm3_municipalities) and it worked.
Thanks for the explanation @talos
No problem! I'll update this ticket once we've got improvements in to get these working -- these are great use cases and we should support them easily.
Happening again when augmenting a census tract dataset with unemployment data.
We keep having this issue:
Error performing intersection: TopologyException: found non-noded intersection between LINESTRING (-80.4685 25.2301, -80.4686 25.2301) and LINESTRING (-80.4684 25.2302, -80.4686 25.2301) at -80.4685585652405 25.23009916119825
Dataset can be downloaded from https://team.carto.com/u/saleiva/dataset/fha_insurance_in_force_by_tract_complete_1
// @talos
Thanks for the further report! We're working now on some major changes to default to the "best" geometry, but it requires quite a few changes under the hood. Once that's done, many of these geometry errors should be resolved: https://github.com/CartoDB/observatory-extension/issues/190
Is this solved?
I'll look into this today. We do use the "best geometry" function now, and that may solve this case.
Testing from the above, the Natural Earth states work -- slowly (80s), but they work -- without causing any exceptions. Using the trick with OBS_GetMeasure to specify the aggregation level (state) improves performance considerably.
I'm going to close this. Please let me know if you have examples still not working.
Does that mean we need to change the usage of the function in order to specify the aggregation level?
The function should be better now than it was before, in that it will choose an aggregation level for you that's bigger than the smallest one. This means the function is more reliable, should break less, and will generally be faster.
But if you're bringing in something through the API of a known geometry level (say counties) it's still to your advantage to specify the aggregation level by hand -- that way you know for sure the correct one is being used.
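As a rough sketch of what specifying the level by hand could look like for counties, following the same pattern as the state example earlier in this thread. The table name my_counties and the column hispanic_pop are hypothetical, and the boundary ID 'us.census.tiger.county' is an assumption made by analogy with 'us.census.tiger.state'; check the catalog for the exact ID before relying on it.

```sql
-- Hypothetical table (my_counties) and column (hispanic_pop).
-- 'us.census.tiger.county' is an assumed boundary ID, by analogy with
-- 'us.census.tiger.state' in the state example; verify it in the catalog.
UPDATE my_counties
SET hispanic_pop = OBS_GetMeasure(
    the_geom,                   -- polygon being augmented
    'us.census.acs.B03002012',  -- same measure ID as the state example
    NULL,                       -- third argument left null, as in the state example
    'us.census.tiger.county',   -- aggregation boundary chosen by hand
    '2010 - 2014'               -- same time span as the state example
);
```

Passing the boundary explicitly means the function does not have to pick a geometry level for you, so you know which one was used for the aggregation.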
Batch Query result:
Table ne_50m_admin_1_states is from data library. cc @rafatower @noguerol