Chicago / RSocrata

Provides easier interaction with Socrata open data portals (http://dev.socrata.com). Users can provide a 'Socrata' data set resource URL, a 'Socrata' Open Data API (SoDA) web query, or a 'Socrata' "human-friendly" URL, and the package returns an R data frame. Converts dates to 'POSIX' format. Manages throttling by 'Socrata'.
https://CRAN.R-project.org/package=RSocrata

Return computed regions #176

Open levyj opened 5 years ago

levyj commented 5 years ago

Socrata's Spatial Lens (most prominently used in map view filters) determines the geographic regions (e.g., Chicago Community Area) in which a point record falls. These regions are not presented in an easily understood way in either the grid view of a dataset or the /resource API, but they can be determined through a multi-step process. It would be a great service if RSocrata could perform this process and return the calculated regions, if present.

As an example, see https://data.cityofchicago.org/Buildings/Building-Violations/22u3-xenr.

The /resource API shows an example record:

[
    {
        "id": "6274501",
        "violation_last_modified_date": "2019-06-13T06:50:00.000",
        "violation_date": "2019-06-13T00:00:00.000",
        "violation_code": "CN193029",
        "violation_status": "OPEN",
        "violation_description": "WATCHMAN",
        "violation_ordinance": "Maintain watchman from 4:00 PM to 8:00 AM for vacant and dangerous residential premises. (13-12-140)",
        "inspector_id": "BL00943",
        "inspection_number": "12953099",
        "inspection_status": "CLOSED",
        "inspection_waived": "N",
        "inspection_category": "COMPLAINT",
        "department_bureau": "DEMOLITION",
        "address": "5301 S JUSTINE ST",
        "street_number": "5301",
        "street_direction": "S",
        "street_name": "JUSTINE",
        "street_type": "ST",
        "property_group": "331440",
        "latitude": "41.797602233",
        "longitude": "-87.663320286",
        "location": {
            "latitude": "41.797602233217454",
            "longitude": "-87.6633202858523",
            "human_address": "{\"address\": \"\", \"city\": \"\", \"state\": \"\", \"zip\": \"\"}"
        },
        ":@computed_region_vrxf_vc4k": "59",
        ":@computed_region_6mkv_f3dw": "14924",
        ":@computed_region_rpca_8um6": "37",
        ":@computed_region_bdys_3d7i": "790",
        ":@computed_region_43wa_7qmu": "2",
        ":@computed_region_awaf_s7ux": "19"
    }
]
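
As a rough illustration of the first step (written in Python purely as a sketch; it is not RSocrata code, and the function name is hypothetical), the computed-region fields can be picked out of a /resource record by their `:@computed_region_` prefix. The record here is an abbreviated copy of the example above.

```python
COMPUTED_PREFIX = ":@computed_region_"


def computed_region_fields(record):
    """Return {field name: feature ID} for the computed-region fields of one record."""
    return {
        key: int(value)  # the /resource API returns these IDs as strings
        for key, value in record.items()
        if key.startswith(COMPUTED_PREFIX)
    }


# Abbreviated copy of the example record from the /resource API.
record = {
    "id": "6274501",
    "address": "5301 S JUSTINE ST",
    ":@computed_region_vrxf_vc4k": "59",
    ":@computed_region_6mkv_f3dw": "14924",
}

print(computed_region_fields(record))
```

The values returned this way are feature IDs in the corresponding region datasets, not the region numbers or names themselves.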

Note in particular:

":@computed_region_vrxf_vc4k": "59"

The /views API shows us under columns:

{
    "id" : 342479787,
    "name" : "Community Areas",
    "dataTypeName" : "number",
    "fieldName" : ":@computed_region_vrxf_vc4k",
    "position" : 31,
    "renderTypeName" : "number",
    "tableColumnId" : 60501607,
    "computationStrategy" : {
      "source_columns" : [ "location" ],
      "type" : "georegion_match_on_point",
      "parameters" : {
        "region" : "_vrxf-vc4k",
        "primary_key" : "_feature_id"
      }
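
A minimal sketch of reading that column metadata, in Python for illustration only (not RSocrata code). The function name is hypothetical, and the check for a "region" parameter is my assumption about how to recognize a computed-region column; only the structure shown in the excerpt above is relied on.

```python
def region_info(column):
    """Return (region dataset ID, primary-key field) for a computed-region
    column from the /views metadata, or None for any other column."""
    strategy = column.get("computationStrategy") or {}
    params = strategy.get("parameters") or {}
    if "region" not in params:
        return None
    # In the excerpt, the "region" parameter is the dataset ID with a
    # leading underscore ("_vrxf-vc4k"), so strip that underscore.
    return params["region"].lstrip("_"), params.get("primary_key")


# Abbreviated copy of the column entry shown above.
column = {
    "name": "Community Areas",
    "fieldName": ":@computed_region_vrxf_vc4k",
    "computationStrategy": {
        "source_columns": ["location"],
        "type": "georegion_match_on_point",
        "parameters": {"region": "_vrxf-vc4k", "primary_key": "_feature_id"},
    },
}

print(region_info(column))
```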

So that value identifies the Community Area, but it is not, as it might appear, Community Area 59. Instead, examine https://data.cityofchicago.org/dataset/Community-Areas/vrxf-vc4k. (Note that the underscore in the computed_region name becomes a hyphen in the dataset ID.) The 59 refers to the record in this dataset with _feature_id 59, which turns out to be Community Area 61. (As I discovered in working through this example, the Feature IDs and Community Areas do match in many cases, which could lead people to think, incorrectly, that the :@computed_region_vrxf_vc4k value is the Community Area number itself.)
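
The lookup described above might be sketched as follows, in Python for illustration only (not RSocrata code). Both function names are hypothetical. The single sample row reflects the one mapping stated here (_feature_id 59 is Community Area 61); `area_number` is a placeholder column name I made up, not necessarily the real field name in the Community Areas dataset.

```python
COMPUTED_PREFIX = ":@computed_region_"


def region_dataset_id(field_name):
    """Map ':@computed_region_vrxf_vc4k' to the dataset ID 'vrxf-vc4k'."""
    if not field_name.startswith(COMPUTED_PREFIX):
        raise ValueError(f"not a computed-region field: {field_name!r}")
    # The underscore in the field name becomes a hyphen in the dataset ID.
    return field_name[len(COMPUTED_PREFIX):].replace("_", "-")


def find_region_row(region_rows, feature_id, primary_key="_feature_id"):
    """Return the region row whose primary key equals feature_id, or None."""
    for row in region_rows:
        if str(row.get(primary_key)) == str(feature_id):
            return row
    return None


# Stand-in for rows of the vrxf-vc4k dataset; "area_number" is a
# placeholder column name used only for this sketch.
region_rows = [{"_feature_id": "59", "area_number": "61"}]

print(region_dataset_id(":@computed_region_vrxf_vc4k"))
print(find_region_row(region_rows, 59))
```

Comparing the keys as strings avoids depending on whether the API returns the IDs as text or numbers.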

The final step is determining which column in this dataset holds the relevant value (Community Area, in this case). A person can usually tell which column to use, so, given the fairly small number of computed regions (types of regions, not the individual regions) likely to be used on a domain, it might be feasible to rely on that in some manner. However, there is also an API. Socrata gave me the following warning about it, which I wish to record here:

Please note: Engineering emphasized that this is not an official API, so you are welcome to consult it but just know it's not officially supported as a source of truth for automated processes.

That said, if we consult https://data.cityofchicago.org/api/curated_regions and search for vrxf-vc4k, we see:

{
    "id": 261,
    "name": "Community Areas",
    "createdAt": 1445869668,
    "defaultFlag": true,
    "enabledFlag": true,
    "featurePk": "_feature_id",
    "geometryLabel": "community",
    "uid": "vrxf-vc4k",
    "view": {
        "id": "vrxf-vc4k",
        "name": "Community Areas",
        "averageRating": 0,
        "createdAt": 1424310233,
        "displayType": "table",
        "downloadCount": 4,
        "hideFromCatalog": true,
        "hideFromDataJson": true,
        "indexUpdatedAt": 1494641500,
        "newBackend": true,
        "numberOfComments": 0,
        "oid": 10269628,
        "provenance": "official",
        "publicationAppendEnabled": false,
        "publicationDate": 1424310240,
        "publicationGroup": 2273835,
        "publicationStage": "published",
        "tableId": 2273835,
        "totalTimesRated": 0,
        "viewCount": 64,
        "viewLastModified": 1494640995,
        "viewType": "tabular",
        "grants": [
            {
                "inherited": false,
                "type": "viewer",
                "flags": [
                    "public"
                ]
            }

The item of interest is:

"geometryLabel": "community"

That is, in fact, the API field name in https://data.cityofchicago.org/dataset/Community-Areas/vrxf-vc4k that identifies the Community Area. Note, however, that the value in this column for the above example is not 61 but NEW CITY, the name of the Community Area rather than its number.
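
Putting the last step together, here is a minimal sketch in Python (illustration only, not RSocrata code; both function names are hypothetical). It assumes the /api/curated_regions response can be treated as a list of entries shaped like the one above, which I have not confirmed, and Socrata's warning that this API is unofficial applies. The sample data contains only values quoted in this issue.

```python
def label_column(curated_regions, uid):
    """Return the geometryLabel of the curated region with this uid, or None."""
    for region in curated_regions:
        if region.get("uid") == uid:
            return region.get("geometryLabel")
    return None


def region_label(region_rows, feature_id, label_field, primary_key="_feature_id"):
    """Return the label value of the region row matching feature_id, or None."""
    for row in region_rows:
        if str(row.get(primary_key)) == str(feature_id):
            return row.get(label_field)
    return None


# Abbreviated curated_regions entry and one row of the vrxf-vc4k dataset.
curated_regions = [
    {
        "id": 261,
        "name": "Community Areas",
        "featurePk": "_feature_id",
        "geometryLabel": "community",
        "uid": "vrxf-vc4k",
    }
]
region_rows = [{"_feature_id": "59", "community": "NEW CITY"}]

field = label_column(curated_regions, "vrxf-vc4k")
print(region_label(region_rows, 59, field))  # prints NEW CITY
```

If the curated_regions lookup returns nothing, a caller could fall back to returning the raw feature ID rather than guessing a label column.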

levyj commented 3 years ago

Just as a note, this issue is partly an invitation for someone to try to take on this feature request but at least as much a way to document the underlying structure. Especially as time passes, if anyone plans to rely on the information, whether to attempt the feature or for any other reason, it might be a good idea to confirm any critical portions with the RSocrata team and/or Socrata (https://github.com/socrata / https://dev.socrata.com/).