BathHacked / forum

Raise issues in this to act as a rudimentary forum and TODO list for our projects

KMLs #3

Open bathnesresearch opened 10 years ago

bathnesresearch commented 10 years ago

Been having some real pain with the KML formatting issue. Turns out that it's an issue with formatting in our proprietary GIS system, as suspected. KML only likes a Name, a Description and a Z field, with any other fields being represented as HTML within the Description. At the moment the solution I've found is to manually push it through QGIS and manually allocate two field names to Name and Description. Worked example is: https://data.bathhacked.org/dataset/Lower-Super-Output-Areas/mq6n-3m8u

This works for rarely changing files (the above has about a decade of currency), but is going to be a massive resource hit if we have to do this regularly (to say nothing of the fact that QGIS isn't a supported system for the council).

Second issue is that we lose any contextual info, so for example the car parks data layer would lose opening times, etc. If we treat them as two distinct files, then the associated data for the example I just posted would look like this (https://data.bathhacked.org/dataset/Lsoa-And-Ward-Population/h4my-ktvn). I can't find any way in Socrata to get the name field from the KML treated as a linking variable with the code in the population data, so this will likely be an issue if we did this in other instances.

Found this example from elsewhere - https://data.raleighnc.gov/Census/Average-Household-Size-by-Block-Group/wrj6-y6ck/widget_preview?height=425&variation=2fxh-wcp7&width=500 - but it's quite buggy and I can't work out how they've forced the link.

There may be some hacks we can do for this, but it seems to make a rod for our own backs in terms of maintaining currency. I don't know if anyone's got any thoughts, or whether we need to be investigating a more flexible GIS format to publish in.

markowen commented 10 years ago

I'm beginning to think that Socrata just treats KML as blobs, with a special treatment in the presentation layer. My current leaning is towards separating geometry from data. This loses us simple wins for visualisations within native Socrata but buys us some richer opportunities by mashing up datasets via the API.
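As a rough illustration of what "separating geometry from data" could look like on the client side, here is a minimal sketch that joins boundary features to tabular rows on a shared area code. The function name, the `name` property and the `area_code` column are all assumptions made up for the example, not anything Socrata provides.

```python
def join_boundaries_to_data(features, rows, feature_key="name", row_key="area_code"):
    """Attach tabular rows to boundary features that share an area code.

    `features` is a list of GeoJSON-style feature dicts and `rows` is a list
    of dicts, e.g. records fetched separately from an API. The key names are
    hypothetical and would need to match the real datasets.
    """
    rows_by_code = {row[row_key]: row for row in rows}
    joined = []
    for feature in features:
        code = feature["properties"].get(feature_key)
        match = rows_by_code.get(code)
        if match is None:
            continue  # no tabular data for this boundary, so skip it
        merged = dict(feature["properties"])
        merged.update(match)
        joined.append({
            "type": "Feature",
            "geometry": feature["geometry"],
            "properties": merged,
        })
    return joined
```

The join only works if the geometry file carries a stable code for each area, which is the same linking variable problem described earlier in the thread.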

WebUnknown commented 10 years ago

This has worried me too but could we just simplify the problem? Let's avoid KML unless it's fairly static spatial data; it's a sucky format for rich data anyway. For point-based datasets (e.g. schools, hospitals, Jack's bike shops) we'll get a more useful result by simply publishing rich data in a tabular format, plus lat/lng columns. Socrata plays quite happily with these.

So: can the council publish point-based datasets as a good ole fashioned CSV with lat/lng etc?

PS. I'm thinking we may be expecting too much from data store visualisations. They're okay with basic KML, colourful charts and dots on maps but hooking rich data to boundaries seems to be stretching things.

ldodds commented 10 years ago

Geographic datasets tend to fall into two categories:

For POI data as @WebUnknown suggests, it would be better to upload those into Socrata as tabular data with lat/longs plus any extra metadata, e.g. name, description, opening times, etc. That's how I published the grit bin dataset. It's easy to regenerate a KML or GeoJSON file from these.

For boundaries, given that Socrata doesn't really handle them well, they may as well just be published as static files to a website. In Socrata we could just have a dataset that provides a collection of pointers to the different boundary layers available, and apps can then use that to cache the data or load it dynamically into Google Maps/Earth.
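On the POI side, regenerating GeoJSON from a table with lat/longs could be sketched roughly as below. This is not the code used for the grit bins; the function name and the `lat`/`lng` column names are assumptions for illustration.

```python
import csv
import io


def rows_to_geojson(csv_text, lat_field="lat", lng_field="lng"):
    """Turn CSV text with lat/lng columns into a GeoJSON FeatureCollection.

    Every other column is carried across as a feature property, so extra
    metadata such as opening times is kept. Column names are hypothetical.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    features = []
    for row in reader:
        lat = float(row.pop(lat_field))
        lng = float(row.pop(lng_field))
        features.append({
            "type": "Feature",
            # GeoJSON positions are ordered longitude first, then latitude
            "geometry": {"type": "Point", "coordinates": [lng, lat]},
            "properties": dict(row),
        })
    return {"type": "FeatureCollection", "features": features}
```

Passing the result to `json.dumps` would give a file that most web mapping libraries can load directly.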

If it's tricky to get POI data turned into CSV or something from the council applications, then I'd suggest we have some code to extract the relevant points from the KML to generate a Socrata dataset. That might help simplify things from the council end. Again, that's essentially what I did for the grit bins: I parsed the GeoJSON from isharemaps and created a CSV.
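A minimal sketch of the KML-to-CSV extraction, using only the Python standard library, might look like this. It assumes the KML uses the standard 2.2 namespace and simple Point placemarks. Files exported from other tools may use an older namespace or nest things differently, and the function name and output columns are made up for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumption: the file declares the standard KML 2.2 namespace
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}


def kml_points_to_csv(kml_text):
    """Extract Point placemarks from KML text and return them as CSV text."""
    root = ET.fromstring(kml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "description", "lat", "lng"])
    for placemark in root.iter("{http://www.opengis.net/kml/2.2}Placemark"):
        coords = placemark.findtext("kml:Point/kml:coordinates", namespaces=KML_NS)
        if coords is None:
            continue  # not a point (e.g. a boundary polygon), so skip it
        # KML coordinates are "longitude,latitude[,altitude]"
        lng, lat = coords.strip().split(",")[:2]
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        description = placemark.findtext("kml:description", default="", namespaces=KML_NS)
        writer.writerow([name.strip(), description.strip(), lat, lng])
    return out.getvalue()
```

Any extra fields stored as HTML inside the description would still need unpicking separately, since this only copies the description across as-is.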

So maybe we can set up a Socrata "Geo Dataset" that has:

That will provide a way to discover/index the various datasets. We can then write scrapers to turn any KML with a type of POI into a separate Socrata dataset.

What do you all think?

jackmcconnell commented 10 years ago

I noticed Socrata has a 'Link to External Dataset' option when uploading data - maybe we could use that to link to the KMLs?

WebUnknown commented 10 years ago

Leigh's option gets my vote - ticks all the boxes.

markowen commented 10 years ago

Totally agree with you Leigh. That's pretty much the plan we came up with at the last round table.

I'd like to see the POIs injected into tables and a link back to the original KML file. If the KML is clean, packages like OpenLayers can handle the files quite neatly rather than us blatting a load of points into them.

Can I also suggest an "update frequency" field for the Geo Dataset? That would cover some of the issues we talked about.

ldodds commented 10 years ago

@markowen Adding Update Frequency makes sense.

For the link back to the original KML, we could use the source attribute on the POI dataset to point to the KML?

bathnesresearch commented 10 years ago

That's either already there, or props to someone for fast moving!

I'll need to check with both IT and GIS about the feasibility of autofilling lat/long into CSV point data. Intuitively I think this should be achievable and automated, so works for me too.

Boundary data could be a little more complex, as there are very often way more than 2 fields of contextual data held within the host file. This'll create a bit of a process for each new spatial set as we try and work out which bits are relevant. The way our mapping kit often works is to consider the spatial elements as simply another field of the main data set. I can't quite get my head around how big a challenge this will be, so I'm probably overthinking.

As I've mentioned before, we are very interested in any kit that can manage linking boundary data to associated data sets for the purposes of creating thematic (mainly choropleth, but point density also useful) visualisations in a way that's a bit more elegant than pushing things through Fusion Tables. We can resource some of this work to make it enterprise-relevant.

bathnesresearch commented 10 years ago

@markowen From a practical file conversion perspective, it will be much trickier to produce both file types, but I'll speak to Ally in GIS.