CityofPortland / pdxdata

Place for discussion about City of Portland data

Open Data: Metadata & Request Ideas #1

Open ungoldman opened 9 years ago

ungoldman commented 9 years ago

Continuing our conversation from https://github.com/maxogden/messages/issues/44

Lots of interesting possibilities around progressively opening up datasets listed on http://portlandmaps.com/metadata/index.cfm -- I'd like to open up that discussion here now that @ksmpdx has set up this spiffy new @CityofPortland github organization.

/cc @ksmpdx @catnik @maxogden @caged @derekwmiller @mquetel

ajturner commented 9 years ago

I recommend polling for what common search interfaces exist and considering how these can be harvested/federated into one or many portals/sites. There may be a PDX official 'one stop shop', but why not let the local brigade pull in the same dataset links, or even let agencies pull data from other departments into their own data portal?

For example, ArcGIS Open Data supports DCAT 1.1 (CKAN, Data.gov), OpenSearch-Geo and Atom.

I'm an AtomPub fan myself, but DCAT is the standard du jour: it is common, and it may be useful for building some simple search components to share.
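As a rough illustration of the harvesting idea, here is a minimal sketch of merging catalogs from several portals. It assumes each portal exposes a DCAT-style JSON catalog with a `dataset` list whose entries carry an `identifier` and a `title`; the function name `federate`, the portal names, and the sample entries are all made up for the example, and a real harvester would also need to fetch and validate the feeds.

```python
def federate(catalogs):
    """Merge DCAT-style catalogs keyed by portal name into one dataset list.

    Entries that share an identifier are collapsed into a single record,
    and each record remembers which portals listed it.
    """
    merged = {}
    for source, catalog in catalogs.items():
        for entry in catalog.get("dataset", []):
            ident = entry.get("identifier")
            if ident is None:
                continue  # skip entries we cannot deduplicate
            record = merged.setdefault(ident, {**entry, "sources": []})
            record["sources"].append(source)
    return list(merged.values())


# Hypothetical catalogs: a city portal and a volunteer-run portal.
city_catalog = {"dataset": [
    {"identifier": "parks-001", "title": "Parks"},
    {"identifier": "bike-002", "title": "Bike Routes"},
]}
brigade_catalog = {"dataset": [
    {"identifier": "bike-002", "title": "Bike Routes"},
]}

federated = federate({"city": city_catalog, "brigade": brigade_catalog})
```

Keying on the identifier means a dataset listed by several portals shows up once, with its `sources` list recording where it was seen.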

caged commented 9 years ago

I'd love to see focus on identifying and opening datasets with high impact and relevance. I haven't thought about an objective way to identify those, but here are some I'd like to see:

mbabinski-zz commented 9 years ago

Hey all, I'm Micah Babinski, GIS Analyst for the Corporate GIS team at CoP. Thank you very much for your willingness to partner with us as we feel our way into the world of Open Data and improved metadata.

Among other things GIS-related, I have been working on our metadata improvement, enterprise data quality, and ArcGIS.com Open Data planning efforts. I'm somewhat limited because I am 2/3 of the way through a six-month engagement working on some application support with our Risk Management department. Once July 1 hits, I should be able to increase my efforts in these areas considerably.

Most recently, I've scripted a metrics scheme that tallies how many of each of the following metadata elements are present for each item in the database:

  1. Title
  2. Category
  3. Bureau
  4. Abstract
  5. Purpose
  6. Contact
  7. Maintenance Update Frequency
  8. Keywords

This "content score" I then compare against a "usage score" which notes whether the datasets are:

  1. Used in the past week by City Staff in Desktop apps or Portland Maps
  2. Used in a layer file in our central layer file repository
  3. Used in Mapworks, a stand-alone ArcGIS Engine app that City Staff use
  4. Used in one of the Map Documents which are published on our on-premise ArcGIS Server

As a first pass, we identified those datasets which had a usage score of 4/4 and a content score of 3/8 or less. We then emailed the data stewards requesting more metadata content for those datasets and...crickets. A further step I'm proposing is a green/yellow/red color coding scheme for portlandmaps.com/metadata based on the content score.
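The scoring described above could be sketched roughly as follows. This is not the actual script: the field names, the helper names, and the example record are hypothetical, and the yellow/green cutoffs are assumptions, since only the "3/8 or less" threshold is stated above.

```python
# The eight metadata elements and four usage checks listed above.
# The dictionary keys are hypothetical names chosen for this sketch.
METADATA_ELEMENTS = (
    "title", "category", "bureau", "abstract",
    "purpose", "contact", "update_frequency", "keywords",
)
USAGE_SIGNALS = (
    "used_past_week", "in_layer_file", "in_mapworks", "in_published_map_doc",
)


def content_score(metadata):
    """Count how many of the eight metadata elements are non-empty."""
    return sum(1 for key in METADATA_ELEMENTS
               if str(metadata.get(key) or "").strip())


def usage_score(usage):
    """Count how many of the four usage checks are true."""
    return sum(1 for key in USAGE_SIGNALS if usage.get(key))


def needs_steward_followup(metadata, usage, max_content=3):
    """Flag heavily used datasets (4/4) with thin metadata (3/8 or less)."""
    return (usage_score(usage) == len(USAGE_SIGNALS)
            and content_score(metadata) <= max_content)


def color_code(score):
    """Map a content score to a color; the 5/8 cutoff is an assumption."""
    if score <= 3:
        return "red"
    if score <= 5:
        return "yellow"
    return "green"


# Hypothetical dataset: three populated elements, one blank abstract.
example_metadata = {"title": "Example Layer", "category": "Boundaries",
                    "keywords": "example", "abstract": "  "}
example_usage = {signal: True for signal in USAGE_SIGNALS}
```

With this example record, the content score is 3 and the usage score is 4, so the dataset would be flagged for follow-up and shown in red.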

We're also allocating some additional staff time to populate the portlandmaps metadata DB with information for legacy Portlandmaps pages and CivicApps, which has some narrative content that is currently lacking. We'll also do our best to populate based on our staff's knowledge of the subject matter. Bureau GIS leads will then be given an option to scroll through their datasets for which we've populated the metadata and approve our content, or change it.

A big part of my work has also involved removing stale and obviously obsolete datasets from our "GIS Hub". This has the advantage of decreasing clutter and decreasing the overall scale of the task of properly documenting everything. I've also developed a staleness score that looks to quantify those datasets which are exposed the least of all that we have on the Hub.

As for our ArcGIS.com Open Data site, I am still learning how to group the datasets and how I can efficiently convert metadata content out of the SQL Server DB where it is housed into the relevant components of the service that are propagated onto the Open Data site. I still have a long way to go before getting anything worth reviewing, but my first pass will be to make available those 50 or so GIS datasets we have accessible through CivicApps.

Ok, looking forward to working further with all of you on this!

Micah

ksmpdx commented 9 years ago

Quick request for feedback. BPS is moving forward on adding policies related to open City data into our proposed Comprehensive Plan. This is a win, but there is little turnaround time to review the wording. Here it is:

Policy 2.11 Open Data. Planning and investment decisions are a collaboration of many stakeholders, including those listed in Policy 2.1. This collaboration is enhanced and opportunities for innovation and value creation are unlocked when:

Thoughts? Most interested in whether "available without restriction or license" makes sense given open data will be licensed (likely public domain).

Have to have final wording by midday tomorrow. Thanks!

caged commented 9 years ago

@ksmpdx I'm not a license expert (also IANAL), but I don't think no license is what you want to do. Specifically the second paragraph in the linked text above. Maybe you want the Unlicense, CC0, or what OpenStreetMap and Oregon Metro use, the ODbL.

@mbabinski sorry for missing your comment earlier. It sounds like a practical way to kick things off, but I wonder how historical datasets (many "frozen") will fare under this score. For example, older crime data files and shapefiles with historical boundaries.

ajturner commented 9 years ago

What @caged said - explicitly designate an unrestricted license. 'Unlicensed' and even 'public domain' are legally ambiguous and do not apply in some jurisdictions.

Best to choose at least CC0. As a note, Washington DC successfully got city council to approve use of CC0.

ajturner commented 9 years ago

Regarding open data exceptions - would these apply to the entire dataset, or could (should) they apply only to subsets? For example, personal names redacted in crime reports but the remaining attributes published.
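To make the subset idea concrete, here is a minimal sketch of attribute-level redaction, where an exception covers specific fields rather than the whole dataset. The function name `redact`, the field names, and the sample records are hypothetical; deciding which fields are restricted is a policy question this sketch does not answer.

```python
def redact(records, restricted_fields):
    """Return copies of the records with the restricted fields removed."""
    restricted = set(restricted_fields)
    return [
        {key: value for key, value in record.items() if key not in restricted}
        for record in records
    ]


# Hypothetical crime report rows; only the name is treated as restricted.
reports = [
    {"report_id": 1, "offense": "Burglary", "person_name": "Jane Doe"},
    {"report_id": 2, "offense": "Vandalism", "person_name": "John Roe"},
]

publishable = redact(reports, ["person_name"])
```

The original records are left untouched, so the full dataset can stay internal while the redacted copy is published.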

Any concerns on timeliness? What if it takes a year to publish or update a dataset?

derekwmiller commented 9 years ago

@Caged and @ngoldman - you're both spot on. I think we'd all agree that it would be best to explicitly designate a license. There's support to do so... but that's not going to happen by tomorrow considering what it would take to make it happen... bureaucracy... city bureaus - including city attorney, elected officials, etc.

I think what @ksmpdx is asking is slightly different. Considering that the city isn't going to select a license by tomorrow - does the relevant text present a framework with which we can push the open agenda?

I would say remove 'available without restriction or license' ... leaves licensing options open when the time comes to implement the policy.

caged commented 9 years ago

> Considering that the city isn't going to select a license by tomorrow - does the relevant text present a framework with which we can push the open agenda?

@derekwmiller, @ksmpdx I think so! I was immediately snagged by the license bit, but this is a great set of points to move forward with. I particularly love the "open by default," with the onus on the city agency as to why it shouldn't be. This would be a huge positive step. (More to follow...)

caged commented 9 years ago

I do have some questions concerning process.

derekwmiller commented 9 years ago

@Caged re: positive step. We think so too. Small step it may be, but moving forward nonetheless. Glad you agree.

re: denied release & challenge of. Great questions. I'm not aware of a discussion around those ideas, but I'll be pinging @ksmpdx to bring it up.

re: censoring. Think this gets sticky pretty fast. Would rely on the data steward censoring via a similar process to the 'release denial exception'. Crime data is a good example, but I struggle to think of others - so might not be such a big issue. Tax lot data (ownership) is uncensored, for example.

caged commented 9 years ago

The taxlot data is actually the dataset I've had issues with, as far as Metro is concerned (which may not be a concern of this group). I've understood (paraphrasing from memory) that Metro hasn't released the taxlot data because it contains ownership information and the fee they charge for RLIS ($480!) helps with their already tight budgets. There are a couple of things I would dispute about that:

If I'm interpreting your comments correctly, it seems like you're suggesting Portland will release the taxlot data, which is great! I think taxi data might be an interesting privacy case to consider.

derekwmiller commented 9 years ago

@Caged don't quote me on the release of taxlot data... wasn't exactly suggesting that the city would release it.

The reason I reference taxlot data as open is simply because you can access it uncensored via PortlandMaps as you illustrate in your link above (btw - check out the beta if you haven't done so - https://www.portlandmaps.com/beta/)

It gets tricky to release the dataset in its entirety bc it's not actually city data - it's county data - mostly Multnomah County, but also WashCo & Clackamas - so maybe that was a bad example.

Taxi data would be awesome, though I've never seen it for pdx. Sure you've all seen Chris Wong's awesome taxi app... thing rocks. http://nyctaxi.herokuapp.com/

ksmpdx commented 9 years ago

Moving discussion about licensing and access to https://github.com/CityofPortland/pdxdata/issues/3. Thanks @Caged, @ajturner & @derekwmiller for the feedback! Very helpful.