What is the data federation effort all about? What am I looking to get out of it?
This is a collaborative research project with GSA's Office of Products & Platforms and 18F. The goal is to build a toolkit / playbook for undertaking intra-governmental data collection / aggregation projects, such as data.gov, code.gov, and NIEM. Our aim is to find out what works, what doesn't, and which tools are appropriate in which circumstances, in order to accelerate similar efforts in the future.
Notes & report will be public.
Any questions before we get started?
What kinds of data does the city collect, and from whom?
City of Philadelphia: parking authority data, school district data.
opendataphilly.org: includes SEPTA, university data, Yelp data from restaurants. Transit authority / school district / courts are not run by the city.
Many departments report addresses and how they relate to parcels. Not every department agrees on addresses or parcel addresses, which makes it very difficult to match records and leverage the data. A team in the IT department (cityGEO) is building a shared service called the Address Information Service: it takes an address and returns all city data related to that address. "Cities are inherently geographic entities, this comes up pretty often." Much better than trusting each source of record. The city also requires standard metadata from data sources published to the public on opendataphilly.org.
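The matching problem behind the Address Information Service can be illustrated with a small Python sketch. This is not how AIS is actually built (its internals weren't discussed); it assumes a toy abbreviation table and an in-memory index, and `normalize_address`, `build_index`, and `lookup` are hypothetical names. The idea is that each department's address string is reduced to one canonical key, so records that spell the same address differently end up together.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation tables; a real service would need a far fuller
# standard (street suffixes, unit designators, directionals, typos).
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "BOULEVARD": "BLVD"}
DIRECTIONS = {"NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W"}


def normalize_address(raw):
    """Reduce an address string to a canonical key for matching."""
    # Uppercase, turn punctuation into spaces, and split on whitespace.
    tokens = re.sub(r"[^\w\s]", " ", raw.upper()).split()
    # Replace long-form directionals and suffixes with their abbreviations.
    tokens = [DIRECTIONS.get(t, SUFFIXES.get(t, t)) for t in tokens]
    return " ".join(tokens)


def build_index(sources):
    """Map canonical address -> list of (department, record) pairs.

    `sources` maps a department name to a list of dicts, each with an
    "address" key (a hypothetical record shape).
    """
    index = defaultdict(list)
    for dept, records in sources.items():
        for rec in records:
            index[normalize_address(rec["address"])].append((dept, rec))
    return index


def lookup(index, raw_address):
    """Return every (department, record) pair linked to this address."""
    return index.get(normalize_address(raw_address), [])


if __name__ == "__main__":
    sources = {
        "licenses": [{"address": "1234 Market Street", "license": "food"}],
        "permits": [{"address": "1234 MARKET ST.", "permit": "electrical"}],
    }
    index = build_index(sources)
    for dept, rec in lookup(index, "1234 market st"):
        print(dept, rec)
```

Here both made-up records come back for "1234 market st" even though the two departments wrote the address differently. Linking addresses to parcels would be a further step not shown.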
Sometimes there's a big contrast between the day-to-day reality [of data standards] and realities on the ground, which range from mainframes to modern web apps. One example: it would be really great if you could type in a property owner's name and see their addresses / contact info. Then you look at the property assessment system, and it's a mainframe where the owner field has 16 characters. Names are sometimes truncated, and sometimes spill into multiple fields. Sometimes just getting data out of a mainframe into a relational database (RDB) can bring you miles forward.
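A hypothetical sketch of the cleanup involved in getting that owner field out of the mainframe. It assumes a fixed-width record holding two 16-character owner fields and uses a guessed heuristic: a field filled to its last column probably continues in the next one. The real record layout wasn't described, so the field count, the heuristic, and the function names are all assumptions.

```python
FIELD_WIDTH = 16  # width of the owner field mentioned in the interview


def split_fixed_width(line, width=FIELD_WIDTH, count=2):
    """Slice a fixed-width record into `count` raw fields of `width` chars."""
    line = line.ljust(width * count)  # pad short records with spaces
    return [line[i * width:(i + 1) * width] for i in range(count)]


def reassemble_owners(fields, width=FIELD_WIDTH):
    """Merge spillover fields; return (owner_names, possibly_truncated).

    Heuristic (an assumption): a field filled to its last column probably
    continues in the next field, so the two are joined without a space.
    Blank fields are skipped.
    """
    owners = []
    continuing = False
    for raw in fields:
        text = raw.rstrip()
        if not text:
            continuing = False
            continue
        if continuing and owners:
            owners[-1] += text  # treat as spillover of the previous name
        else:
            owners.append(text)
        continuing = len(text) == width
    # `continuing` is True only if the final field was full, i.e. the name
    # had nowhere left to spill and may have been cut off.
    return owners, continuing
```

For example, `split_fixed_width("SMITH JOHN AND MARY SMITH")` yields `"SMITH JOHN AND M"` and `"ARY SMITH"`, which the heuristic rejoins into one name. A field that happens to end exactly at column 16 is still ambiguous, which is why a flag is returned rather than a guaranteed answer.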
"Many departments have been sharing data in an ad-hoc way for quite some time." Open data brings more transparency and technology to that sharing.
opendataphilly standard: there was a data inventory task a few years back. The GIS group had been facilitating data sharing for several years and used that as a foundation. The inventory was then published to metadata.philly.gov, where people fill in metadata through an application. data.json "wouldn't work here": not every department has IT people. CKAN doesn't support data dictionaries. metadata.philly.gov includes datasets that are not shared publicly.
What was the impetus or driving force for this effort to collect data: policy, user needs, etc.? (perhaps after the first question)
In collecting data, what were the biggest challenges, and what went smoothly?
For keeping metadata.philly.gov up to date: trained departments on how to use it, got feedback on what was confusing, and added tips & notes. They were using ArcCatalog before; the new tool is public and works in the browser. Still have to nudge & remind, but there's a process in place to check that the metadata is up to date before a dataset is published. "Departments come to us pretty frequently asking us to publish data." "A lot of support for it." "Sometimes it's publishing to the public, sometimes it's just sharing with other departments." Sometimes department staff fill out the metadata; sometimes the team fills it out for them. Why publish to the public? A lot of departments do amazing work. Publishing data is usually accompanied by visualizations and a blog post / press release. Departments often get requests from the media / City Council, and publishing reduces that burden.
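The pre-publication metadata check described here could look something like this minimal Python sketch. The required field names, the `last_reviewed` key, and the one-year review window are assumptions for illustration, not metadata.philly.gov's actual schema or policy.

```python
from datetime import date, timedelta
from typing import List

# Hypothetical required fields and review window; the real schema and
# policy were not described in the interview.
REQUIRED_FIELDS = ("title", "description", "department", "contact_email")
MAX_AGE = timedelta(days=365)


def metadata_problems(record, today):
    """Return a list of human-readable problems; empty means ready to publish."""
    problems: List[str] = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            problems.append(f"missing {field}")
    reviewed = record.get("last_reviewed")  # expected to be a datetime.date
    if reviewed is None:
        problems.append("never reviewed")
    elif today - reviewed > MAX_AGE:
        problems.append(f"last reviewed {reviewed.isoformat()}, over {MAX_AGE.days} days ago")
    return problems


def ready_to_publish(record, today):
    """True when no problems were found."""
    return not metadata_problems(record, today)
```

Returning a list of readable problems rather than a bare yes/no fits the "nudge & remind" workflow: the messages can be sent straight back to the department that owns the dataset.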
"The easier you can make it to comply, the more people will comply." Generally people "get it." "It's more that everybody's got a million things going on." "The easier that you can make it... like making the UI streamlined for common workflows." "Employing user experience design, where the users are data publishers."
What tools and technologies do you use for this effort?
opendataphilly: data is stored in CartoDB (Carto). ETL from city databases to Carto: Python scripts, scheduled in an in-house tool called taskflow (a scheduler / background task runner). Carto provides the download links; CKAN hosts the links. Knack, an application builder, is used for metadata entry: a SaaS product (like MS Access in the cloud).
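A minimal sketch of the extract-transform-load shape described here, under stated assumptions: `sqlite3` stands in for the city databases, the cleanup step is invented for illustration, and the upload to Carto is a caller-supplied function because the real import call wasn't described. `run_job` is a hypothetical entry point of the kind a scheduler such as taskflow might invoke; taskflow's own interface isn't shown.

```python
import csv
import io
import sqlite3


def extract(conn, query):
    """Run a query and return rows as dicts keyed by column name."""
    cur = conn.execute(query)
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


def transform(rows):
    """Example cleanup: trim strings and turn empty strings into None."""
    return [
        {k: (v.strip() or None) if isinstance(v, str) else v for k, v in row.items()}
        for row in rows
    ]


def to_csv(rows):
    """Serialize rows to CSV text (None is written as an empty cell)."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def run_job(conn, query, upload):
    """One ETL run. `upload` is a hypothetical stand-in for the Carto import step."""
    rows = transform(extract(conn, query))
    upload(to_csv(rows))
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE permits (id INTEGER, address TEXT, status TEXT)")
    conn.executemany(
        "INSERT INTO permits VALUES (?, ?, ?)",
        [(1, " 1234 MARKET ST ", "issued"), (2, "50 N BROAD ST", "")],
    )
    # `print` stands in for the real upload step.
    run_job(conn, "SELECT id, address, status FROM permits ORDER BY id", print)
```

Keeping the upload as a parameter separates the per-dataset extract and cleanup logic from whichever hosting platform is current, which matters given the move from GitHub to Socrata to Carto described below.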
Why did you choose this architecture or process? Were others tried, etc.? (after the "data aggregation/distribution" question)
"A result of trying it other ways and learning what the pain points were." "Constantly trying to consolidate, and simultaneously improve that infrastructure." Used Socrata before Carto, and published to GitHub before that. Vizwit was made to provide some visualization capability th… "Largely because we have been able to embrace open source software and publish open source software, it has given us a lot of flexibility."
What are the political and organizational dynamics of collecting this data?
Who were the relevant stakeholders for this project, how were they identified and convened?
Is there anyone else I should speak with to better understand X?