What is the data federation effort all about? What am I looking to get out of it?
This is a collaborative research project with GSA's Office of Products & Platforms and 18F. The goal is to build a toolkit/playbook for undertaking intra-governmental data collection and aggregation projects, such as data.gov, code.gov, and NIEM, where data is collected from entities over which you do not have direct authority. We call these federated data efforts. Our goal is to find out what works, what doesn't, and which tools are appropriate in which circumstances, in order to accelerate similar efforts in the future.
Notes and the final report will be public.
Any questions before we get started?
What experience do you have that might be relevant to this effort? (e.g., work with open data standards, participation in gov data collection across organizational boundaries, etc.)
Chief Data Officer (CDO) for the State of Connecticut.
Splits time evenly between running the open data program and finding ways to make better use of data. Works in the Office of Policy and Management (the state's equivalent of OMB), not in the IT agency, and does not report to the CIO. Other state CDOs are more focused on data warehouses and similar infrastructure; Tyler focuses more on analytics. The state collects a lot of data from municipalities, pulling data such as property tax data and municipal spending from 169 individual municipalities. About 1-2 years ago, became interested in the state-federal relationship and the data collected from municipalities.
"probably a better way to do this, don't know what that better way is"
"pretty interested in this flow of data from the state to the federal government."
Previously: the Recovery Act, under which federal agencies pumped more money into state programs.
Data collected from municipalities:
Challenges / what went well: the real estate report, aggregated from municipalities. Getting the data is easy. The report uses 10 categories: residential, commercial, industrial, vacant, etc. However, one town may classify an apartment building as residential while another reports it as apartments. Some towns still have paper-based reporting systems; small towns don't have a full-time person working there.
Uniform Crime Reports example: some police departments still use triplicate carbon paper.
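The apartment-building problem above is a classification mismatch between towns. One common way to handle it is a per-town crosswalk that maps each reported label to a standard category and sends anything unmapped to a person for review. The Python below is a minimal sketch of that idea, not the state's actual process. The category subset, the town names, and the mapping of "apartments" to "residential" are all hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical subset of the state's standard categories. The notes name
# residential, commercial, industrial, and vacant among the ten.
STANDARD_CATEGORIES = {"residential", "commercial", "industrial", "vacant"}

# Hypothetical town-specific overrides: (town, lowercased reported label) -> standard category.
# Mapping "apartments" to "residential" is an assumption made for illustration.
CROSSWALK = {
    ("Town B", "apartments"): "residential",
}


def normalize_category(town: str, reported: str) -> Optional[str]:
    """Return the standard category for a town's label, or None if it is unmapped."""
    label = reported.strip().lower()
    if (town, label) in CROSSWALK:
        return CROSSWALK[(town, label)]
    if label in STANDARD_CATEGORIES:
        return label
    return None  # unknown label: a person should decide how to classify it


def unmapped(rows: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """List the (town, label) pairs that still need a crosswalk entry."""
    return [(town, label) for town, label in rows if normalize_category(town, label) is None]
```

Keeping the crosswalk keyed by town means each town's convention is recorded explicitly. The standard itself does not have to change.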
What was the impetus or driving force for this effort: policy, user needs, etc.? (Perhaps ask after the first question.)
90% of the time, the driver is state or federal law mandating that municipalities report data to the state. Some cities have been struggling, so there has been an effort to monitor the fiscal health of cities. The state requested data from towns and got some resistance, so it implemented a law that would allow it to get timely information. Big problem: reporting happens only at the end of the fiscal year, so the data is not timely enough to be useful.
In building X, what were the biggest challenges, and what went smoothly?
What went smoothly: financial incentives and support work. Example: grants to help municipalities move off QuickBooks got great compliance rates. Another example: grants to develop parcel data; if you take the money, you must use our data and give us the data back.
For reporting data up to federal agencies:
"As far as I can tell it's a fairly smooth process." Usually the federal government provides and funds the software. For example, for state drinking water data, the state uses federally provided software as its database. But the data is then "trapped" in federal systems. Hypothesis: it would help to have a better way to make that data more readily available, either as open data or as access-controlled tables on the web.
What tools and technologies do you use for this effort?
"It's all over the place": you can email, or there's a "portal" to upload to. More sophisticated systems let you upload CSV or Excel files. Standardization is either spelled out in statute or, in the more successful models, the statute requires the agency to come up with a standard. That standard is usually focused on the ontology and usually developed through considerable stakeholder engagement. It is an iterative process: draft, feedback, and so on.
QA/QC happens after aggregation: compare to the year before and check whether something is way different. There are lots of knowledgeable people, and anomalies jump out at them. Some work is needed before the data is suitable for analysis. The vast majority of the data goes into a mandated annual report; more recently, it also goes online into dashboards and similar tools.
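The year-over-year check described above can be partly automated, so that reviewers start from a short list of outliers. The Python below is a minimal sketch of that idea, not the state's actual process. The 50% threshold, the function name, and the per-municipality dictionary inputs are all assumptions.

```python
from typing import Dict


def flag_year_over_year(
    current: Dict[str, float],
    prior: Dict[str, float],
    max_change: float = 0.5,  # hypothetical threshold: flag changes larger than 50%
) -> Dict[str, str]:
    """Compare each municipality's value with the prior year and explain any flag."""
    flags: Dict[str, str] = {}
    for town, value in current.items():
        if town not in prior:
            flags[town] = "no prior-year value"
            continue
        old = prior[town]
        if old == 0:
            # A relative change is undefined; flag any move away from zero.
            if value != 0:
                flags[town] = "prior year was zero"
            continue
        change = (value - old) / abs(old)
        if abs(change) > max_change:
            flags[town] = f"changed {change:+.0%} from prior year"
    # Municipalities that reported last year but not this year.
    for town in prior:
        if town not in current:
            flags[town] = "missing this year"
    return flags
```

A check like this does not replace the knowledgeable reviewers the notes mention. It only narrows what they need to look at.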
Why did you choose this architecture or process? Were others tried? (Ask after the "data aggregation/distribution" question.)
What are the political and organizational dynamics of collecting this data?
Broadly, many of the people ultimately responsible for submitting or collecting data are not technologists and are not thinking about other uses the data might have. Getting people to think beyond the exact report they're creating is challenging. Often there's a mandate but no carrots or sticks, so there is a lot of reluctant participation, and reporting is often done begrudgingly.
Who were the relevant stakeholders for this project, how were they identified and convened?
Is there anyone else I should speak with to better understand X?
Possibly other state CDOs.
Michigan has a recent law to collect data from municipalities.
What efforts are you aware of that fit this category?
Do you have contacts in those efforts who we could reach out to?
What do you think are the primary challenges of these types of efforts?
Would an open source toolkit help?
"No shortage of tech based tools where you can load a spreadsheet in this column aligns with this column in this database" (e.g., basic ETL). Perhaps what's needed is a process for uploading data and running basic validations. If some of the data were simply more available, that might help too.
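The "basic ETL" plus "basic validations" idea above can be sketched with the Python standard library. The code maps a submitter's spreadsheet headers onto standard field names, then checks each row before the data is accepted. This is a minimal sketch of that general pattern, not a tool mentioned in the interview. The column names, `COLUMN_MAP`, `REQUIRED`, and the specific checks are all hypothetical.

```python
import csv
import io
from typing import Dict, List, Tuple

# Hypothetical mapping from one town's spreadsheet headers to standard field names.
COLUMN_MAP = {"Town Name": "municipality", "FY": "fiscal_year", "Total Spend ($)": "total_spending"}
# Hypothetical list of fields every submission must include.
REQUIRED = ["municipality", "fiscal_year", "total_spending"]


def load_and_validate(text: str, column_map: Dict[str, str]) -> Tuple[List[Dict[str, str]], List[str]]:
    """Rename columns to the standard, then return (rows, validation errors)."""
    reader = csv.DictReader(io.StringIO(text))
    mapped = {column_map.get(h) for h in (reader.fieldnames or [])}
    missing = [f for f in REQUIRED if f not in mapped]
    if missing:
        return [], [f"missing column for: {', '.join(missing)}"]

    rows: List[Dict[str, str]] = []
    errors: List[str] = []
    for raw in reader:
        line = reader.line_num  # physical line in the uploaded file (header is line 1)
        # Keep only mapped columns; short rows yield None values, so default to "".
        row = {column_map[k]: (v or "").strip() for k, v in raw.items() if k in column_map}
        if not row["municipality"]:
            errors.append(f"line {line}: municipality is blank")
        if not row["fiscal_year"].isdigit():
            errors.append(f"line {line}: fiscal_year is not a year")
        try:
            float(row["total_spending"].replace(",", ""))
        except ValueError:
            errors.append(f"line {line}: total_spending is not a number")
        rows.append(row)
    return rows, errors
```

Returning every error at once, rather than stopping at the first one, lets a submitter fix a whole file in one pass. The mapping itself is the "this column aligns with this column" step from the quote.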