What is the data federation effort all about? What am I looking to get out of it?
This is a collaborative research project with GSA's Office of Products & Platforms and 18F. The goal is to build a toolkit / playbook for undertaking intra-governmental data collection / aggregation projects, such as data.gov, code.gov, and NIEM. Our goal is to find out what works, what doesn't, and which tools are appropriate for which circumstances, in order to accelerate similar efforts in the future.
Notes & Report will be public
Any questions before we get started?
What is X, in your own words?
The issue is directory information about human services: offering open standards, interoperable APIs, and new practices and business models that promote the provision of this data as a public good. Information about resources for people in need should be like a public utility. An early inspiring use case: make resource data googleable (Google introduced civic services types for schema.org). Publishers don't have incentives to publish the data in the right form. Dealing with info not collected by governments: call centers, publishing data on a webpage. Being the bridge between closed formats and published data: OpenReferral is the exchange layer, schema.org provides the publishing layer. Adopt the OpenReferral standard, and it's then easy to publish via schema.org and share with other organizations.
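The exchange-layer / publishing-layer split described above can be sketched in code: a record held in an Open Referral-style exchange format gets rendered as schema.org JSON-LD for search engines to index. This is an illustrative sketch only; the field names (`organization_name`, `svc-001`, etc.) are hypothetical and not taken from the actual specification.

```python
import json

# A service record as it might appear in an exchange-layer dataset.
# Field names are illustrative, loosely inspired by Open Referral's
# human services data model; they are not the exact spec.
exchange_record = {
    "id": "svc-001",
    "name": "Emergency Food Pantry",
    "description": "Free groceries for households in need.",
    "organization_name": "Example Community Services",
    "url": "https://example.org/food-pantry",
}

def to_schema_org(record):
    """Render an exchange-layer record as schema.org JSON-LD —
    the 'publishing layer' that search engines can index."""
    return {
        "@context": "https://schema.org",
        "@type": "Service",
        "name": record["name"],
        "description": record["description"],
        "provider": {
            "@type": "Organization",
            "name": record["organization_name"],
        },
        "url": record["url"],
    }

print(json.dumps(to_schema_org(exchange_record), indent=2))
```

The point of the split is that organizations only need to maintain data in one exchange format; the web-facing markup can be generated mechanically from it.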
What was the impetus or driving force for this effort: policy, user needs, etc.? (perhaps after the first question)
He saw these needs popping up over and over again in DC. Saw trend: market failure, more and more apps provide less and less trustworthy information. Everyone competing to be "the one." If one of the private companies succeeds, that's also bad! Then they own the public information. "We need public goods and infrastructure before competition." "If you don't have public goods and infrastructure, you are cruising for various bruisings."
In building X, what were the biggest challenges, and what went smoothly?
Biggest challenge: cultural. This is complex, long-term, multi-stakeholder collaborative work, but we want quick, short-term, linear-to-scale wins — a mismatch between expectations and reality. "Difficult to get people to invest resources in things like infrastructure rather than shiny apps." Other cultural dimension: the space is treated like a market. Instead of investing, "just build the best mousetrap." "Skepticism of the value of cooperation." "What we're finding is that that can be unlearned." "We can rediscover that value of cooperative logic... it takes a first mover, a second mover, but once people are collaborating, the logic is hard to beat." "It requires that investment of resources in order to align incentives around that value proposition."
Gone well: "I don't think the world needs more apps, but we have several apps that are built around the standard." "Having tools that everyone needs that they can plug & play has worked." A community has been built (Phil, Open Data UK, etc.).
It's in stakeholders' interests to do this, but they don't have the capacity to do the things that would save them capacity. Simply making the case is not enough; you need to make the case and then bring additional resources with it.
What has gone well: found funding that had been allocated to doomed projects and was able to demonstrate value (this took many years).
What tools and technologies do you use for this effort?
Tech around the standard: started with Google Docs, now on GitHub with Read the Docs documentation, using Hypothesis for commenting. The standard itself: a vocabulary, a logical model, and formatting instructions. Formatting: CSV, with the objective of being simple for folks to open up and edit. But the domain is more complex than GTFS (e.g., one institution with multiple locations), so: a JSON data package + CSVs. Now they have an open API spec. Next chapter of evolution: likely API-first, with JSON / CSV as one formatting option. They're not harvesting the data themselves. Hypothesis 1: self-publishing (peer-to-peer). Hypothesis 2: a centralized hub for each community (one-to-many) — but the average area has 1-2 employees maintaining this data, so how do they recoup the cost of the data if it should be available for free? Hypothesis 3: many-to-many federation, with many intermediaries that can share data across a federated network.
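The "JSON data package + CSVs" formatting described above can be sketched as follows: a `datapackage.json` descriptor lists CSV resources, and each CSV holds one table, so a one-to-many relationship (one institution, multiple locations) survives the simple flat-file format. File names, field names, and the descriptor shape here are illustrative assumptions, not the exact Open Referral specification.

```python
import csv
import io

# Illustrative descriptor: names each CSV resource in the package.
datapackage = {
    "name": "example-human-services-data",
    "resources": [
        {"name": "organizations", "path": "organizations.csv"},
        {"name": "locations", "path": "locations.csv"},
    ],
}

# Stand-ins for the CSV files on disk.
files = {
    "organizations.csv": "id,name\norg-1,Example Community Services\n",
    "locations.csv": (
        "id,organization_id,address\n"
        "loc-1,org-1,123 Main St\n"
        "loc-2,org-1,456 Oak Ave\n"  # one institution, multiple locations
    ),
}

def load_resources(descriptor, file_contents):
    """Load each CSV resource named in the descriptor into a list of dicts."""
    data = {}
    for resource in descriptor["resources"]:
        text = file_contents[resource["path"]]
        data[resource["name"]] = list(csv.DictReader(io.StringIO(text)))
    return data

data = load_resources(datapackage, files)

# The one-to-many link is preserved: separate tables joined by id.
org_locations = [
    row for row in data["locations"] if row["organization_id"] == "org-1"
]
print(len(org_locations))  # 2
```

Each table stays a plain CSV that non-technical staff can open and edit, while the descriptor gives tools enough structure to join the tables back together.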
Why did you choose this architecture or process? Were others tried, etc.? (after the "data aggregation/distribution" question)
What are the political and organizational dynamics of collecting this data?
Improve service delivery, reduce the cost of maintaining information, and improve decision makers' ability to assess the allocation of resources against community needs.
Who were the relevant stakeholders for this project, how were they identified and convened?
Is there anyone else I should speak with to better understand X?