What is the data federation effort all about? What am I looking to get out of it?
This is a collaborative research project with GSA's Office of Products & Platforms and 18F. The goal is to build a toolkit / playbook for undertaking intra-governmental data collection / aggregation projects, such as data.gov, code.gov, and NIEM, where data is collected from entities over which you do not have direct authority. We call these federated data efforts. Or goal is to find out what works, what doesn't, and what tools are appropriate for what circumstances, in order to accelerate similar efforts in the future.
Notes & Report will be public
Any questions before we get started?
What experience do you have that might be relevant to this effort? (e.g., work with open data standards, participation in gov data collection across organizational boundaries, etc.)
outside gov: sunlight / Open Contracting Partnership / founded open north. Spectrum of how much effort parties need to do:
scraping and normalizing: better if data doesn't change too frequently, fairly simple, simple meaning. e.g., collecting officials contact info, simple. complex: scraping contracting information.
Slightly more effort: asking them to send you the data. e.g., send us your shape files. they need to (1) find the data, (2) send it to you, (3) give permission to republish.
ask to publish according to a specific format ("getting into realm of standard"), publish on website, send url. In all cases already had open data policies. In terms of encouraging adoption
more formal: standards like open contracting data standard / elected officials data standard / aid transparency. Is it just technical or also policy? what is being measured, how to assess, another simple project is openaddresses.io, but in cases like aid and contracting, "you could get very deep". How to accommodate for differences across jurisdictions / agencies? At that point you want a permanent product team, follow a much more operational approach. For previous levels.
even more formal: legal requirement to publish + centralized operations, vs more decentralized effort.
Why end up in different parts of spectrum?
ambition / comprehensiveness
level of data quality you're targeting
complexity of domain
interacts with policy or purely technical?
centralized approach better for critical top-down initiatives (e.g., DATA Act)
What efforts are you aware of that fit this category?
Do you have contacts in those efforts who we could reach out to?
What do you think are the primary challenges of these types of efforts?
People who receive data get clear benefit, but data owners do not have incentives to change- they already own and benefit from the data, another layer of reporting is only a hassle. Frame it from users perspective / representing interests of a whole group of stakeholders, people more interested. Showed that they weren't going to collect data and do nothing with it: already have an app for it. Before doing broader outreach, identified prominent first adopters. sequence adopters to get people close to them geographically & in terms of city size. For contracting, easier to make argument for shared benefit. Also tracking performance of service provider is helpful. A lot of cases people may benefit, but you still need to make the case well.
in case of contracting, more countries might want to include budgets with that. might be multiple additional stages , committed vs executed. Could model it, but then what if someone else does it differently?
Aggregating data sets- agree on common core, people can include more if they want to. In case of data catalog, that works, but in contracting case, new requirements pose a bigger challenge because they have high standard for interoperability.
Recent challenge- quality or specificity of data dictionary. Field that includes cost value - is that before or after taxes? also interpretability- certain domain specific words are interpreted in different ways (e.g., "tender" translated several ways), high risk in a domain specific field with a lot of jargon.
social challenges- getting people to correctly implement it. Might see standard but don't adhere to it exactly, use it as a guide vs rule. Often happens with bigger orgs, "we're big, we know better". People "just do it there way", challenges to make sure people are implementing it the way it was agreed on. Having more consensus driven approach would be appropriate.
in terms of adoption --> people might see value, but need to justify cost of adoption, might be more on social side than technical side.
also, people seeing it as a compliance exercise, then might not be invested in good data quality. e.g., some department looking like they are procuring a lot of soybeans. Why? It was the highest on the form for a required field.
with complicated standards, might get buy-in at high level, but with implementers it might not get the same framing.
tooling - useful?
data completeness
validators
"we need better tools like that"
a lot of times groups pursue a specific solution because general solutions dont exist
Introductory comments
What experience do you have that might be relevant to this effort? (e.g., work with open data standards, participation in gov data collection across organizational boundaries, etc.)
outside gov: sunlight / Open Contracting Partnership / founded open north. Spectrum of how much effort parties need to do:
scraping and normalizing: better if data doesn't change too frequently, fairly simple, simple meaning. e.g., collecting officials contact info, simple. complex: scraping contracting information.
Slightly more effort: asking them to send you the data. e.g., send us your shape files. they need to (1) find the data, (2) send it to you, (3) give permission to republish.
ask to publish according to a specific format ("getting into realm of standard"), publish on website, send url. In all cases already had open data policies. In terms of encouraging adoption
more formal: standards like open contracting data standard / elected officials data standard / aid transparency. Is it just technical or also policy? what is being measured, how to assess, another simple project is openaddresses.io, but in cases like aid and contracting, "you could get very deep". How to accommodate for differences across jurisdictions / agencies? At that point you want a permanent product team, follow a much more operational approach. For previous levels.
even more formal: legal requirement to publish + centralized operations, vs more decentralized effort.
Why end up in different parts of spectrum?
What efforts are you aware of that fit this category?
Do you have contacts in those efforts who we could reach out to?
What do you think are the primary challenges of these types of efforts?
People who receive data get clear benefit, but data owners do not have incentives to change- they already own and benefit from the data, another layer of reporting is only a hassle. Frame it from users perspective / representing interests of a whole group of stakeholders, people more interested. Showed that they weren't going to collect data and do nothing with it: already have an app for it. Before doing broader outreach, identified prominent first adopters. sequence adopters to get people close to them geographically & in terms of city size. For contracting, easier to make argument for shared benefit. Also tracking performance of service provider is helpful. A lot of cases people may benefit, but you still need to make the case well.
in case of contracting, more countries might want to include budgets with that. might be multiple additional stages , committed vs executed. Could model it, but then what if someone else does it differently?
Aggregating data sets- agree on common core, people can include more if they want to. In case of data catalog, that works, but in contracting case, new requirements pose a bigger challenge because they have high standard for interoperability.
Recent challenge- quality or specificity of data dictionary. Field that includes cost value - is that before or after taxes? also interpretability- certain domain specific words are interpreted in different ways (e.g., "tender" translated several ways), high risk in a domain specific field with a lot of jargon.
social challenges- getting people to correctly implement it. Might see standard but don't adhere to it exactly, use it as a guide vs rule. Often happens with bigger orgs, "we're big, we know better". People "just do it there way", challenges to make sure people are implementing it the way it was agreed on. Having more consensus driven approach would be appropriate.
in terms of adoption --> people might see value, but need to justify cost of adoption, might be more on social side than technical side.
also, people seeing it as a compliance exercise, then might not be invested in good data quality. e.g., some department looking like they are procuring a lot of soybeans. Why? It was the highest on the form for a required field.
with complicated standards, might get buy-in at high level, but with implementers it might not get the same framing.
tooling - useful?