18F / data-federation-project

A project focused on tools and best practices to supported federated data collection efforts
28 stars 9 forks source link
10x-data-federation data-federation ingest-data opendata

Data Federation Project

The U.S. Data Federation project promotes government-wide capacity-building to support distributed data management challenges, data interoperability, and broader data standards activities. The project is an initiative of the GSA Technology Transformation Services (TTS) 10x program, which funds technology-focused ideas from federal employees with an aim to improve the experience all people have with our government. 

Overview

U.S. government policies, initiatives, and public-facing products and services depend on aggregating and harmonizing data from disparate government sources. The goal of the U.S. Data Federation project is to document repeatable processes, develop reusable tooling, and curate resources to support federated data projects. 

We define a federated data project as an effort in which a common type of data is collected or exchanged across complex, disparate organizational boundaries. For example, federal agencies often need to collect data from state and local governments, other federal agencies, and other data providers. These federated data may be used to support policy or budget decisions, operational efficiencies, or published in aggregate form for other data users. 

Federated data efforts are increasingly seen as an engine for transparency, economic growth, and accountability, yet collecting this kind of data remains a challenge. While this type of data management effort is growing increasingly common in our distributed style of government, each new effort is still improvising solutions in terms of processes, tooling, and compliance infrastructure. Many of these federated data efforts face common requirements and common challenges, but lack common resources. 

The U.S. Data Federation project was conceived in 2016 to address this gap. The project set out to identify common challenges and pain points in federated data efforts and address these needs by curating best practices and resources and developing reusable tooling. The best practices and resources were intended to include guides and repeatable processes around data governance, organizational coordination, and standards development in federated environments. The reusable tools were intended to include capabilities around data validation, automated aggregation, and the development and documentation of data specifications.   

Over the course of its first three phases of 10x funding, it began to deliver on this ambition by building and launching ReVal, a Reusable Validation Library, which has been used by the USDA Food & Nutrition Services and other agencies to streamline data collection and validation processes.

During Phase 4, the team took advantage of a unique opportunity to unite government-wide efforts to support open data and federated data efforts. The team has supported Data.gov, OMB, and OGIS stakeholders in developing a vision and delivering increased functionality for resources.data.gov, a legislatively-mandated online repository of policies, tools, case studies and other resources to support data governance, management, and use throughout the federal government.

After conducting research with the stakeholders and audience for resources.data.gov, the team saw an opportunity for a long-term practical manifestation of the Data Federation as the content strategy team underpinning resources.data.gov. The future funding and organization of this work is currently under negotiation.

Project milestones

Phase 1 (Fall 2017)

Team: Phil Ashlock, Anthony Garvan

Phase 2 (Spring 2018)

Team: Phil Ashlock, Catherine Devlin, Anthony Garvan, Chris Goranson, Joe Krzystan

Phase 3 (December 2018-June 2019)

Team: Phil Ashlock, Mike Gintz, Mark Headd, Ethan Heppner, Julia Lindpaintner, Amy Mok

Phase 4 (October 2019-April 2020)

Team: Phil Ashlock, Mike Gintz, Julia Lindpaintner, Amy Mok, Princess Ojiaku, James Tranovich

References and deliverables

Phase 1

Phase 2

Phase 3

Phase 4
Forthcoming

Biweekly updates
Starting in Phase 3, the team began publishing updates on its activities and progress roughly bi-weekly. All past updates can be found here.

Related repositories

There are several repositories that contain code that is a part of this project.

Other repos referenced: