MERLCenter / MERL-Center-public

Website and public, open source content of the MERL Center. See website at https://merlcenter.org
MIT License

Create 2021-09-07-Pros-cons-open-data-abr.md #51

Closed scoker-me closed 3 years ago

scoker-me commented 3 years ago

layout: blog-post
title: insert
authors:
# images should be in the /assets/img/posts/ folder
featuredImage: https://user-images.githubusercontent.com/55405623/132376892-0271ee8c-8fff-4f52-8cda-9b7e27866dd2.png
outgoing: false
outgoingUrl:


PROS AND CONS OF OPEN DATA

Melissa Edmiston, Stephanie Coker, Stephanie Jamila, and Thembelihle Tshabalala

Open data is an increasingly important topic in MERL. Many MERL practitioners advocate for open data because shared data lets others analyze, reanalyze, and draw new and beneficial conclusions. However, making data open carries risks and can have unintended consequences. This guide outlines some of the pros and cons of open data and the things to consider when making your data open. For a summary of the key points, see Table 1.

Table 1: Summary of key points

| Pros | Cons/Risk Factors |
| --- | --- |
| Accessibility of data: increased community engagement, improved efficiency and reduced cost, encourages progress and innovation | Incorrect use of data and the problem of missing information |
| Increased transparency | Privacy and consent |
| Reduced corruption | Mosaic effect |
| Interpretation of data | Costs and sustainability of open data projects |

PROS

Accessibility of data

One of the overarching benefits of open data is accessibility within a thematic area or sector. Data collection and cleaning can be expensive, and many projects or organizations are limited in their capacity. Making data open increases the number of datasets available for others to analyze and draw conclusions. This can result in:

Increased community engagement:

Open data has the potential to build a community around the data, bringing together people who work on similar issues so that they can exchange ideas and findings and discuss challenges. This can encourage collaboration rather than competition. Both users and creators of open data have formed communities around a dataset or topic of interest, and these two groups are not mutually exclusive.

Improved efficiencies and reduced costs:

Access to open data increases the rate and ease of discovery, giving researchers more resources to strengthen their work across disciplines. Open data can also enhance the data that organizations and companies of all sizes already hold. Small companies in particular can benefit from open data about an industry into which they would like to expand. Finally, open data can reduce duplication in data collection efforts, saving organizations time and money.

Progress and innovation:

Because open data is offered without a monetary barrier, more people have access and can use new methods of analysis, which can further the field of study or contribute to programmatic advancements, encouraging innovation and progress.

Increased transparency

Open data can also increase transparency around the topics or issues that the data addresses. Because open data is freely and publicly available, it lowers the barrier for the general public (and specific stakeholders) to understand those topics or issues. Having the data at hand also empowers stakeholders to act on it, advocating for themselves and their community.

Reduced corruption

Open data is an important element in the fight against corruption. It strengthens public integrity and accountability among policymakers, governments, companies, and citizens by making evidence of maladministration, governance gaps, or blatant corruption openly available. While a significant amount of important and useful government data remains inaccessible, there are examples of governments taking a stance in support of open data initiatives.

Interpretation of data

Open data allows more people to analyze the data and to interpret and validate the findings in numerous ways. A McKinsey report on the benefits of open data identified three value levers: decision making, innovation, and accountability. It also noted that these levers benefit a wide range of stakeholders: a single open-data initiative can empower governments, the private sector, and NGOs, each of which derives different value depending on how it uses and interprets the data.

CONS/RISK FACTORS

Incorrect use of data and missing data

When using open data, proper consideration of data collection methods and metadata is paramount for accuracy. When these are misunderstood, erroneous conclusions may be drawn from data.

Privacy and Consent

Data, whether open or proprietary, is regulated by laws that aim to guard the rights of individuals and protect against malicious use of data. The EU’s General Data Protection Regulation (GDPR) is a landmark piece of enforceable legislation on data privacy. It has been touted as the most significant regulatory development in information policy and has influenced the development of data privacy policy in other territories.

Mosaic Effect

The mosaic effect is a term used when discussing confidentiality. It is derived from the mosaic theory of intelligence gathering, in which disparate pieces of information become significant when combined with other types of information. In the MERL sector, it occurs when multiple datasets are linked to reveal new information. Even if data is appropriately anonymized and personal identifiers are removed, multiple datasets containing similar or complementary information can make it possible to determine a person's identity from attributes combined across those datasets, such as gender, location, and educational status. Resources are now available to help MERL practitioners think about whether their data contains linkages or risks that may require additional levels of security or anonymization. Figure 1 displays an example of how identity theft can occur when the mosaic effect takes place.

Figure 1: Mosaic Effect Example of Identity Theft https://user-images.githubusercontent.com/55405623/132376892-0271ee8c-8fff-4f52-8cda-9b7e27866dd2.png
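The linkage described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two datasets, the field names, and the choice of gender, district, and education as the shared attributes are hypothetical and made up, not drawn from any real project. The sketch only shows that a record with no name attached can be tied back to a person when its combination of attributes is unique in a second, named dataset.

```python
from collections import Counter

# Hypothetical "anonymized" survey: no names, but it carries a sensitive field.
survey = [
    {"gender": "F", "district": "North", "education": "Secondary", "reported_income": "low"},
    {"gender": "M", "district": "North", "education": "Primary", "reported_income": "medium"},
    {"gender": "F", "district": "South", "education": "Tertiary", "reported_income": "high"},
]

# Hypothetical second open dataset: names, but no sensitive field.
roster = [
    {"name": "Person A", "gender": "F", "district": "North", "education": "Secondary"},
    {"name": "Person B", "gender": "M", "district": "North", "education": "Primary"},
    {"name": "Person C", "gender": "M", "district": "North", "education": "Primary"},
    {"name": "Person D", "gender": "F", "district": "South", "education": "Tertiary"},
]

# Attributes present in both datasets (an assumption for this example).
SHARED_ATTRIBUTES = ("gender", "district", "education")


def attribute_key(record):
    """Build the combination of shared attributes used to link records."""
    return tuple(record[field] for field in SHARED_ATTRIBUTES)


def link(anonymized, named):
    """Return anonymized records whose attribute combination matches exactly one named record."""
    matches_per_key = Counter(attribute_key(r) for r in named)
    name_by_key = {attribute_key(r): r["name"] for r in named}
    return [
        {**record, "name": name_by_key[attribute_key(record)]}
        for record in anonymized
        if matches_per_key[attribute_key(record)] == 1
    ]


reidentified = link(survey, roster)
for record in reidentified:
    print(record["name"], "->", record["reported_income"])
```

In this made-up data, two of the three survey records match exactly one named person and are therefore re-identified, while the record shared by Person B and Person C stays ambiguous. Counting how many people share each attribute combination before publishing is one simple way to spot records that may need further generalization or suppression.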

Costs and sustainability of open data projects

Open data has been described as a public good. While the data is offered for free, the organization implementing the open data initiative usually bears a substantial cost. According to recent literature, start-up costs of open data initiatives vary from €20,000 to €100,000 per organization. Start-up costs are followed by adaptation costs, infrastructure costs, and maintenance/operational costs. Additionally, from an NGO/non-profit perspective, funding these projects relies on being able to pitch the usefulness of open data to funders. There is a risk that funders’ priorities change, which can harm the long-term sustainability of the open data project. Another risk is that if funders’ and users’ agendas do not align, the project may end up not serving the needs of the people who actually use the data. All of these sustainability factors affect decision-making around open data initiatives and often prove insurmountable.

malakumar85 commented 3 years ago

Thanks, @scoker-me ! Please add the front matter to the post. Use this as an example: https://github.com/MERLTech/MERL-Center-public/blob/main/sample-frontmatter.md

malakumar85 commented 3 years ago

Merging this PR, but we might need to make changes later