Crowdsourced data has changed the data landscape in many beneficial ways: it provides real-time, community-driven data that ordinary people can contribute to, which is not the case with authoritative data sources. This democratization of data and the wider variety of data available have led to richer, multifaceted insights and to shifts across industries, but they come with drawbacks.
As a data scientist, you need to account for the potential drawbacks of crowdsourced data. Data quality and accuracy, data privacy, a potential lack of sustainable data, legal and ethical issues, data biases, and data interpretation are all things to consider. There are ways to address these considerations, such as assessing and mitigating data biases, running quality assessments, and having stringent agreements with third parties and clear, explicit consent from volunteers. It is essential to know your source so that you can properly account for its drawbacks.
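To make "quality assessment" and "assessing bias" a bit more concrete, here is a minimal sketch in Python. The records, field order, and function names are all hypothetical, made up for illustration, and the checks (exact duplicates, out-of-range coordinates, share of records from the most active contributor) are just one simple way to start, not a complete or standard method.

```python
from collections import Counter

# Hypothetical crowdsourced records: (contributor_id, latitude, longitude, value)
observations = [
    ("user_a", 40.01, -105.27, 12.5),
    ("user_a", 40.02, -105.25, 13.1),
    ("user_a", 40.01, -105.27, 12.5),   # exact duplicate of the first record
    ("user_b", 95.00, -105.30, 11.8),   # latitude outside the valid range
    ("user_c", 39.99, -105.22, 12.9),
]


def is_valid_location(lat, lon):
    """Return True when the coordinates fall within valid lat/lon bounds."""
    return -90 <= lat <= 90 and -180 <= lon <= 180


def quality_report(records):
    """Summarize simple quality and bias indicators for a list of records."""
    unique = list(dict.fromkeys(records))  # drop exact duplicates, keep order
    valid = [r for r in unique if is_valid_location(r[1], r[2])]
    per_contributor = Counter(r[0] for r in valid)
    # Share of valid records from the single most active contributor:
    # a high value hints that a few people dominate the dataset.
    top_share = max(per_contributor.values()) / len(valid) if valid else 0.0
    return {
        "total": len(records),
        "duplicates": len(records) - len(unique),
        "invalid_location": len(unique) - len(valid),
        "top_contributor_share": top_share,
    }


report = quality_report(observations)
print(report)
```

A high `top_contributor_share` does not prove the data is biased, but it is a cheap signal that a small group of volunteers may be shaping the results and that the source deserves a closer look.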
Crowdsourced data, like ‘open science’, aims to be open for everyone to contribute to or access, regardless of expertise, demographics, etc. Crowdsourced data can add new insights to open science through expanded data sources; however, it would likely be difficult to reproduce or replicate crowdsourced data, which would complicate open science's replication and reproducibility efforts.
Although the challenges of crowdsourced data make it more complex to use, its potential should not be underestimated.
Check out this week's reading discussion: https://github.com/earthlab-education/Earth-Analytics-AY24/discussions/41