Crowd4SDG / crowdnalysis

Library to help analyze crowdsourcing results
GNU Affero General Public License v3.0

crowdnalysis

Crowdsourcing Citizen Science projects usually require citizens to classify items (images, PDFs, songs, etc.) into one of a finite set of categories. Once an item has been annotated by the contributing citizens, these annotations must be aggregated into a consensus classification. Usually, the consensus for an item is simply the category that received the most votes (majority voting). crowdnalysis lets you compute the consensus with more advanced techniques than standard majority voting. In particular, it provides consensus methods that model the quality of each citizen scientist involved in the project. These methods yield higher-quality information for Crowdsourcing Citizen Science projects, an essential requirement as citizens become increasingly willing and able to contribute to science.
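To make the contrast between majority voting and annotator-quality models concrete, here is a minimal, self-contained sketch. It is not crowdnalysis's implementation or API: it pairs majority voting with a basic expectation-maximization loop for the classic Dawid-Skene model, which learns one error-rate (confusion) matrix per annotator. The function names, the `(item, annotator, label)` tuple format, the pseudo-count and the fixed iteration count are all illustrative assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical input format: (item_id, annotator_id, label) with labels in 0..n_classes-1.
Annotation = tuple[str, str, int]


def majority_vote(annotations: list[Annotation]) -> dict[str, int]:
    """Consensus = most voted label per item (ties broken towards the smaller label)."""
    votes: dict[str, Counter] = defaultdict(Counter)
    for item, _, label in annotations:
        votes[item][label] += 1
    return {item: min(c, key=lambda k: (-c[k], k)) for item, c in votes.items()}


def dawid_skene(annotations: list[Annotation], n_classes: int, n_iter: int = 50) -> dict[str, list[float]]:
    """Basic Dawid-Skene EM with per-annotator confusion matrices; returns class posteriors per item."""
    items = sorted({i for i, _, _ in annotations})
    workers = sorted({w for _, w, _ in annotations})

    def normalise(v: list[float]) -> list[float]:
        s = sum(v)
        return [x / s for x in v]

    # Initialise the posteriors with normalised vote counts (i.e. soft majority voting).
    post = {i: [0.0] * n_classes for i in items}
    for i, _, label in annotations:
        post[i][label] += 1.0
    post = {i: normalise(p) for i, p in post.items()}

    for _ in range(n_iter):
        # M-step: class prior and one confusion matrix per annotator (rows = real class),
        # with a small pseudo-count so that no probability is exactly zero.
        prior = normalise([sum(post[i][k] for i in items) for k in range(n_classes)])
        conf = {w: [[0.01] * n_classes for _ in range(n_classes)] for w in workers}
        for i, w, label in annotations:
            for k in range(n_classes):
                conf[w][k][label] += post[i][k]
        conf = {w: [normalise(row) for row in m] for w, m in conf.items()}
        # E-step: posterior over each item's real class given all of its annotations.
        post = {i: prior[:] for i in items}
        for i, w, label in annotations:
            for k in range(n_classes):
                post[i][k] *= conf[w][k][label]
        post = {i: normalise(p) for i, p in post.items()}
    return post
```

For example, `majority_vote([("img1", "ann1", 0), ("img1", "ann2", 0), ("img1", "ann3", 1)])` returns `{'img1': 0}`. Multiplying raw probabilities keeps the sketch short; with many annotations per item a real implementation would work in log space to avoid underflow.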

Implemented consensus algorithms

In addition to the pure Python implementations above, the following models are implemented in the probabilistic programming language Stan and used via the CmdStanPy interface:

~ Eta models constrain the error-rate (a.k.a. confusion) matrix so that, for each real class, reporting that class is more probable than reporting any other label.
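One way to express such a constraint is sketched below. This is our own illustration, assuming a row-wise softmax parametrization in which the diagonal logit exceeds the largest off-diagonal logit by a positive margin `eta`; it is not necessarily the parametrization used in crowdnalysis's Stan models, and `eta_confusion_matrix` and `off_logits` are hypothetical names.

```python
import math


def eta_confusion_matrix(off_logits: list[list[float]], eta: list[float]) -> list[list[float]]:
    """Hypothetical Eta-style error-rate matrix (illustration only).

    off_logits[k][j] is a free logit for reporting label j when the real class is k
    (the diagonal entry off_logits[k][k] is ignored); eta[k] > 0 is the margin by which
    the diagonal logit exceeds the largest off-diagonal logit of row k.
    Each row is a softmax, so it sums to 1 and its largest entry is on the diagonal.
    """
    n = len(eta)
    matrix = []
    for k in range(n):
        if eta[k] <= 0:
            raise ValueError("eta must be positive")
        others = [off_logits[k][j] for j in range(n) if j != k]
        logits = [max(others) + eta[k] if j == k else off_logits[k][j] for j in range(n)]
        top = max(logits)  # subtract the maximum for numerical stability
        exps = [math.exp(x - top) for x in logits]
        z = sum(exps)
        matrix.append([e / z for e in exps])
    return matrix
```

For instance, with `off_logits=[[0, 1, -1], [0.5, 0, 0.2], [2, 0, 0]]` and `eta=[0.5, 1.0, 0.1]`, every row still peaks on its diagonal, even in the last row where an off-diagonal logit is large. The remaining off-diagonal entries stay free, so annotators can still make some mistakes more often than others.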

Features

Quick start

crowdnalysis is distributed via PyPI: https://pypi.org/project/crowdnalysis/

You can easily install it just like any other PyPI package:

pip install crowdnalysis

CmdStanPy will be installed automatically as a dependency. However, crowdnalysis also requires the CmdStan command-line interface, which must be installed separately. You can do this by running the install_cmdstan utility that comes with CmdStanPy (see the CmdStanPy installation documentation for more information):

install_cmdstan
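CmdStanPy also exposes the installer as a Python function, `cmdstanpy.install_cmdstan()`, alongside `cmdstanpy.cmdstan_path()`, which locates an existing installation. The helper below is a sketch for scripts or notebooks that should install CmdStan only when it is missing. Its name, `ensure_cmdstan`, is hypothetical, and it assumes a CmdStanPy 1.x release that exports both functions at the top level.

```python
def ensure_cmdstan() -> str:
    """Return the path of a CmdStan installation, installing CmdStan first if none is found."""
    import cmdstanpy  # imported lazily; installed automatically as a crowdnalysis dependency

    try:
        return cmdstanpy.cmdstan_path()  # raises ValueError when no CmdStan installation is found
    except ValueError:
        # Downloads and builds CmdStan; this can take several minutes.
        if not cmdstanpy.install_cmdstan():
            raise RuntimeError("CmdStan installation failed")
        return cmdstanpy.cmdstan_path()
```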

Use the package in code:

>>> import crowdnalysis

Check available consensus models:

>>> crowdnalysis.factory.Factory.list_registered_algorithms()

See the TUTORIAL notebook for examples of the main features.

Unit tests

We use pytest as the testing framework. Tests can be run from the root directory of the cloned repository with:

pytest

To see the log messages live while the tests run, pass a log level to pytest (level 0, i.e. NOTSET, lets messages of every level through):

pytest --log-cli-level 0

Logging

crowdnalysis uses Python's standard logging library.
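A minimal sketch for surfacing these messages in your own scripts follows. It assumes that crowdnalysis's modules create their loggers with `logging.getLogger(__name__)`, so that they all sit under a `crowdnalysis` logger; that naming is an assumption, not something stated above.

```python
import logging

# Attach a handler to the root logger that prints timestamps, logger names and levels.
logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s: %(message)s")

# Assumption: the package logs under the "crowdnalysis" namespace.
pkg_logger = logging.getLogger("crowdnalysis")
# Child loggers such as "crowdnalysis.factory" that keep the default NOTSET level
# inherit DEBUG as their effective level; their records propagate to the root handler.
pkg_logger.setLevel(logging.DEBUG)
```

Raising only the package logger's level, rather than the root logger's, keeps debug output from other libraries quiet.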

Deployment to PyPI

Note for contributors: follow these steps to have a new release automatically deployed to [PyPI](https://pypi.org/project/crowdnalysis/) by the [CD workflow](https://github.com/Crowd4SDG/crowdnalysis/blob/master/.github/workflows/cd.yml). The example is given for version `v1.0.2`:

1. Update the version in `src/crowdnalysis/_version.py`:

   ```python
   __version__ = "1.0.2"  # Note no "v" prefix here.
   ```

2. `git push` the changes to `origin` and make sure the remote `master` branch is up to date.
3. Create a new `tag`, preferably with a (multiline) annotation:

   ```bash
   git tag -a v1.0.2 -m " . Upgrade to CmdStanPy v1.0.1"
   ```

4. Push the tag to `origin`:

   ```bash
   git push origin v1.0.2
   ```

Shortly afterwards, the new version will be available on PyPI.
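In the release steps, the tag must be the version from `_version.py` with a `v` prefix. A hypothetical pre-release check, not part of the repository's CD workflow, could catch a mismatch before the tag is pushed. The function names below are illustrative, and the regular expression assumes the simple `__version__ = "..."` assignment shown in the release steps.

```python
import re
from pathlib import Path

VERSION_RE = re.compile(r"""^__version__\s*=\s*["']([^"']+)["']""", re.MULTILINE)


def parse_version(text: str) -> str:
    """Extract the __version__ string from the contents of a _version.py file."""
    match = VERSION_RE.search(text)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


def tag_matches_version(tag: str, version: str) -> bool:
    """True if the git tag (e.g. 'v1.0.2') is the package version (e.g. '1.0.2') with a 'v' prefix."""
    return tag == f"v{version}"


def check_release(tag: str, version_file: str = "src/crowdnalysis/_version.py") -> None:
    """Raise ValueError unless `tag` matches the version declared in `version_file`."""
    version = parse_version(Path(version_file).read_text(encoding="utf-8"))
    if not tag_matches_version(tag, version):
        raise ValueError(f"tag {tag!r} does not match __version__ {version!r}")
```

Running `check_release("v1.0.2")` from the repository root before creating the tag would raise if the two disagree.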

License

This project is licensed under the GNU Affero General Public License v3.0 - see the LICENSE file for details.

Citation

If you find our software useful for your research, please consider citing it with the following biblatex entry, which uses the DOI that covers all versions:

@software{crowdnalysis2022,
  author       = {Cerquides, Jesus and M{\"{u}}l{\^{a}}yim, Mehmet O{\u{g}}uz},
  title        = {crowdnalysis: A software library to help analyze crowdsourcing results},
  month        = jan,
  year         = 2022,
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.5898579},
  url          = {https://doi.org/10.5281/zenodo.5898579}
}

Acknowledgements

crowdnalysis is being developed within the Crowd4SDG and Humane-AI-net projects funded by the European Union’s Horizon 2020 research and innovation programme under grant agreements No. 872944 and No. 952026.

Reference

For the details of the conceptual and mathematical model of crowdnalysis, see:

[1] Cerquides, J.; Mülâyim, M.O.; Hernández-González, J.; Ravi Shankar, A.; Fernandez-Marquez, J.L. A Conceptual Probabilistic Framework for Annotation Aggregation of Citizen Science Data. Mathematics 2021, 9, 875. https://doi.org/10.3390/math9080875