HPI-Information-Systems / snowman

Welcome to Snowman App – a Data Matching Benchmark Platform.
https://hpi-information-systems.github.io/snowman/
MIT License

Snowman

Comparing data matching algorithms is still an unsolved problem in both industry and research. With Snowman, developers and researchers can compare the performance of different data matching solutions or improve new algorithms. Besides traditional metrics, the tool also considers economic aspects such as Soft KPIs.

Benchmark Dashboard

This tool is developed as part of a bachelor's project in collaboration with SAP SE.

Research Project

This tool has been published as part of the paper "Frost: Benchmarking and Exploring Data Matching Results" (2022) at VLDB. More details on reproducing the results shown in the paper can be found here.

Current state

In Q1 and Q2 of 2021, we reached the following milestones:

[x] Milestone 1: Ability to add/delete datasets, experiments and matching solutions; binary comparison and basic behavior analysis; executable on all major platforms
[x] Milestone 2: Compare more than two experiments and create new experiments based on results; survey Soft KPIs, allow comparison based on KPIs
[x] Milestone 3: Allow individual thresholds for experiments, extend Soft KPIs further and allow advanced evaluation of matching solutions
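
To make terms like "binary comparison" and "individual thresholds" more concrete, here is a minimal sketch of how two experiments could be compared at the level of matched record pairs. It is not Snowman's implementation: the function names, the record IDs, and the similarity scores are all hypothetical, and the metrics shown are the standard pair-based precision, recall and F1.

```python
# Illustrative sketch only; names and data are assumptions, not Snowman's API.

def normalize(a, b):
    """Order a record pair so that (a, b) and (b, a) compare as equal."""
    return (a, b) if a <= b else (b, a)

def matches_at_threshold(scored_pairs, threshold):
    """Keep the pairs whose similarity score reaches the given threshold."""
    return {normalize(a, b) for a, b, score in scored_pairs if score >= threshold}

def pair_metrics(predicted, gold):
    """Pair-based precision, recall and F1 against a gold standard.

    By convention here, a metric with a zero denominator is reported as 0.0.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

def binary_comparison(first, second):
    """Split the matches of two experiments into shared and exclusive pairs."""
    return {
        "both": first & second,
        "only_first": first - second,
        "only_second": second - first,
    }

# Hypothetical gold standard and scored output of one matching solution.
gold = {("r1", "r2"), ("r3", "r4")}
scored_pairs = [("r2", "r1", 0.9), ("r3", "r4", 0.6), ("r1", "r5", 0.7)]

# The same scored output evaluated at two different thresholds.
strict = matches_at_threshold(scored_pairs, 0.8)
lenient = matches_at_threshold(scored_pairs, 0.65)

strict_metrics = pair_metrics(strict, gold)
lenient_metrics = pair_metrics(lenient, gold)
difference = binary_comparison(strict, lenient)
```

In this toy data, raising the threshold removes the false positive ("r1", "r5"), which is the kind of trade-off a per-experiment threshold lets you explore.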

Detailed progress is tracked through GitHub issues and project boards. Please get in touch if you would like a specific feature included :)

Having reached milestone 3, we plan to continue working on further features that will broaden the tool's capabilities.

Showcase

To show off some key features that Snowman offers, we created a small introductory video:

Snowman Showcase

Contributing

Contribution guidelines will follow soon. Until then, feel free to open an issue to report a bug or request a feature.
If you want to contribute code, please first open an associated issue and then a pull request containing the proposed solution.

Development

See our development guide for more information on how to get started.

Documentation

Please see our documentation for further information: Snowman Docs

Licenses

Copyright 2021 Hasso Plattner Institute. Licensed under the MIT license.

A complete list of all dependencies and their individual licenses can be found within our documentation.