eliaruehle / Data-Mining

Data Mining Project for KP/FP/MP DB Anwendungsentwicklung. Working title: "Big Data Group Project" at TU Dresden, SS 2023

Decide on document parameters that we can use to describe a dataset #23

Closed wpertsch closed 1 year ago

wpertsch commented 1 year ago

How many parameters do we need? 20 to 30. We can use https://github.com/Evenaar/active-learning-dataset-benchmark to find those parameters (will be ready in a few days).

jembie commented 1 year ago

These are all the features currently being considered for implementation. They are based on the work in the master's thesis, and we are in contact and collaborating with the people behind it. See documents/data_analysis.md for details about the features mentioned here.

Simple metafeatures

Statistical metafeatures

Information-theoretic metafeatures

Concept and complexity-based metafeatures

Establishing Similarity Between Datasets

wpertsch commented 1 year ago

Most of the metrics rely on the class, and also on the label. We can't use those, so we have to decide what we want to keep and what we can reuse to our advantage.
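
To make the keep-or-drop decision more concrete, here is a minimal sketch of metafeatures that can be computed from the feature columns alone, without any class or label. It is only an illustration under assumptions: the function names, the chosen metafeatures, and the plain Euclidean distance used for comparing datasets are hypothetical and are not taken from the master's thesis or from documents/data_analysis.md.

```python
import math
import statistics


def label_free_metafeatures(rows):
    """Compute a few metafeatures from feature columns only (no class/label).

    `rows` is a list of equal-length lists of numbers, one inner list per
    instance. Both the input format and the selection of metafeatures are
    assumptions made for this sketch.
    """
    if not rows or not rows[0]:
        raise ValueError("dataset must contain at least one row and one column")

    n_instances = len(rows)
    n_features = len(rows[0])
    columns = list(zip(*rows))  # transpose: one tuple per feature

    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]  # population std dev

    return {
        # simple metafeatures
        "n_instances": n_instances,
        "n_features": n_features,
        "dimensionality": n_features / n_instances,
        # statistical metafeatures, averaged over all features
        "mean_of_means": statistics.fmean(means),
        "mean_of_stdevs": statistics.fmean(stdevs),
    }


def metafeature_distance(a, b):
    """Euclidean distance between two metafeature dicts with identical keys."""
    return math.sqrt(sum((a[key] - b[key]) ** 2 for key in a))


if __name__ == "__main__":
    # Two tiny made-up datasets, used only to show the call pattern.
    toy_a = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
    toy_b = [[1.5, 11.0], [2.5, 19.0], [3.5, 31.0]]
    meta_a = label_free_metafeatures(toy_a)
    meta_b = label_free_metafeatures(toy_b)
    print(meta_a)
    print(metafeature_distance(meta_a, meta_b))
```

The distance here is unscaled, so metafeatures with large values would dominate. If we go this way, we would probably want to normalize each metafeature first.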