Refactor TANE-based algorithms

iliya-b commented 7 months ago

Generalize Tane and PFDTane, and add more tests.

To check whether the refactoring caused any performance loss, the following experiments were performed. The discovery task was run as cli.py --task=afd --algo=tane --error=0.05 --table=... with both the new and the original versions of the TANE implementation, on the following heavy datasets: EpicMeds.csv, adult.csv, EpicVitals.csv.
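A repeated-run timing loop like the one described can be sketched as follows. This is a minimal sketch under assumptions: the thread does not say how the timings were collected, so it assumes each measurement is the wall-clock time of one whole CLI invocation, including interpreter startup. `time_cli_runs` is a hypothetical helper, not part of the project.

```python
import subprocess
import sys
import time


def time_cli_runs(cli_args: list[str], iterations: int = 10) -> list[float]:
    """Run `python <cli_args>` `iterations` times; return wall-clock seconds per run.

    Assumption: one measurement = one full process run (startup included).
    """
    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        # check=True aborts the benchmark if a run fails instead of timing an error.
        subprocess.run([sys.executable, *cli_args], check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations
```

For example, time_cli_runs(["cli.py", "--task=afd", "--algo=tane", "--error=0.05", "--table=..."]) would be run once per dataset and per version, with the --table value filled in as in the original runs.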

The following list shows the measured running times of the old and new implementations, respectively (95% confidence intervals, 10 iterations each):

EpicMeds.csv (old) 59.715925465099986 +- 0.1869874511220996
EpicMeds.csv (new) 59.5840122977 +- 0.06763601341304505
adult.csv (old) 24.654166058699996 +- 0.06323832294394492
adult.csv (new) 24.76226707977778 +- 0.09297212157319155
EpicVitals.csv (old) 10.6707755998 +- 0.11612311140862534
EpicVitals.csv (new) 10.7569084586 +- 0.0103879548810794
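Reading the list as "value +- 95% confidence half-width", such intervals can be computed from the per-run timings roughly as below. This is a sketch that assumes a Student's t interval for the mean over n = 10 runs (t ≈ 2.262 for 9 degrees of freedom); the original comment does not state which interval method was used. `mean_with_ci` and `intervals_overlap` are hypothetical helpers.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t for 9 degrees of freedom (n = 10 runs).
# Assumption: the original intervals may have been computed differently.
T_CRIT_95_DF9 = 2.262


def mean_with_ci(samples: list[float], t_crit: float = T_CRIT_95_DF9) -> tuple[float, float]:
    """Return (mean, half_width) of a t-based confidence interval for the mean."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    mean = statistics.fmean(samples)
    # Standard error from the sample standard deviation (n - 1 denominator).
    std_err = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, t_crit * std_err


def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """Check whether two (mean, half_width) intervals share at least one point."""
    (mean_a, half_a), (mean_b, half_b) = a, b
    return mean_a - half_a <= mean_b + half_b and mean_b - half_b <= mean_a + half_a
```

Applied to the numbers above, the old and new intervals overlap for all three datasets (e.g. EpicMeds.csv: [59.53, 59.90] vs. [59.52, 59.65]). Overlap is a conservative check: non-overlapping intervals would signal a real difference, while overlapping ones are consistent with, but do not prove, equal performance.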

iliya-b commented 6 months ago

@vs9h I've fixed the architectural issues with the Tane and PFDTane algorithms, as you suggested in PR #300.

iliya-b commented 1 month ago

@vs9h I've fixed the issues in this PR. You mentioned another PR, #396, but that PR is still a draft, and it mainly introduces a few performance enhancements to the algorithm rather than affecting the architecture. The current PR blocks some other PRs, so for now I've kept only the changes related to this PR (the refactoring). What do you think?

vs9h commented 1 month ago

Also, split the commits into at least two (with the tests in a separate commit).

iliya-b commented 1 month ago

@vs9h I've fixed these issues.

Desbordante / desbordante-core

Refactor TANE-based algorithms #378