Desbordante / desbordante-core

Desbordante is a high-performance data profiler capable of discovering many different patterns in data using various algorithms. It also lets you run data cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.
GNU Affero General Public License v3.0

Implement EulerFD, an approximate FD discovery algorithm #414

Open mitya-y opened 5 months ago

mitya-y commented 5 months ago

Implement an approximate FD (functional dependency) discovery algorithm based on the article "EulerFD: An Efficient Double-Cycle Approximation of Functional Dependencies" by Qiongqiong Lin, Yunfan Gu, Jingyan Sai. Add unit tests for the approximate FD algorithms, including a custom random option that keeps test results consistent across different systems. The expected answers for each test dataset were computed by running the current EulerFD version with the custom random function and a selected seed. Integrate the EulerFD algorithm into the Python bindings and the Python console.
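To illustrate why a custom random function makes seeded tests reproducible, here is a minimal Python sketch. It is not the code from this pull request: the `CustomRandom` class, its linear congruential constants, and the `sample_row_pairs` helper are all hypothetical names chosen for illustration, and it assumes the algorithm's randomness amounts to picking pairs of rows to compare. The point is only that a generator defined entirely by its own arithmetic produces the same sequence for the same seed on any platform.

```python
class CustomRandom:
    """Hypothetical linear congruential generator (not the PR's generator).

    The sequence depends only on the seed and the integer arithmetic below,
    so it does not vary with the platform or the standard library in use.
    """

    MODULUS = 2 ** 31
    MULTIPLIER = 1103515245
    INCREMENT = 12345

    def __init__(self, seed: int) -> None:
        self.state = seed % self.MODULUS

    def next_int(self, upper: int) -> int:
        """Return a pseudo-random integer in the range [0, upper)."""
        if upper <= 0:
            raise ValueError("upper must be positive")
        self.state = (self.MULTIPLIER * self.state + self.INCREMENT) % self.MODULUS
        # Use the higher bits: the low bits of a power-of-two LCG cycle quickly.
        return (self.state >> 16) % upper


def sample_row_pairs(num_rows: int, num_pairs: int, seed: int) -> list[tuple[int, int]]:
    """Pick `num_pairs` pairs of distinct row indices using the seeded generator."""
    if num_rows < 2:
        raise ValueError("need at least two rows to form a pair")
    rng = CustomRandom(seed)
    pairs = []
    while len(pairs) < num_pairs:
        first = rng.next_int(num_rows)
        second = rng.next_int(num_rows)
        if first != second:  # a row is never compared with itself
            pairs.append((first, second))
    return pairs


# The same seed always reproduces the same sample, which is what lets a test
# compare the algorithm's output against answers recorded in advance.
first_run = sample_row_pairs(num_rows=100, num_pairs=5, seed=42)
second_run = sample_row_pairs(num_rows=100, num_pairs=5, seed=42)
print(first_run == second_run)
```

With a generator like this, an expected-answer file recorded on one machine stays valid on another, at the cost of weaker statistical quality than a library generator.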

For more information on EulerFD, refer to the presentation: EulerFD Overview. Detailed development information and test results can be found here: Development Details.