TheDataStation / ver

Data Discovery Tools and Systems
MIT License
6 stars 10 forks

Temporal attributes need to be handled separately #11

Open sainyam opened 1 year ago

sainyam commented 1 year ago

The current implementation treats dates as strings: hyphens are replaced with spaces and the resulting tokens are used to compute token-based similarity. Applying a similarity threshold to these tokens generates many false positives, e.g. 13/jan/2023 and 13-jan-2020 are considered joinable even though they are different dates.
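
To make the problem concrete, here is a minimal sketch (not ver's actual code) that reproduces the false positive and shows one possible way to treat temporal values separately. The names `tokens`, `jaccard`, `parse_date`, and `dates_match` are hypothetical. The sketch uses Jaccard similarity as a stand-in for whatever token-based measure the system uses, assumes slashes are split the same way as hyphens, and only recognizes the day-month-year layout from the example above.

```python
import re
from datetime import date

# Hypothetical helpers for illustration only; none of these names come from ver.
MONTHS = {m: i for i, m in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

# Assumption: only the dd-mon-yyyy layout from the issue, with -, / or space.
DATE_RE = re.compile(r"^(\d{1,2})[-/ ]([a-z]{3})[-/ ](\d{4})$")


def tokens(value):
    """Split a value on non-alphanumeric characters (assumed tokenization)."""
    return set(re.split(r"[^0-9a-z]+", value.lower())) - {""}


def jaccard(a, b):
    """Token-based similarity: shared tokens divided by all distinct tokens."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def parse_date(value):
    """Return a date for a dd-mon-yyyy string, or None if it does not parse."""
    m = DATE_RE.match(value.strip().lower())
    if m is None or m.group(2) not in MONTHS:
        return None
    try:
        return date(int(m.group(3)), MONTHS[m.group(2)], int(m.group(1)))
    except ValueError:  # e.g. 31-feb-2020 is not a real calendar date
        return None


def dates_match(a, b):
    """Compare two values as dates: equal only if both parse to the same day."""
    da, db = parse_date(a), parse_date(b)
    return da is not None and da == db


# Two of three tokens are shared, so token similarity is high ...
print(jaccard("13/jan/2023", "13-jan-2020"))      # 0.5
# ... but compared as dates, the two values do not match.
print(dates_match("13/jan/2023", "13-jan-2020"))  # False
print(dates_match("13/jan/2023", "13-JAN-2023"))  # True
```

With this kind of split, columns detected as temporal would be compared on parsed values, and the string-similarity path would be kept for everything else. How temporal columns are detected, and which date formats need to be supported, is left open here.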