apify-projects / product-mapping-engine

Handle the possibility of missing data and shift all the similarities to <-1,1> range #7

Closed Equidem closed 2 years ago

Equidem commented 2 years ago

In some datasets, data columns may be missing altogether or missing only in some rows, so we need to make sure the engine can handle such cases without failing. To do so properly, we also need to distinguish between the available data (texts, for instance) not matching and the data not being there at all. To achieve this, let's shift every probability we calculate to the <-1,1> range, with values close to -1 signifying a total mismatch, values around 0 signifying either ambiguity or missing data, and values close to 1 signifying a total match.

kackamac commented 2 years ago

All similarities were shifted to the <-1,1> range, and missing data is now handled by setting the similarity to 0.
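
For readers who want a concrete picture, here is a minimal sketch of the idea discussed in this issue. It is not the engine's actual code: the function names `shift_similarity` and `text_similarity` are hypothetical, the linear map `2 * s - 1` assumes the original similarities lie in the [0, 1] range, and `difflib.SequenceMatcher` from the Python standard library stands in for whatever text comparison the engine really uses.

```python
import math
from difflib import SequenceMatcher
from typing import Optional


def shift_similarity(similarity: Optional[float]) -> float:
    """Map a similarity from [0, 1] to [-1, 1]; missing values become 0.

    Assumption: the unshifted similarity is a value in [0, 1].
    """
    # Missing data (None or NaN) is neutral: neither a match nor a mismatch.
    if similarity is None or math.isnan(similarity):
        return 0.0
    if not 0.0 <= similarity <= 1.0:
        raise ValueError(f"expected a value in [0, 1], got {similarity}")
    # Linear shift: 0 -> -1 (total mismatch), 0.5 -> 0, 1 -> 1 (total match).
    return 2.0 * similarity - 1.0


def text_similarity(left: Optional[str], right: Optional[str]) -> float:
    """Hypothetical helper: compare two texts, returning a value in [-1, 1].

    Assumption: None and empty strings both count as missing data.
    """
    if not left or not right:
        # The text is absent on at least one side, so report "unknown".
        return shift_similarity(None)
    # SequenceMatcher.ratio() returns a float in [0, 1].
    ratio = SequenceMatcher(None, left, right).ratio()
    return shift_similarity(ratio)


if __name__ == "__main__":
    print(text_similarity("red shoe", "red shoe"))  # identical texts
    print(text_similarity("abc", "xyz"))            # no characters in common
    print(text_similarity(None, "abc"))             # one side is missing
```

With this convention, a missing value contributes 0 and therefore does not pull a downstream score toward either a match or a mismatch, which is the distinction the issue asks for.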