diachron / quality

Dataset Quality Assessment (part of WP5 of the Diachron EU FP7 project)
MIT License

Save memory by fuzzy approximation of statistics #42

Open clange opened 10 years ago

clange commented 10 years ago

For some metrics, e.g. DuplicateInstance, we currently keep a complete record of all instances found so far in memory. For huge datasets we might have to use some fuzzy approximation, similar to LODStats. That is, we would throw away part of the full detail we hold in memory and replace it with fuzzier approximations that consume less memory, at the cost of some accuracy in the reported statistics.
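To make the idea concrete, here is a minimal sketch of one possible approximation, a Bloom filter, which replaces the full set of seen instances with a fixed-size bit array. This is only an illustration: it is not how the metric is implemented today, and I am not claiming it is the technique LODStats uses. The names `BloomFilter`, `add_and_check` and `count_possible_duplicates`, as well as the default sizes, are hypothetical.

```python
import hashlib


class BloomFilter:
    """Fixed-size probabilistic set: no false negatives, some false positives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # Memory use is fixed up front, independent of how many items are added.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add_and_check(self, item: str) -> bool:
        """Insert item; return True if it was possibly seen before."""
        seen = True
        for pos in self._positions(item):
            byte, mask = pos // 8, 1 << (pos % 8)
            if not self.bits[byte] & mask:
                # At least one bit was unset, so the item is definitely new.
                seen = False
                self.bits[byte] |= mask
        return seen


def count_possible_duplicates(instances, num_bits=1 << 20, num_hashes=5):
    """Approximate duplicate count; may overcount, never undercounts."""
    bloom = BloomFilter(num_bits, num_hashes)
    return sum(1 for uri in instances if bloom.add_and_check(uri))
```

With the assumed default of 2^20 bits the filter occupies 128 KiB regardless of dataset size. The trade-off is that a new instance can occasionally be reported as a duplicate (a false positive), so the metric value would become an upper-bound estimate rather than an exact count.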