For the DuplicateInstance metric, for example, we currently keep a complete record of all instances found so far in memory. For huge datasets we might have to switch to a fuzzy approximation, similar to LODStats: we would throw away part of the detailed state held in memory and replace it with a coarser approximation that consumes less memory.
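The text does not say which approximation technique would be used; a Bloom-filter-like structure is one common option for this kind of trade-off. The following is a minimal, hypothetical sketch (the class names `DuplicateTrackerSketch`, `ExactTracker`, and `ApproximateTracker` are illustrative and not part of the current metric code): it contrasts the exact, unbounded bookkeeping with a fixed-memory approximation that may report false duplicates but never misses a real one.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch only; not taken from the existing implementation.
public class DuplicateTrackerSketch {

    // Exact variant: remembers every instance URI seen so far (memory grows without bound).
    static class ExactTracker {
        private final Set<String> seen = new HashSet<>();

        // Returns true if the instance was already recorded.
        boolean recordAndCheck(String instanceUri) {
            return !seen.add(instanceUri);
        }
    }

    // Approximate variant: a fixed-size bit array plus k derived hash positions.
    // May report false duplicates, but its memory footprint is independent of
    // the number of instances processed.
    static class ApproximateTracker {
        private final BitSet bits;
        private final int sizeInBits;
        private final int hashCount;

        ApproximateTracker(int sizeInBits, int hashCount) {
            this.bits = new BitSet(sizeInBits);
            this.sizeInBits = sizeInBits;
            this.hashCount = hashCount;
        }

        // Returns true if the instance *may* have been seen before.
        boolean recordAndCheck(String instanceUri) {
            int h1 = instanceUri.hashCode();
            int h2 = Integer.reverse(h1) | 1; // second hash for double hashing, forced odd
            boolean seenBefore = true;
            for (int i = 0; i < hashCount; i++) {
                int pos = Math.floorMod(h1 + i * h2, sizeInBits);
                if (!bits.get(pos)) {
                    seenBefore = false;
                    bits.set(pos);
                }
            }
            return seenBefore;
        }
    }

    public static void main(String[] args) {
        ApproximateTracker tracker = new ApproximateTracker(1 << 23, 4); // ~1 MiB of bits
        System.out.println(tracker.recordAndCheck("http://example.org/resource/1")); // false
        System.out.println(tracker.recordAndCheck("http://example.org/resource/1")); // true
    }
}
```

With such a structure the metric would slightly overcount duplicates (false positives) rather than undercount them; whether that error direction is acceptable would have to be decided per metric.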