CODARcode / MGARD

MGARD: MultiGrid Adaptive Reduction of Data
Apache License 2.0

number too large to be quantized #131

Open JasonRuonanWang opened 3 years ago

JasonRuonanWang commented 3 years ago

I came across this exception while trying to compress a 2D floating-point array. With the accuracy set to 0.000001, MGARD starts throwing the exception as soon as I increase the array size beyond 300x300, which is not a large array by the standards of HPC applications.

I have attached a simple program to reproduce it. It uses ADIOS2, but it should be fairly easy to translate to plain MGARD code.


#include <adios2.h>
#include <cstddef>
#include <vector>
int main(int argc, char *argv[])
{
    size_t variable_size = 300;
    std::vector<double> myFloats(variable_size*variable_size);
    for(size_t i=0; i<myFloats.size(); i++)
    {
        myFloats[i]=i;
    }
    adios2::ADIOS adios;
    adios2::IO io = adios.DeclareIO("TestIO");
    adios2::Dims shape({variable_size, variable_size});
    adios2::Dims start({0, 0});
    adios2::Dims count({variable_size, variable_size});
    auto varFloats = io.DefineVariable<double>("myfloats", shape, start, count);
    adios2::Operator mgardOp = adios.DefineOperator("mgardCompressor", adios2::ops::LossyMGARD);
    varFloats.AddOperation(mgardOp, {{adios2::ops::mgard::key::accuracy, "0.000001"}});
    adios2::Engine engine = io.Open("TrainingData", adios2::Mode::Write);
    engine.BeginStep();
    engine.Put<double>(varFloats, myFloats.data());
    engine.EndStep();
    engine.Close();
    return 0;
}

ben-e-whitney commented 3 years ago

I can reproduce this on 360a2fb853adbb1d758704960d4f51e85e4d3f57. Here is more or less what's happening:

  1. The original dataset is transformed into a collection of coefficients.
  2. For each coefficient x, a quantizer bin width q is computed. The bin width depends on a number of factors, among them the error tolerance and the size of the input dataset.
  3. The quantizer finds an integer n such that n * q is as close as possible to x. n is stored as a long int. If n doesn't fit in a long int, the exception you see is thrown.
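Step 3 can be sketched roughly as follows. This is only an illustration of the range check described above, not MGARD's actual quantizer: the function name `quantize` and the exception types are assumptions, and the bin width `q` is taken as given rather than computed from the tolerance and dataset size.

```cpp
#include <cmath>
#include <limits>
#include <stdexcept>

// Hypothetical helper (not MGARD's API): find the integer n such that
// n * q is as close as possible to x, and fail if n does not fit in a long int.
long int quantize(const double x, const double q) {
    if (!(q > 0.0)) {
        throw std::invalid_argument("bin width must be positive");
    }
    // Nearest multiple of q, still held as a double at this point.
    const double n = std::round(x / q);

    // Check the range in floating point before casting, because converting an
    // out-of-range double to an integer type is undefined behaviour.
    const double lo = static_cast<double>(std::numeric_limits<long int>::min());
    const double hi = static_cast<double>(std::numeric_limits<long int>::max());
    // The upper bound is strict: for a 64-bit long int, `hi` rounds up to 2^63,
    // which is itself out of range. Writing the test this way also rejects NaN.
    if (!(n >= lo && n < hi)) {
        throw std::overflow_error("number too large to be quantized");
    }
    return static_cast<long int>(n);
}
```

With this sketch, `quantize(1.0, 0.25)` returns 4, while a coefficient of 1.0e5 with a bin width of 1.0e-30 throws, since the quotient 1.0e35 is far outside the range of a long int.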

The easiest way to run into this is to use a very low error tolerance. See #32. MGARD should probably store the coefficients uncompressed (abandoning quantization) rather than throw an exception in this situation.
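One possible shape for that fallback, as a rough sketch only: the struct `EncodedCoefficients` and the function `encode_with_fallback` are hypothetical names, not part of MGARD, and the bin width `q` is again assumed to be computed elsewhere.

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical container: holds either the quantized bin indices or,
// if quantization was abandoned, the untouched coefficients.
struct EncodedCoefficients {
    bool quantized = false;
    std::vector<long int> bins;
    std::vector<double> raw;
};

// Hypothetical fallback (not MGARD's API): quantize every coefficient with
// bin width q, but keep the raw values if any bin index would overflow.
EncodedCoefficients encode_with_fallback(const std::vector<double> &coefficients,
                                         const double q) {
    const double lo = static_cast<double>(std::numeric_limits<long int>::min());
    const double hi = static_cast<double>(std::numeric_limits<long int>::max());

    EncodedCoefficients out;
    out.bins.reserve(coefficients.size());
    for (const double x : coefficients) {
        const double n = std::round(x / q);
        // Strict upper bound and negated test, so that 2^63, infinities,
        // and NaN are all treated as "does not fit".
        if (!(n >= lo && n < hi)) {
            out.quantized = false;
            out.bins.clear();
            out.raw = coefficients;  // abandon quantization, store uncompressed
            return out;
        }
        out.bins.push_back(static_cast<long int>(n));
    }
    out.quantized = true;
    return out;
}
```

A caller would check the `quantized` flag to decide whether to decode `bins` by multiplying by `q` or to read `raw` directly. The cost of this design is that the fallback path gives up compression for the affected coefficients entirely.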