TheClimateCorporation / mandoline

A distributed, versioned, multi-dimensional array database
Other
105 stars 17 forks source link

Decompression filter chain creates a ByteBuffer that eventually leads to an incorrectly read chunk when the data type is byte #17

Closed stevemkim closed 9 years ago

stevemkim commented 9 years ago

This is a subtle bug that occurs when the data type is byte.

The implementation of ucar.ma2.Array has this lovely snippet:

/**
   * Create an Array from a ByteBuffer
   * @param dtype type of data
   * @param shape shape of data; if null, then use int[]{bb.limit()}
   * @param bb data is in here
   * @return equivilent Array
   */
  public static Array factory(DataType dtype, int[] shape, ByteBuffer bb) {
    int size;
    Array result;

    switch (dtype) {
      case ENUM1:
      case BYTE:
        if (shape == null) shape =  new int[]{bb.limit()};
        return factory(dtype, shape, bb.array());

      case CHAR:
        size = bb.limit();
        if (shape == null) shape =  new int[]{size};
        result = factory(dtype, shape);
        for (int i = 0; i < size; i++)
          result.setChar(i, (char) bb.get(i));
        return result;

      case ENUM2:
      case SHORT:
        ShortBuffer sb = bb.asShortBuffer();
        size = sb.limit();
        if (shape == null) shape =  new int[]{size};
        result = factory(dtype, shape);
        for (int i = 0; i < size; i++)
          result.setShort(i, sb.get(i));
        return result;

Notice the pecular implementation of the BYTE case. Whereas the implementations for other data types use the ByteBuffer as a byte buffer and call its get method, the implementation for byte calls the .array method to directly get the underlying storage of the ByteBuffer. Disregarding the obvious problem that not all ByteBuffer implementations support array, it's not even correct: a byte buffer can be an offset view of a segment of byte[], such that the underlying array is larger than the capacity of the byte buffer.

This leads to a nasty bug when a filter chain takes a ByteBuffer and calls slice on it to return a view that shares the same underlying storage. Array/factory peeks at the underlying storage and sees bytes that are not part of the sliced byte buffer.

stevemkim commented 9 years ago

Pull request https://github.com/TheClimateCorporation/mandoline/pull/18

stevemkim commented 9 years ago

Fixed by pull request #18