MPEGGroup / FileFormat

MPEG file format discussions

ImageGrid for very large image sizes #105

Open farindk opened 2 months ago

farindk commented 2 months ago

Images with item_type='grid' are currently limited to 256x256 tiles because rows_minus_one and columns_minus_one are stored as 8-bit integers. Assuming a maximum sensible tile size of 1024x1024 pixels, the largest possible image resolution is 262144x262144 pixels. This may be too limiting for very large images such as those that arise in geospatial imaging.
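The figures above can be checked with a few lines of Python. This is only an arithmetic sketch: the helper name is made up for illustration, and the 1024-pixel tile size is the assumption stated in the comment, not a limit from the specification.

```python
def max_grid_resolution(field_bits: int, tile_size: int) -> int:
    """Largest image edge length in pixels for a grid whose row or column
    count is stored as a *_minus_one value in `field_bits` bits."""
    # A stored value of 0 .. 2^n - 1 encodes 1 .. 2^n tiles.
    max_tiles = 2 ** field_bits
    return max_tiles * tile_size


print(max_grid_resolution(8, 1024))   # 262144
print(max_grid_resolution(32, 1024))  # 4398046511104
```

With 32-bit tile counts, the tile count stops being the limiting factor for any realistic image.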

As a remedy, I suggest defining a version=1 of ImageGrid (ISO 23008-17:2017, Section 6.6.2.3.2) that allows larger integers for rows_minus_one and columns_minus_one. One possible definition could be:

class ImageGrid {
  unsigned int(8) version = 1;
  unsigned int(8) flags;
  FieldLength = ((flags & 1) + 1) * 16;
  TilesFieldLength = ((flags & 2) ? 32 : 8);         <<<
  unsigned int(TilesFieldLength) rows_minus_one;     <<<
  unsigned int(TilesFieldLength) columns_minus_one;  <<<
  unsigned int(FieldLength) output_width;
  unsigned int(FieldLength) output_height;
}

Note the new TilesFieldLength, which switches the integer size between the old 8-bit and a new 32-bit size. One might also consider allowing a 64-bit FieldLength, depending on a flag.
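To make the proposed layout concrete, here is a minimal Python sketch of a reader for it. It is not taken from any existing library: the `ImageGrid` dataclass and `parse_image_grid` are hypothetical names, and the choice to interpret flag bit 1 only when version is 1 is an assumption, since the proposal does not say how version 0 readers should treat it.

```python
from dataclasses import dataclass


@dataclass
class ImageGrid:
    version: int
    flags: int
    rows: int
    columns: int
    output_width: int
    output_height: int


def parse_image_grid(data: bytes) -> ImageGrid:
    """Parse an ImageGrid payload using the proposed version=1 layout."""
    if len(data) < 2:
        raise ValueError("truncated ImageGrid")
    version, flags = data[0], data[1]
    if version not in (0, 1):
        raise ValueError(f"unsupported ImageGrid version {version}")

    # Assumption: bit 1 of flags is only interpreted for version 1.
    tiles_bits = 32 if (version == 1 and flags & 2) else 8
    field_bits = ((flags & 1) + 1) * 16

    pos = 2

    def read(bits: int) -> int:
        nonlocal pos
        size = bits // 8
        if pos + size > len(data):
            raise ValueError("truncated ImageGrid")
        # Multi-byte fields are big-endian, as elsewhere in ISOBMFF.
        value = int.from_bytes(data[pos:pos + size], "big")
        pos += size
        return value

    rows_minus_one = read(tiles_bits)
    columns_minus_one = read(tiles_bits)
    output_width = read(field_bits)
    output_height = read(field_bits)
    return ImageGrid(version, flags, rows_minus_one + 1,
                     columns_minus_one + 1, output_width, output_height)


# Example: version 1, flags = 2 (32-bit tile counts, 16-bit output size).
payload = (bytes([1, 2]) + (299).to_bytes(4, "big") + (399).to_bytes(4, "big")
           + (1000).to_bytes(2, "big") + (2000).to_bytes(2, "big"))
grid = parse_image_grid(payload)
print(grid.rows, grid.columns)  # 300 400
```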

PileofBones86 commented 2 months ago

I like the approach. A quick note: there is a typo in the ISO number for HEIF above. The current document is ISO 23008-12:2022.

bradh commented 2 months ago

I support the concept. A slightly different syntax option:

aligned(8) class ImageGrid {
    unsigned int(8) version;
    unsigned int(8) flags;
    if (version == 0) {
        unsigned int FieldLength = ((flags & 0x01) + 1) * 16; // this is a temporary, non-parsable variable
        unsigned int(8) rows_minus_one;
        unsigned int(8) columns_minus_one;
        unsigned int(FieldLength) output_width;
        unsigned int(FieldLength) output_height;
    } else if (version == 1) {
        unsigned int FieldLength = ((flags & 0x03) + 1) * 16; // this is a temporary, non-parsable variable
        unsigned int(32) rows_minus_one;
        unsigned int(32) columns_minus_one;
        unsigned int(FieldLength) output_width;
        unsigned int(FieldLength) output_height;
    }
}

Here FieldLength is 16 or 32 in the version == 0 case, and 16, 32 or 64 in the version == 1 case. In version 1, (flags & 0x03) equal to 0x02 would not be valid, since it would yield a 48-bit field.

The concept here is that you'd use version 0 where the grid is simple, and version 1 when the grid is large.
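The "version 0 when simple, version 1 when large" rule can be sketched as a writer that picks the smallest encoding that fits. This is an illustration only: `serialize_image_grid` is a hypothetical helper, and falling back to version 1 whenever a 64-bit output size is needed is an assumption that follows from the syntax above, where version 0 tops out at 32 bits.

```python
def serialize_image_grid(rows: int, columns: int,
                         output_width: int, output_height: int) -> bytes:
    """Serialize an ImageGrid payload following the version 0 / version 1
    syntax variant above, choosing version 0 whenever the grid fits."""
    if rows < 1 or columns < 1:
        raise ValueError("a grid needs at least one row and one column")
    if rows > 2 ** 32 or columns > 2 ** 32:
        raise ValueError("tile count does not fit in 32 bits")

    # Pick the smallest FieldLength that holds both output dimensions.
    largest = max(output_width, output_height)
    if largest < 2 ** 16:
        flags, field_bytes = 0x00, 2   # (0 + 1) * 16 = 16 bits
    elif largest < 2 ** 32:
        flags, field_bytes = 0x01, 4   # (1 + 1) * 16 = 32 bits
    elif largest < 2 ** 64:
        flags, field_bytes = 0x03, 8   # (3 + 1) * 16 = 64 bits, version 1 only
    else:
        raise ValueError("output size does not fit in 64 bits")

    # Version 0 is enough when tile counts fit in 8 bits and no 64-bit
    # output fields are needed.
    simple = rows <= 256 and columns <= 256 and flags != 0x03
    version = 0 if simple else 1
    tile_bytes = 1 if version == 0 else 4

    return (bytes([version, flags])
            + (rows - 1).to_bytes(tile_bytes, "big")
            + (columns - 1).to_bytes(tile_bytes, "big")
            + output_width.to_bytes(field_bytes, "big")
            + output_height.to_bytes(field_bytes, "big"))


print(len(serialize_image_grid(2, 3, 100, 200)))    # 8
print(len(serialize_image_grid(300, 2, 100, 200)))  # 14
```

Small grids keep the existing 8-byte version 0 payload, so files that do not need the extension stay readable by current parsers.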

leo-barnes commented 2 months ago

Hmmm. Do we really need this though? You can easily create a grid of grids if you want larger images.

farindk commented 2 months ago

> Hmmm. Do we really need this though? You can easily create a grid of grids if you want larger images.

Simply increasing the integer size would be much easier than handling a multi-level hierarchy, especially as those large images will only be decoded partially, within a ROI, which already requires more complex decoding logic. And if we want to grow an image dynamically by adding more tiles, reaching the limit may force a reorganization of the whole file to add another level of indirection.

Finally, storing 2 or 6 bytes more is also less overhead than introducing a whole layer of dummy items.

Bonus reason: a grid of grids might conflict with pymd as that requires that the tile size of each level is the same. So in a multi-level grid, does that mean the tile size of the top level grid or that of the bottom level grid?

farindk commented 2 months ago

I just noticed that 256x256 grid images are not possible even though ImageGrid supports 256 tile columns and rows.

The problem is that the iref box reference_count is a 16-bit field and so supports at most 65535 references, one fewer than the 65536 (256x256) needed. Thus, the maximum grid size is currently 255x256.

For 256x256 grids, we would need to increase the reference_count to 32 bits, as described here: https://github.com/MPEGGroup/FileFormat/issues/106.
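The reference-count limit is easy to verify. A small sketch, assuming one reference per tile and a 16-bit reference_count as described in the comment (the function name is illustrative):

```python
MAX_REFERENCES = 2 ** 16 - 1  # largest value a 16-bit reference_count can hold


def grid_fits(rows: int, columns: int, max_refs: int = MAX_REFERENCES) -> bool:
    """True if a grid with one reference per tile fits in reference_count."""
    return rows * columns <= max_refs


print(256 * 256)            # 65536
print(grid_fits(256, 256))  # False
print(grid_fits(255, 256))  # True
```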