oleg-alexandrov closed this issue 11 years ago.
ASP now takes the input point cloud, shifts it by a fixed offset to bring the points closer to the origin, and saves the points as float instead of double. The shift is recorded in the GeoTiff header and undone upon reading (gdalinfo can be used to display it).
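The precision argument behind this encoding can be illustrated with a small numeric experiment (plain Python, not ASP code; the specific coordinate values below are made up for illustration): storing a planet-scale coordinate directly in 32 bits loses on the order of a decimeter, while subtracting a nearby origin first keeps the round-trip error far below 1 mm.

```python
import struct

def to_float32(x):
    """Round-trip a Python float through 32-bit IEEE storage."""
    return struct.unpack("f", struct.pack("f", x))[0]

point = 3396190.1234       # meters from planet center (hypothetical value)
origin = 3396000.0         # the shift that would be recorded in the header

direct_err = abs(to_float32(point) - point)
shifted_err = abs((to_float32(point - origin) + origin) - point)

print(direct_err)   # roughly 0.1 m: float32 alone is far too coarse here
print(shifted_err)  # well under 1 mm once the shift is undone
```

Near 3.4e6 the float32 spacing is 0.25 m, which is why the unshifted error is so large; after the shift the values are a few hundred meters in magnitude and the spacing shrinks to microns.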
The resulting DEMs differ in height by at most 1mm (tested over the entire ASP test suite) which is quite acceptable. The new point clouds are 40%-50% smaller.
The trickiest part was deciding how to pick the point to shift by (the new origin) without keeping the entire point cloud in memory. The solution I chose is to take a small tile at the center of the point cloud, compute all valid triangulation points in that tile, and find the median of those xyz points. If this fails, the tile is enlarged to grab more points. If all fails (no valid points in the cloud), ASP falls back to not using a shift. Potentially one could use a shift per tile, but that does not seem to be necessary.
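The grow-the-tile strategy described above can be sketched as follows. This is a hypothetical illustration, not ASP's actual implementation; `read_tile` is a stand-in callback that returns the valid xyz points found in a centered window of the given size (an empty list if none), and the tile sizes are invented.

```python
import statistics

def pick_origin(read_tile, start_size=256, max_size=4096):
    """Grow a centered tile until it yields valid points; return their
    component-wise median, or None to signal falling back to no shift."""
    size = start_size
    while size <= max_size:
        points = read_tile(size)
        if points:
            # Median of each coordinate separately: robust to outliers
            # among the triangulated points.
            return tuple(statistics.median(c) for c in zip(*points))
        size *= 2    # too few points: enlarge the tile and retry
    return None      # no valid points anywhere: do not use a shift
```

The component-wise median keeps the chosen origin inside the bulk of the cloud even when some triangulated points are wild.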
Next thing to do here will be dealing with predictor flags/encoding.
I implemented passing predictor=3 to GDAL for float/double data and predictor=2 for integer data, per http://www.gdal.org/frmt_gtiff.html. Here are the file sizes before and after this change:
D.tif            2.1M -> 354K
DEM.tif          12M  -> 7.4M
D_sub.tif        758K -> 170K
F.tif            21M  -> 18M
GoodPixelMap.tif 90K  -> 91K
L.tif            11M  -> 11M
L_sub.tif        4.1M -> 3.7M
PC.tif           40M  -> 28M
R.tif            11M  -> 9.7M
RD.tif           23M  -> 20M
R_sub.tif        4.0M -> 3.6M
lMask.tif        37K  -> 45K
lMask_sub.tif    18K  -> 19K
rMask.tif        37K  -> 45K
rMask_sub.tif    18K  -> 19K
We see a non-trivial size reduction: PC by 30%, DEM by 38%. Note that for some files the size actually increases, namely GoodPixelMap.tif and the *Mask.tif images. Those are images with a pixel size of 1 byte. For that reason, I turned off the predictor for such (u)int8 images, keeping it on for (u)int16/32/64 and float/double.
Overall the size of a run directory goes down by up to 40%.
It is not all perfect, however: in some cases L.tif and R.tif actually compress 2x worse with predictor 3, while in other cases they compress, but only a little. Overall the other images dominate, so even in such cases one is better off with the new approach.
Example: Before predictor implementation:
113M run/run-PC.tif
65M  run/run-RD.tif
7.0M run/run-DEM.tif
21M  run/run-L.tif
etc.
Total size: 353M
After predictor implementation:
86M  run/run-PC.tif
58M  run/run-RD.tif
5.8M run/run-DEM.tif
44M  run/run-L.tif
etc.
Total size: 313M
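The effect of a differencing predictor can be reproduced in isolation. The sketch below is plain Python, with zlib standing in for the TIFF codec rather than calling GDAL: for smoothly varying data the deltas between neighboring values are small and repetitive, so the entropy coder handles them far better than the raw values.

```python
import struct
import zlib

# A smooth-ish integer ramp: mostly +2 steps with a periodic -5 wrap,
# loosely imitating slowly varying raster rows.
values = [1000 + i + (i % 7) for i in range(10000)]

# Predictor-2-style horizontal differencing: keep the first value,
# then store only the difference to the previous sample.
deltas = [values[0]] + [b - a for a, b in zip(values, values[1:])]

raw = struct.pack(f"{len(values)}i", *values)
pred = struct.pack(f"{len(deltas)}i", *deltas)

print(len(zlib.compress(raw)), len(zlib.compress(pred)))
```

The delta stream is a short repeating pattern, so its compressed size is a small fraction of the raw stream's. Noisy data (like the L.tif/R.tif cases above) breaks this assumption, which is exactly when the predictor stops helping.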
Oleg,
The performance of the predictor=3 compression can be improved drastically if the data values are rounded to a sensible precision. For example, rounding DEM elevation values to 1 mm can buy a factor of 3 or so in the output file size, depending on the roughness of the DEM.
It seems to me that the RD.tif and the PC files would be good candidates for this treatment.
--Ben
Ben Smith University of Washington Applied Physics Lab Polar Science Center 1013 NE 40th Street Box 355640 Seattle, WA 98105 206 616 9176
Sounds good Oleg. The PC and DEM files were always the space hogs. DRG and DEMError should also display similar compression ratios.
I agree with Ben, but we need to be careful with hardcoding precision - 1 mm doesn't matter for orbital imagery, but could matter for close-range photogrammetry applications. Not sure if that is still a supported ASP use case.
Regarding the issue I mentioned earlier: the simplest thing to do is just not to use any predictor for L.tif, R.tif, L_sub.tif, etc., as it does not help and sometimes hurts. I put that fix in. So now the size of the run directory consistently goes down, by up to 40% in some cases.
Ben, thank you for your suggestion. This does indeed work, but instead of 1 mm I had to use 1/1024 m, almost the same value; presumably the resulting numbers have fewer digits in binary, which helps with encoding.
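The reason a power-of-two granularity beats a decimal one can be shown directly (an illustrative sketch assuming heights in meters, not ASP code): rounding to g = 1/1024 m makes the quantized value an integer multiple of 2^-10, so its float32 bit pattern ends in a run of zero mantissa bits that the predictor and DEFLATE exploit, whereas rounding to 0.001 m leaves a full-width mantissa.

```python
import struct

def quantize(h, g=1.0 / 1024.0):
    """Round a height to the nearest multiple of the granularity g."""
    return round(h / g) * g

h = 12.3456789                 # arbitrary example height in meters
q = quantize(h)
bits = struct.unpack("I", struct.pack("f", q))[0]

print(abs(q - h) <= 1.0 / 2048.0)   # error at most half the granularity
print(format(bits, "032b"))         # note the trailing zero mantissa bits
```

This also explains the Mars observation below: for heights around 10^4 m the float32 mantissa already cannot hold sub-millimeter detail, so the rounding discards nothing and the size does not change.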
With Ben's suggestion, the size of PC goes down further, by about 20% usually, sometimes not at all, and in some rare cases by 41%. The size of DEM goes down by anywhere between 0% and 60%. For example, for some Mars data, where the terrain is as much as 10 km below the datum (10^4 meters), there is no size reduction after applying this trick, as the values already have few significant fractional digits.
As a result of applying these approximations, the PC changes by at most 1 mm, as expected. For the DEM the story is somewhat different: in the vast majority of cases the statistics change by less than 3 mm, in a few cases by up to 3 cm, and in one case by 14 cm. This is probably related to PC-to-DEM rendering artifacts. For one such test case I ran geodiff, looked at the histogram, and viewed the diff; there are scattered artifacts in it, with most diffs being small.
David, these are good points. I added the precision (rounding error) as an option to stereo_tri and point2dem and documented it. The error image uses it as well.
I would not touch the DRG for now; that one is not in units of meters, and I assume the user would not want any rounding there.
I agree, we don't want rounding, but the float predictor should help?
It should help. The predictor is enabled by default.
This is related to issue #61, but is aimed primarily at reducing the size of DEMs, which are final products rather than intermediate data. Some ideas floated around: