OpenDroneMap / ODM

A command line toolkit to generate maps, point clouds, 3D models and DEMs from drone, balloon or kite images. 📷
https://opendronemap.org
GNU Affero General Public License v3.0

Use windowed read/write in median_smoothing #1674

Closed Adrien-LUDWIG closed 1 year ago

Adrien-LUDWIG commented 1 year ago

See the issue description in this forum comment: https://community.opendronemap.org/t/post-processing-after-odm/16314/16?u=adrien-anton-ludwig

TL;DR: Median smoothing already used windowing to iterate over the array, but it still read the whole array into RAM. Now the full potential of windowing is exploited to read and write in chunks.
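
For context, this is roughly what the chunked read/write looks like (an illustrative sketch, not the PR's exact code; paths, block size and kernel size are placeholders):

```python
import rasterio
from rasterio.windows import Window
from scipy import ndimage

# Illustrative only: walk the raster window by window instead of loading
# the whole array into RAM. Paths, block size and kernel size are placeholders.
with rasterio.open("input.tif") as src:
    profile = src.profile
    with rasterio.open("output.tif", "w", **profile) as dst:
        for row in range(0, src.height, 512):
            for col in range(0, src.width, 512):
                win = Window(col, row,
                             min(512, src.width - col),
                             min(512, src.height - row))
                block = src.read(1, window=win)
                # The real implementation also needs to pad each window by
                # half the kernel size so the filter has valid neighbours
                # at the seams.
                dst.write(ndimage.median_filter(block, size=5), 1, window=win)
```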

Tests

I tested it on a very small dataset of 18 images on an external HDD. Here are the results:

>>> median_smoothing(input, output, smoothing_iterations=1000, num_workers=16)
# Before
 [INFO] Completed smoothing to create /datasets/windowed_io/odm_meshing/tmp/mesh_dsm.tif in 0:00:30.055436
# After
 [INFO] Completed smoothing to create /datasets/windowed_io/odm_meshing/tmp/mesh_dsm.tif in 0:00:28.892439

I also used cksum to compare the resulting files, which were exactly the same.


Do not hesitate to tell me if you have ideas for further testing. :wink:

Adrien-LUDWIG commented 1 year ago

Sorry for this HUGE mistake. :sweat_smile: Tip for future me: re-reading with a fresh head works way better.

Adrien-LUDWIG commented 1 year ago

Okay, after further testing this morning I came upon errors when reading a big file:

ERROR 1: ZIPDecode:Decoding error at scanline 256, unknown compression method
ERROR 1: TIFFReadEncodedTile() failed.
ERROR 1: /datasets/whole_hdp_features/odm_meshing/tmp/tiles.tif, band 1: IReadBlock failed at X offset 1, Y offset 9: TIFFReadEncodedTile() failed.
ERROR 1: ZIPDecode:Decoding error at scanline 256, incorrect data check
ERROR 1: TIFFReadEncodedTile() failed.
ERROR 1: /datasets/whole_hdp_features/odm_meshing/tmp/tiles.tif, band 1: IReadBlock failed at X offset 0, Y offset 11: TIFFReadEncodedTile() failed.

The small dataset I used for testing yesterday calls window_filter_2d only once, so I suppose it reads the whole file at once.

I need to explore this further. If you have any idea, I'm all ears.

pierotofy commented 1 year ago

Since you're doing writes from parallel threads, perhaps there needs to be a mutex. I'm not sure if that would affect performance.

This would be a cool improvement.

Adrien-LUDWIG commented 1 year ago

Indeed, it works using a single worker. I will look into this.

Thanks! I was able to read a few hundred windows from this file yesterday and was sure it was with many workers... I thought I was starting to lose my mind. Now that you point it out, I can see where I got mixed up.

Adrien-LUDWIG commented 1 year ago

The last commit fixes race conditions while reading/writing by using locks.
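
Roughly speaking, the locking looks like this (an illustrative sketch, not the actual commit; the thread pool usage and kernel size are assumptions):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

import rasterio
from scipy import ndimage

# Hedged sketch: serialize access to the shared rasterio datasets, since
# GDAL dataset handles are not safe to use concurrently without locks.
read_lock = threading.Lock()
write_lock = threading.Lock()

def process_window(src, dst, win):
    with read_lock:
        block = src.read(1, window=win)
    block = ndimage.median_filter(block, size=5)  # placeholder smoothing step
    with write_lock:
        dst.write(block, 1, window=win)

# Usage (windows and num_workers are assumed to come from the caller):
# with ThreadPoolExecutor(max_workers=num_workers) as pool:
#     for win in windows:
#         pool.submit(process_window, src, dst, win)
```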

I could not test performance yet, as I don't have any big project with the temporary meshing files. I guess I could test it on something else, since it is not too specific. Edit: I tried to use an orthophoto, but it has no nodatavals, which causes an error.

My main concern about performance is that each read overlaps with the previous and next ones by a few pixels (half the kernel size).

originlake commented 1 year ago

> My main concern about performance is that each read overlaps with the previous and next ones by a few pixels (half the kernel size).

The previous approach did not alter the original pixel values until the whole process was done. Now that each thread reads and writes directly, you shouldn't use the input file to store the results during processing (later reads might pick up already-altered values and give incorrect results). Instead, write to a new file and rename it after all processing is done. Parallel reading in read-only mode shouldn't be an issue, but writing should be sequential since the windows overlap.
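
Something along these lines (a hedged sketch of the suggestion; file names, the block iteration and the kernel size are illustrative):

```python
import os
import rasterio
from scipy import ndimage

# Sketch of the suggested pattern: read from the untouched input, write every
# window to a separate temporary file, and only move the result into place
# once the whole pass has completed.
def smooth_pass(in_path, out_path, kernel_size=5):
    tmp_path = out_path + ".tmp"
    with rasterio.open(in_path) as src, \
         rasterio.open(tmp_path, "w", **src.profile) as dst:
        for _, win in src.block_windows(1):
            block = src.read(1, window=win)
            dst.write(ndimage.median_filter(block, size=kernel_size), 1,
                      window=win)
    os.replace(tmp_path, out_path)  # swap in only after the pass is done
```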

Adrien-LUDWIG commented 1 year ago

Oh, you're totally right! I remember thinking about the best way to manage the temporary files from one iteration to the next... and the idea of doing it in place came to mind. I felt really smart at the time, but now that you point it out, it's obviously not a good idea. 😅

Should I keep the input file intact until the output is created? I feel like this is the best way to go, but it should be consistent with the rest of the pipeline, so I would like your opinion.

This implies having 2 temporary files when doing more than one iteration, thus taking up more disk space. But I suppose that's fine, since it would be the equivalent of the input size, which is far less than the individual tiles or the tiles.tmp.tif file. In my case, for example:

If we go this way, should the 2 temporary files be renamed between iterations so there is always a clear name indicating which one corresponds to the latest pass?

Let me know what you think and have a nice weekend! 😉

originlake commented 1 year ago

I think keeping the input file intact is better; it also helps when the process fails in the middle, since it guarantees the source data is only changed when the process finishes successfully. How the temporary files are named doesn't matter to me; they are just temporary files to store the data, similar to how we would store it in memory. Just make sure to clean them up at the end.
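
For the iteration loop, one possible shape following that advice (hypothetical; it reuses the smooth_pass helper sketched above):

```python
import os

# Hypothetical multi-iteration flow: the input file is never modified, two
# alternating temporaries hold the intermediate passes, and both are cleaned
# up once the final result has been moved into place.
def median_smoothing(input_path, output_path, iterations=1):
    tmp_a, tmp_b = output_path + ".a.tmp", output_path + ".b.tmp"
    current = input_path
    for i in range(iterations):
        target = tmp_a if i % 2 == 0 else tmp_b
        smooth_pass(current, target)  # windowed read/write, as sketched above
        current = target
    os.replace(current, output_path)  # the result only appears at the end
    for tmp in (tmp_a, tmp_b):
        if os.path.exists(tmp):
            os.remove(tmp)
```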

Adrien-LUDWIG commented 1 year ago

The last 2 commits avoid reading altered data.

The first commit was my first attempt, but it doesn't work. I pushed it to get feedback, if possible. I don't understand why, but when the file is opened in rasterio's "r+" mode, its data is completely wrong once saved. It is even bigger than the original file. I can confirm that the file is correctly updated while open, because the second temporary file is valid after many iterations of reading from and writing to it. What am I missing? Isn't this what "r+" is for?
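
For reference, the "r+" pattern I mean looks roughly like this (a hypothetical minimal example, not the failing commit itself):

```python
import rasterio
from rasterio.windows import Window

# Hypothetical minimal example of the in-place update pattern in question:
# open the file in "r+" mode, read a window, then write it back to the
# same window.
with rasterio.open("mesh_dsm.tif", "r+") as dst:
    win = Window(0, 0, 256, 256)
    block = dst.read(1, window=win)
    dst.write(block, 1, window=win)
```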

The second commit fixes it, but I find the workaround quite ugly. I am open to suggestions.

Adrien-LUDWIG commented 1 year ago

Hello again! 👋

I finally had the time to test the performance impact of this PR.

So, I used a dataset of 120 images with a resolution of 5472x3648. I kept the default parameters.

I stopped the process at the meshing step and timed the execution of median_smoothing. Here are the results for different numbers of iterations:

| Iterations | Without windowing | With windowing |
| --- | --- | --- |
| 1 | 10.107 | 6.558 |
| 10 | 58.68 | 56.48 |
| 100 | 537.55 | 538.51 |

Results are averaged over 10 runs for 1 iteration and 5 runs for 10 iterations.

For information, here are the sizes of the input and outputs:

| File | Size (MB) |
| --- | --- |
| tiles.tif | 74.25 |
| mesh_dsm_1.tif | 61.76 |
| mesh_dsm_10.tif | 45.65 |
| mesh_dsm_100.tif | 30.75 |

and the resolution of tiles.tif: 6031×6088.

Once again, I used cksum to ensure the outputs were the same for both implementations.


Apart from the inelegant workaround described in my last comment, I think the PR is ready to be merged.

Do not hesitate to tell me if you have concerns. :wink:

pierotofy commented 1 year ago

Thanks @Adrien-LUDWIG 👍 I've run some tests on a smaller dataset; runtime is similar, as you found, except that in my tests the original code runs about ~10-15% faster (which is what I would expect). This can be merged, but I'm trying to understand the use case and tradeoff: was the goal to reduce memory usage? Did you have ODM crash (run out of memory) at the median smoothing step during processing?

Adrien-LUDWIG commented 1 year ago

Yes, exactly. ODM crashed at the median smoothing step because it tried to allocate too much RAM (352 GB for my dataset), whereas the rest of the processing didn't require more than 120 GB of RAM (well, 16 GB of RAM and ~100 GB of swap).

The previous computations on tiles.tif, using GDAL, had no problem handling it. It crashed only when trying to open it with rasterio.