[BUG] CUDA results different to CPU results

MorganGrundy / MosaicMagnifique

Application for generating Photomosaics. Turn your images into beautiful Photomosaics.

https://morgangrundy.github.io

GNU General Public License v3.0

14 stars 0 forks source link

[BUG] CUDA results different to CPU results #10

Closed MorganGrundy closed 3 years ago

MorganGrundy commented 3 years ago

Describe the bug Generating a Photomosaic with CUDA sometimes produces a different result to generating without CUDA. Some of the cells use a different library image.

To Reproduce Steps to reproduce the behavior:

Load provided image library (lib.mil - 315 images)
Set detail to 50%
Set mode to CIE76
Set repeat range to 20
Set repeat addition to 300
Load main image from one of following: {https://unsplash.com/photos/dGMcpbzcq1I, https://unsplash.com/photos/-qSVn7qAmQU, https://unsplash.com/photos/915UJQaxtrk}
Generate Photomosaic
Disable CUDA
Generate Photomosaic
Compare Photomosaic

Expected behavior Both the CUDA and non-CUDA Photomosaics should be identical as long as the settings are identical.

Context (Environment):

Mosaic Magnifique Version 3.1.2
OS: Windows 8.1, 64-bit
Using packaged binary for Mosaic Magnifique v3.1.1

Additional context So far have not found an example where no repeats are used, so the problem might be with repeats. Bug affects all modes, so unlikely to be a problem in the difference kernels.

MorganGrundy commented 3 years ago

Results unaffected by setting library batch size to 1. Results unaffected by setting batch size in CUDAPhotomosaicData::mallocData() to 1.

MorganGrundy commented 3 years ago

Replaced the add reduction with just a simple sum kernel, results were unaffected so the reduction is not the problem.

MorganGrundy commented 3 years ago

I have been doing some more testing, this time using extreme examples (such as tiny cells, tiny photomosaic, repeat range of 1, max repeat addition).

Originally I the assumption that the problem was with the CUDA code. After analysing these more extreme examples, the CUDA results actually make more sense than the CPU ones.

For example using: lib.mil, detail 100%, CIE76, cell size 10px, repeat range 6, repeat addition 100,000, main image https://unsplash.com/photos/dGMcpbzcq1I, width 60px, height 40px. CUDA results - All cell images are unique, still somewhat suitable, but becoming less suitable in similar image areas from left-to-right top-to-bottom. CPU results - Uses many repeats.

To further support this I then set the repeat range to 0 and generated on CPU. The result was identical to the first CPU result.

So clearly the CPU results are the source of the problem.

MorganGrundy commented 3 years ago

Added a test for the calculateRepeats kernel and did find and fix a bug in it (548a389473b628a7004ed7419afbdaa9a0a970d4), but results still not the same so still a problem with CPU results.

MorganGrundy commented 3 years ago

Setting GridUtility::PAD_GRID to 0 seems to fix the problem, now to figure out why.

MorganGrundy commented 3 years ago

Setting GridUtility::PAD_GRID to 1 also seems to work.

MorganGrundy commented 3 years ago

With 639223f2e92f5fd389c2ef29badd113f18b2a7f8 the results are matching, but more thorough testing needed.

MorganGrundy commented 3 years ago

Added tests for comparing best fits from CPU and CUDA results: 0be7a6d67d1b0b9b5f4d743cade85086a6666482 Tests use random data and sometimes fail.

MorganGrundy commented 3 years ago

e173d54271c515403abfb1b5699cce1b430ac65f Removed random tests and instead load actual images.

On each run only a few of the images fail out of 100, and it is random which fails.

Using actual images also has the benefit that I can actually analyse the resulting photomosaics. So far usually only one cell is different (most I have seen is two). The results suggest that the problem is likely with the CUDA results. The randomness would suggest the cause is likely improper memory management.

Should add two more tests that repeatedly generate identical photomosaics, which will hopefully confirm that the problem is with the CUDA results.

MorganGrundy commented 3 years ago

811f3ddb5ffb544c8fa6e254f4f5d869b8c6e2f1 New tests confirm that problem is with CUDA results.

MorganGrundy commented 3 years ago

I have again been experimenting with changing the CUDA batch size. The frequency of anomalous results appears proportional to batch size.

MorganGrundy commented 3 years ago

Cannot really tell what the connection is with batch size.

However noticed that when batch size was set to 1 the generator was actually faster than with max batch size.

Probably re-design CUDAPhotomosaicData and CUDAPhotomosaicGenerator. Only doing one cell at a time which will hopefully solve the anomalous results, improve performance, and mean the progress bar can be better updated.

MorganGrundy commented 3 years ago

Tests now passing: bce8db8df3085a4b1d577615e576654c5af79eef CUDAPhotomosaicGenerator can still be improved and tidied up. I want to add more extensive tests (for all modes and different detail levels).

With the tests passing it seems that the issue is likely fixed.

Library batch size is not currently being used, although it seems like it is not needed as much with the new code.

MorganGrundy commented 3 years ago

More tests: 8f73e3ee39357566c75e4ffbca12b25ed3f6bd99 Results now seem to match with all modes and detail 100% or 50%