5E-324 / Imagina

One of the fastest Mandelbrot set renderer & explorer.
GNU Affero General Public License v3.0
19 stars 0 forks source link

Potential issue on the multithreaded LA implementation #5

Closed hrkalona closed 8 months ago

hrkalona commented 8 months ago

this part of the code:

if (ThreadID == ThreadCount - 1) { LAI.StepLength = j - PeriodBegin; LAs[ThreadID].Append(LA); LAIs[ThreadID].Append(LAI_); }

should be changed so the last thread to add those data should be the one who has PeriodBegin != 0 (this variable should be initialized to 0) and the next thread (if any) must have have PeriodBegin == 0. If the last thread has PeriodBegin !=0 then this thread should add the data.

This needs to be added so that the actual last thread that found the period should be the one to finalize the data. This check should be added on the starter thread (thread 0) as well, so that the multithreaded code works even if you set the ThreadCount to 1.

Probably for this fix you need to add a new array, similar to Start, which stores the PeriodBegin for each thread, and to be initialized at -1, so you can do busy wait while the value is < 0.

5E-324 commented 8 months ago

Thanks for the suggestion. This function is experimental and is known to have many potential issues. However, since it has relatively low performance benefit, it's currently unused and its development is discontinued, so there's no plan to fix these issues. But if you have an implementation of the fix, feel free to submit a pull request.

hrkalona commented 8 months ago

In high period areas the LA calculation tends to be very slow, especially the first stage. That is why I was experimenting with your multithreaded code. I also sent you another potential issue/fix on the forums. If you think that both suggestions are ok I can raise a pull request.

If you also know any other unsolved issues let me know so I can debug them.

5E-324 commented 8 months ago

I replied to the message you sent me on the forums, and proposed a solution.

The problem with this function is mainly the handling of edge/corner cases. Here are some situations I can think of that are potentially problematic. Some of them may be duplicated or can be solved by the same fix. Further investigations are needed.

Some of these are more likely to occur when the orbit lacks of dips, which can happen if you zoom deeply into a Misiurewicz point, especially with the new version of period insertion, which I copied from the single threaded version in the last commit. (The old one will give up directly if no period is detected, which will make pixel computation slower) If you want to debug, consider including this kind of locations to your test cases.

Of course, not all of these are common or will cause serious issues, but without handling them, it will be less reliable than the single-threded version.

Is the single-threaded version slower than the high precision orbit computation? If not, then maybe it's better to let it run in parallel with the orbit computation. The multi-threaded version is intended for accelerating opening saved compressed references, if you don't need to do this, then maybe this function is also not needed.

Note: "dip" refers to the sudden drop of |Z|, it's what the function DetectPeriod() detects.

hrkalona commented 8 months ago

I will continue the discussion that I started on the forum here:

I had a scenario where a thread starting at j = 307 managed to detect a period at 1319, so I got an exception: Thread( 5) error 1319 >= 375. The next thread (which started at 369) managed to find a period at 375 so this is why we got the exception.

I added a breakpoint on the single threaded code and I captured some indexes between them that the single threaded code managed to detect a period:

306, 308, 317, 319, 322, 324, 329, 334, 336, 345, 347, 350, 352, 357, 362, 364, 373, 375

so theoretically the thread starting at 307 could find any of these but it failed. This made me think what would happen if instead of initializing a thread by doing a Step and incrementing j by +2, to not do a Step and increment by +1. I tested one work around:

LAInfo LA_ = LAInfo(reference.Ref[j]);

if (LA.DetectPeriod(reference.Ref[j + 1]) || j + 1 >= reference.RefIt) { j++; } else { LA = LA_.Step(reference.Ref[j + 1]); j += 2; }

which seemed to work but you said that DetectPeriod is not designed to work like that and it might create some other edge cases. Before I start working the other method you proposed, which requires a synchronization redesign what If I do this, LAInfo LA_ = LAInfo(reference.Ref[j]); LAInfo LA_2 = LAInfo(reference.Ref[j]); LA_2 = LA_2.Step(reference.Ref[j + 1]);

size_t j1 = j +1; size_t j2 + j +2;

and then do the loop and and check both of them:

LAInfo NewLA; bool PeriodDetected1 = LA_.Step(NewLA, reference.Ref[j1]);

LAInfo NewLA2; bool PeriodDetected2 = LA_2.Step(NewLA2, reference.Ref[j2]);

and basically break the loop when the first detection occurs, so the starting J will be either the value of j1 or j2

Does this makes any sense?

hrkalona commented 8 months ago

Actually I tested this and in some other cases it detected the period right away leading to other collisions. I am gonna test something else. I am still going to have two different indexes j1 and j2 and two different starting LAs (LA and LA2_) but I will run two independent starting search loops and let both of them try to detect a period. When they are done I need to check if the original detection method which start with +2 offset managed to detect a period within its search space: size_t Start = reference.RefIt ThreadID / ThreadCount; size_t End = reference.RefIt (ThreadID + 1) / ThreadCount;

if the original method found something and it is between start and end then we will use that, else we need to check the secondary detection that started with an offset of +1. If this detection is between Start and End then we might be ok.

In any case If a thread detects a period outside its search space maybe we need to stop this thread of doing anything else, and maybe let the previous one work its way out to cover the next thread's space as well. Given that the starter thread is implement in a way that it can work one a single threaded execution then if all threads fail then the starter thread can work alone.

hrkalona commented 8 months ago

Or we can skip the idea with two indexes and two LA_s and just add this check after the first loop of seach:

if (ThreadID == ThreadCount - 1) { Start[ThreadID] = j; } else if (j >= Begin && j < End) { Start[ThreadID] = j; } else { while (!Start[ThreadID + 1]); Start[ThreadID] = Start[ThreadID + 1]; return; }

If the last thread will not find a period then j will be >= reference.RefIt so we are covered. If any of the previous threads will not find anything, this value will be propagated up until thread 1, so thread 0 will do all the work. If any of the intermediate threads fails to find a period, it will just pass the next threads value to its predecessor and then stop. This value will either be an intermediate value or a value >= reference.RefIt, so its predecessor will do all the work up until that value.