Open DinkydauSet opened 2 years ago
Kalles Fraktaler does something like what Fractal eXtreme does. I looked at the code of Kalles Fraktaler. What it does is something like this (the details are probably not 100% correct but this is the idea):
Guessing pixels, like in the Mariani-Silver-algorithm, is not done at all. Guessing in Kalles Fraktaler means something much simpler: after 25% of the points have been calculated, and only the 1×1 sized points remain, they can be guessed if their diagonal neighbors have the same iteration count. That means at least 25% of points are always calculated. This explains why Kalles Fraktaler (and Fractal eXtreme) can have both guessing and nice render progress visualisation: they use a less effective method for guessing.
I guess the sort operation is one of the things that makes Kalles Fraktaler slow. Especially with a large resolution, the sort operation does a lot of swapping of bytes in memory before the render can start. This cost is negligible for renders at great depths, which explains why Kalles Fraktaler has good performance at great depths, and not at low depths. I think Kalles Fraktaler is also slow because a mutex lock is used for every pixel.
I have tested the cost of using a mutex per pixel in my program by creating a mutex lock in calcPoint. For renders with many iterations per pixel, it does not make much difference. For renders with only a few iterations per pixel, it can easily make the render 3 times slower. The overhead is definitely not negligible.
I see that using a mutex per pixel makes it easy to distribute work to threads, so I would like to use the same approach if it was not so slow. Maybe it's an idea to assign groups of, for example, 32 pixels, at the same time. Then the mutex would have to be used only 1/32 as many times. With a large enough group size, the cost can be made low enough.
create a work distribution function that gives work to do in batches of 128 points or maybe more. 32 is not enough. 128 makes the slowdown almost unnoticeable.
The task of the render function will be to start threads this way:
do n times:
request work, start a worker thread with the work.
It's part of the idea to NOT just start threads and let them request work themselves. With that approach, all the new threads will have to wait for each other anyway because of the mutex required by the work distribution function. If the initial render function requests the initial work for the threads, there is guaranteed to be no waiting for the mutex while starting the threads, which I expect to be faster.
This avoids creating thousands of threads, as the program does now. Creating thousands of threads is (apparently) pretty fast, but not creating them is even faster.
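A minimal Python sketch of the batching idea, only to show the control flow (the real program is C++). The names `BatchDistributor`, `render` and `calc_point` are made up for this sketch, and the default batch size of 128 is the value mentioned above. The render function requests the first batch for each thread itself, and each thread only takes the lock once per batch.

```python
import threading

class BatchDistributor:
    """Hands out points in batches so the lock is taken once per batch."""

    def __init__(self, points, batch_size=128):
        self._points = list(points)
        self._next = 0
        self._batch_size = batch_size
        self._lock = threading.Lock()

    def get_work(self):
        # One lock acquisition per batch instead of one per point.
        with self._lock:
            start = self._next
            self._next = min(start + self._batch_size, len(self._points))
            return self._points[start:self._next]


def render(points, calc_point, n_threads=4, batch_size=128):
    distributor = BatchDistributor(points, batch_size)
    results = {}

    def worker(batch):
        # Keep working until the distributor returns an empty batch.
        while batch:
            for p in batch:
                results[p] = calc_point(p)
            batch = distributor.get_work()

    threads = []
    for _ in range(n_threads):
        # The render function requests the initial batch itself, so new
        # threads do not contend for the lock while starting up.
        first = distributor.get_work()
        if not first:
            break
        t = threading.Thread(target=worker, args=(first,))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return results
```

Only `n_threads` threads are created no matter how many points there are, which is the difference from creating thousands of threads.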
I think of a function that not only distributes work but also colors pixels, and that distributes work in a smart way. The worker threads have a very simple task: calculate points. That's all. The work distribution function chooses some points to be calculated first, over others. There must be an algorithm behind that choice, and the distribution function must keep track of everything necessary for that algorithm.
The distribution function also starts necessary actions when a thread is done with its work. For example, if a thread is done with its work, that may mean that now a border of a tile is completely calculated. The necessary action is then to fill that tile. I mention "tile" because what I hope to achieve is to keep using the guessing principle behind the Mariani-Silver-algorithm, but maybe in a different way that allows for better render progress.
The work distribution function always knows when a thread is done, because threads request new work. (This may be a little more complicated with AVX because of the optimization in (4) above. A thread may already request work when it has not finished its current work.)
A signature idea:
void getWork(int thread_id, int last_done, int next_index, point* work_storage);
last_done is how many pieces of work have been finished by the thread. At first this will be 0. The next time the thread requests work it's 1. getWork can use this to cause actions. It can remember that when thread 1 is done with work piece 3, something has to be done.
next_index is the next index in the array of work (points) where the thread will continue. If a thread has finished all of its work, it will call getWork with next_index = 0.
I now have this unfinished idea:
The Mariani-Silver-algorithm as I have implemented it works by computing the boundary of a tile, and then either splitting the tile (if the boundary has different iteration counts) or filling it (if the whole boundary has the same iteration count). ExploreFractals does this recursively.
What I want to do is a "split early if possible" approach. My observation is that often not the entire boundary has to be computed to know whether the tile must be split or not. You can also first compute the 4 corners of a tile. If the corners don't have the same iteration count, it's already known that the tile needs to be split. The splitting can be done first, before finishing the entire boundary. Splitting first creates extra tiles very quickly, with only few calculated pixels. The benefit is that those new tiles will have new boundaries with new corners, and those corner points will quickly appear all over the screen. By computing them first, and coloring the corresponding pixels, render progress is more evenly spread over the screen.
At least that's what will usually happen. If the 4 corners do have the same iteration count, it's still unknown whether the tile must be split. The next step may be to calculate 4 more pixels on the boundary. If they have different iteration counts, the tile can be split. Otherwise, still more points must be calculated... this goes on until either the tile gets split or the tile can be filled because the whole boundary was computed and it has the same iteration count everywhere.
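The decision described above can be sketched as a small Python function. This is only an illustration; `classify_tile` and its return values are names invented for this sketch. It takes the iteration counts of the boundary points computed so far and the total number of boundary points.

```python
def classify_tile(boundary_counts, boundary_size):
    """Decide what to do with a tile from a partially computed boundary.

    boundary_counts: iteration counts of the boundary points computed so far.
    boundary_size:   total number of points on the tile boundary.
    """
    if len(set(boundary_counts)) > 1:
        return "split"         # two different counts: splitting is certain
    if len(boundary_counts) == boundary_size:
        return "fill"          # whole boundary known and uniform
    return "compute more"      # uniform so far, but not conclusive
```

For example, four corners with counts 5, 5, 7, 5 already give "split", while four equal corners give "compute more".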
A tile is a rectangle that has 4 sides. Consider only one side of the rectangle. The side includes 2 corner points that will be calculated first. I have this idea for the order in which points are calculated:
1___________________________________________________________1
______________________________2______________________________
_______________3_____________________________3_______________
_______4______________4______________4______________4________
___5_______5______5_______5______5_______5______5_______5____
_6___6___6___6__6___6___6___6__6___6___6___6__6___6___6___6__
7_7_7_7_7_7_7_77_7_7_7_7_7_7_77_7_7_7_7_7_7_77_7_7_7_7_7_7_7_
I call the computation of the corners level 1. Level 2 is the points between the corners. Level 3 is the points between those points.
Things to figure out:
I think of a tile data structure that contains information about which levels of each of the sides of the tile have been calculated:
struct tile {
    int topleft_x;
    int topleft_y;
    int width;
    int height;
    int top_level;
    int bottom_level;
    int right_level;
    int left_level;
};
Some of those ints may be shorts or chars to save memory.
This python script generates the order visualisation above but in a better way (without points occurring in multiple levels):
low = 1
high = 61
to_level = 8

def row_to_string(low, high, row, symbol="X"):
    # one "_" per position, replaced by the symbol at every position in row
    presentation = ["_"] * (high - low + 1)
    for ind in row:
        presentation[ind - low] = str(symbol)
    return "".join(presentation)

# all points of levels 1 up to and including n (unsorted)
def level_recursive_upto(low, high, n):
    upto = [low, high]
    if n == 1:
        return upto
    def recursion(low, high, n):
        mid = (low + high) // 2  # integer division, also in Python 3
        upto.append(mid)
        if n > 2:
            if mid > low + 1:
                recursion(low, mid, n-1)
            if high > mid + 1:
                recursion(mid, high, n-1)
    recursion(low, high, n)
    return upto

# only the points of level n
# n >= 1
# high > low
def level_recursive(low, high, n):
    if n == 1:
        return [low, high]
    upto = sorted(level_recursive_upto(low, high, n-1))
    lvl = []
    for i in range(1, len(upto)):
        e1 = upto[i-1]
        e2 = upto[i]
        if e2 > e1 + 1:
            lvl.append((e1 + e2) // 2)
    return lvl

for i in range(1, to_level):
    row = level_recursive(low, high, i)
    print(row_to_string(low, high, row, i))
output:
1___________________________________________________________1
______________________________2______________________________
_______________3_____________________________3_______________
_______4______________4______________4______________4________
___5_______5______5_______5______5_______5______5_______5____
_6___6___6___6__6___6___6___6__6___6___6___6__6___6___6___6__
__7_7_7_7_7_7_7__7_7_7_7_7_7_7__7_7_7_7_7_7_7__7_7_7_7_7_7_7_
I wonder if it's fast enough to do this recursive calculation every time the points are needed.
A benefit of the python script above is that the order in which points are calculated remains the same if a tile is split. Because of the recursion, the position of the 3's depends only on the position of the first 1 and the 2 (for the first 3) and the 2 and the second 1 (for the second 3). Splitting the tile at the location of the 2 creates two new tiles of which the second levels are exactly the positions of those 3's. More generally: all levels of the smaller tiles overlap with the levels of the larger tile. This is a solution to the problem of calculating points multiple times. It won't happen. The levels stored in the tile struct can be safely reused after splitting a tile.
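The overlap claim can be checked with a few lines of Python. This sketch repeats `level_recursive_upto` from the script above so that it runs on its own; `halves_match` is a helper name made up for the check. It compares the points of the large side up to level n with the points of the two halves up to level n-1.

```python
def level_recursive_upto(low, high, n):
    # Same function as in the script above: all points of levels 1..n.
    upto = [low, high]
    if n == 1:
        return upto

    def recursion(low, high, n):
        mid = (low + high) // 2
        upto.append(mid)
        if n > 2:
            if mid > low + 1:
                recursion(low, mid, n - 1)
            if high > mid + 1:
                recursion(mid, high, n - 1)

    recursion(low, high, n)
    return upto


def halves_match(low, high, n):
    """True if splitting at the midpoint keeps exactly the same calculated points."""
    mid = (low + high) // 2   # the position of the 2
    whole = set(level_recursive_upto(low, high, n))
    left = set(level_recursive_upto(low, mid, n - 1))
    right = set(level_recursive_upto(mid, high, n - 1))
    return whole == left | right


all_match = all(halves_match(1, 61, n) for n in range(2, 8))
```

The check only covers a split at the midpoint, which is the only kind of split for which the levels carry over.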
A thread may already request work when it has not finished its current work.
I think it may be better to do it differently. The AVX calculation is kinda like having 4 threads. 4 points are being iterated at the same time, with one instruction. Every index in the AVX vector can have its own work storage, just like every thread has its own work storage. This avoids the whole problem. When there is no new work available, work from other indices' storage can be taken.
Maybe it will make the work distribution less effective, but that's because of a problem that's always there, no matter how many threads there are. Especially when a render has just started, there may not be enough work available immediately for all threads. The algorithm starts with just 1 tile. First the 4 corners have to be calculated. The next step depends on the result of those 4 points. How to distribute the calculation of 4 points to many different threads? No more than 4 threads can be used at all and the overhead of using only 1 point in the buffer is significant, as I've measured.
Maybe it's better to let all the threads do some work, even if it's not yet known to be necessary. For example, the first tile can be calculated up to some level immediately, instead of just calculating the 4 corners, just to keep the threads busy, or the first tile can be split regardless of whether it's necessary.
If there's a situation where there really is not enough work right now, but there may be more work later, the threads should not end. This may be a problem because if the distribution function returns no new work, that's currently a signal that the thread can end. Either the distribution function or the threads would have to wait somehow. I want to avoid waiting if possible.
I looked at the C++ feature coroutines because I think that could be useful. I want something like this:
point nextPoint() {
    for (int i=0; i<number_of_points; i++) {
        point p;
        p.x = ...;
        p.y = ...;
        return_resumable p;
    }
}
I want a function that can return a value, and later resume from where it was left. That's because it may be easy to create a loop that generates all points to be calculated, but I want to return them one by one.
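For comparison, this is the behavior that a generator gives in Python. It is only an illustration of the control flow I have in mind, not something that can be used in the C++ program; `next_points`, `width` and `height` are names made up for the example.

```python
def next_points(width, height):
    # The loop is written as if it produced every point at once, but each
    # `yield` hands one point to the caller and pauses here until the
    # next point is requested.
    for y in range(height):
        for x in range(width):
            yield (x, y)


points = next_points(3, 2)
first = next(points)    # (0, 0)
second = next(points)   # (1, 0): execution resumed inside the loop
```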
Unfortunately C++ coroutines are very complicated and other people complain about them too. Also they're only available since C++20 and I have Visual Studio 2017.
The same thing can maybe be achieved with a state struct passed by reference. The state struct is kept as a Render class member, so it's global for the whole render.
struct state_struct {
    //variables that contain the state of the distribution function, so that it can resume later
};
If the distribution function accepts a state_struct&, then it can be used like this from a thread:
bool new_work = getWork(state);
This is not as general as actually resuming function execution so I don't know... coroutines really sound like exactly what I need here.
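A minimal Python sketch of the state-struct approach, assuming the work is simply all points of a width × height grid in row order. `DistributionState`, `get_work` with these parameters, and `max_points` are invented for the illustration. The loop position that a coroutine would remember implicitly is stored explicitly in the state object.

```python
class DistributionState:
    """Everything get_work needs in order to continue where it stopped."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
        self.next_index = 0   # index of the next point to hand out


def get_work(state, work_storage, max_points):
    """Fill work_storage with up to max_points points.

    Returns False when there is no work left, True otherwise.
    """
    del work_storage[:]
    total = state.width * state.height
    while state.next_index < total and len(work_storage) < max_points:
        i = state.next_index
        work_storage.append((i % state.width, i // state.width))
        state.next_index += 1
    return len(work_storage) > 0
```

Every call continues from `state.next_index`, so the function behaves as if it were resumed, at the cost of writing the loop state out by hand.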
Another idea is to use a list instead. If the distribution function can't be resumed, it can at least make a list of the next important work, like this:
vector<point> nextwork
to store the next work that will be assigned.
I'm thinking about the idea of keeping the program in a consistent state that I described here: https://dinkydauset.github.io/DinkydauSet/2021/12/21/reducing-software-complexity.html
The implementation of the work distribution function can maybe be made easier by thinking of it as doing 2 tasks that keep the state of the function consistent:
These 2 tasks can be done independently. Both tasks must result in a consistent state. It's not strictly necessary for an implementation of the work distribution to have the state be consistent between (1) and (2). This is just to make the implementation easier. It reduces the problem to finding solutions for 2 questions:
This leads to questions about the state:
The work distribution function can keep a list of facts about what the threads are doing. This list of facts must include those and only those facts necessary to do the 2 tasks, like this:
//information related to each thread
class threadstate;
vector<threadstate>; //one threadstate for each thread
This can be part of the greater state class for the whole work distribution:
struct state_struct {
    vector<threadstate>;
    ...
}
Because the distribution function needs the state all the time, I might as well make it a member function and change the name:
class WorkDistribution {
    class threadstate {
        ...
    }
    vector<threadstate>;
    getWork(int thread_id)
}
What is the information that I need to store?
Possible ideas:
It distributes work with the intention of
I think of using a priority queue to do work on tiles in an order that distributes progress well. The algorithm in Kalles Fraktaler also has some kind of tiles. It first calculates 1 point for every grid of 8×8 points, which is like a tile. The order in which it calculates tiles is from large to small. With a priority queue, I could choose to do work on large tiles first, but a better way is maybe to let the order depend on the level as well, like this:
tile1 > tile2 if tile1.size / tile1.level > tile2.size / tile2.level
This makes the order depend on the distance between calculated pixels on the sides of the tile. In this expression I use tile1.level, but there is no single level for the whole tile. Each side has its own level. To do work on sides with the largest distance between calculated points, the minimum of the levels of a tile needs to count, so the ordering can be done like this:
tile1 > tile2 if tile1.size / tile1.minimumLevel() > tile2.size / tile2.minimumLevel()
The above places focus on sides rather than tiles for the choice what to do first. This means it's better to store sides in the priority queue rather than tiles. This may be especially a good idea because calculating a side can affect 2 tiles: most sides are the side of 2 tiles at the same time.
A more concrete idea for the algorithm:
Problem with (5): if a tile is split, two of its sides have to be split. How to find the "other" side in the priority queue and update it?
A possible side class:
template<bool horizontal>
class side {
    uint fixed_dimension;
    uint from;
    uint to;
    uint level;
    uint tile1;
    uint tile2; //may be 0 to indicate that the side belongs to only 1 tile
}
To identify tiles I can choose id-numbers for the tiles. I don't know how to do that yet.
If all sides of a tile have been calculated up to level 2, and the tile is split, there is no need to calculate the corners of the new tiles, because those corners are level 2 points of the sides of the old tile. At least that is true if tiles are always split in halves.
I don't like the idea that worker threads calculate points and then the work distribution function needs to check whether a tile must be split by looking at the iteration counts of calculated points again. While working, the worker thread already knows the iteration counts.
If a thread requests at most 256 points of work to do, and the next level of the most important side is 257 points, what to do? Not all of the work can be assigned at once. A solution is to keep a buffer of work that should be distributed as soon as possible. 256 out of 257 points can be assigned to the thread immediately. The last point can be placed in the buffer and assigned to the next thread requesting work.
This creates the need to keep track of which threads are working on which sides. Sides that are being worked on should not be in the priority queue. For each of those sides, the side should be kept somewhere else until all the points belonging to the next level of the side are calculated. Only then it can be determined what to do next (split the tiles, reinsert it in the priority queue etc.) so there needs to be a list of sides that are currently being worked on.
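A minimal Python sketch of such a buffer, assuming sides are identified by an id number. `SideWorkBuffer` and its method names are invented for this sketch. Work that does not fit in one request stays in the buffer for the next thread, and a counter per side tells when all points of the level are finished. The sketch is not thread-safe by itself; it assumes it is only used while holding the mutex of the distribution function.

```python
from collections import deque

class SideWorkBuffer:
    def __init__(self, max_batch=256):
        self.max_batch = max_batch
        self.pending = deque()   # (side_id, point) pairs waiting for a thread
        self.outstanding = {}    # side_id -> number of points not yet finished

    def add_level(self, side_id, points):
        """Queue all points of the next level of a side."""
        points = list(points)
        self.outstanding[side_id] = self.outstanding.get(side_id, 0) + len(points)
        self.pending.extend((side_id, p) for p in points)

    def get_work(self):
        """Return at most max_batch (side_id, point) pairs; the rest stays buffered."""
        batch = []
        while self.pending and len(batch) < self.max_batch:
            batch.append(self.pending.popleft())
        return batch

    def finish(self, batch):
        """Report a finished batch; return the ids of sides whose level is now complete."""
        completed = []
        for side_id, _ in batch:
            self.outstanding[side_id] -= 1
            if self.outstanding[side_id] == 0:
                del self.outstanding[side_id]
                completed.append(side_id)
        return completed
```

With a level of 257 points, the first request gets 256 points and the second request gets the last one. Only after both batches are reported as finished does `finish` return the side id, which is the moment to decide about splitting or reinserting the side.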
Maybe it's a good idea to choose id numbers for the sides as well, so they can be kept in an unordered map.
template<bool horizontal>
class side {
    uint fixed_dimension;
    uint from;
    uint to;
    uint level;
    uint tile1;
    uint tile2; //may be 0 to indicate that the side belongs to only 1 tile
    int id; //to identify sides, to save information about which side points in the buffer belong to
}
An id number can be 0 for the first side created, and increasing numbers for all new sides.
There is a problem left. If a side is calculated up to a level and the tile needs to be split, that also affects the opposite side of the tile. That side also needs to be split. Therefore it must be possible to find that opposite side and remove it from the priority queue... or not. It could stay in the priority queue. I need to remove one side and insert two new sides, so another way to accomplish that is changing the existing side and inserting one new side. There's a new problem with that because changing a side could change its priority. If that happens, the priority queue is in an invalid state because the location of the side was based on a different priority than it has now. I don't know if the priority queue keeps working in such an invalid state and it could be different for each compiler because it depends on the implementation of priority queue.
If I ignore that problem for a moment, a way to change a side while it remains in the priority queue is by making a priority queue of only id-numbers of sides. Then I don't have to search for an element in the priority queue. I can keep the actual sides in an unordered map and change the side there. The id can stay in the priority queue.
priority queue of id numbers; unordered map for extra information
faster way to achieve the same:
priority queue of indices; vector for extra information
Basically this: https://stackoverflow.com/a/3076722/10336025
This answer adds a nice idea to what I was thinking: do something to mark an element in the vector as invalid if it's not supposed to be used anymore (in my case: at the index is a side that I don't want to calculate anymore). When the priority queue returns the index, check if the element is marked as invalid. If so, mark the index as free again and ask for a new index from the priority queue. This solves the problem of not being able to remove elements from the priority queue, by not having to remove them.
This works:
#include <queue>
#include <vector>
#include <iostream>
using namespace std;

class side {
public:
    int level;
    int length;
};

class SideComparator {
public:
    vector<side>& sides;
    SideComparator(vector<side>& sides) : sides(sides) {}
    bool operator() (int s1_idx, int s2_idx) {
        side s1 = sides[s1_idx];
        side s2 = sides[s2_idx];
        return s1.length / s1.level < s2.length / s2.level;
    }
};

int main() {
    vector<side> sides;
    sides.push_back({2, 100});
    sides.push_back({3, 55});
    sides.push_back({2, 120});
    sides.push_back({3, 45});
    priority_queue<int, vector<int>, SideComparator> side_queue{ SideComparator(sides) };
    for (int i=0; i<sides.size(); i++) {
        side_queue.push(i);
    }
    while ( ! side_queue.empty()) {
        auto top = side_queue.top();
        side_queue.pop();
        cout << top << " fraction " << sides[top].length / sides[top].level << endl;
    }
    return 0;
}
output:
2 fraction 60
0 fraction 50
1 fraction 18
3 fraction 15
2 is larger than 0 with the ordering by SideComparator. 2 is larger than 0 in this situation means that the side at index 2 in a vector counts as larger than the side at index 0, and what counts as larger is determined by the calculation s1.length / s1.level < s2.length / s2.level.
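The idea of marking elements as invalid instead of removing them can be sketched in Python with `heapq`. This is a sketch under assumptions, not a translation of the C++ above: `SideQueue` and its methods are invented names, indices are not reused, and the priority is copied into the heap entry when the side is pushed. Because of that copy, changing a side later cannot put the heap in an invalid state. A side whose priority changes is marked invalid and pushed again as a new entry.

```python
import heapq

class SideQueue:
    def __init__(self):
        self.sides = []   # index -> dict with "length", "level", "valid"
        self.heap = []    # entries: (-priority, index); heapq is a min-heap

    @staticmethod
    def priority(side):
        return side["length"] / side["level"]

    def push(self, length, level):
        idx = len(self.sides)
        self.sides.append({"length": length, "level": level, "valid": True})
        heapq.heappush(self.heap, (-self.priority(self.sides[idx]), idx))
        return idx

    def invalidate(self, idx):
        # No search in the heap: the stale entry is skipped when it surfaces.
        self.sides[idx]["valid"] = False

    def pop(self):
        """Return the index of the valid side with the highest priority, or None."""
        while self.heap:
            _, idx = heapq.heappop(self.heap)
            if self.sides[idx]["valid"]:
                return idx
        return None
```

With the four sides from the C++ example and index 0 marked invalid, `pop` returns 2, then 1, then 3, and index 0 is silently skipped.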
Translation of the python script to c++:
#include <vector>
#include <cmath>
#include <algorithm>
#include <iostream>
using namespace std;

using uint = unsigned int; //uint is not a standard type name

//number of points in levels 1 up to and including lvl
int level_size_upto(int lvl) {
    return pow(2, lvl-1) + 1;
}

//number of points in level lvl (for lvl >= 2)
int level_size(int lvl) {
    return pow(2, lvl-2);
}

void recursion(vector<uint>& result, uint from, uint to, int lvl)
{
    uint mid = (from + to)/2;
    result.push_back(mid);
    if (lvl > 2) {
        if (mid > from + 1)
            recursion(result, from, mid, lvl-1);
        if (to > mid + 1)
            recursion(result, mid, to, lvl-1);
    }
}

vector<uint> level_upto(uint from, uint to, int lvl)
{
    if (lvl == 1)
        return {from, to};
    vector<uint> result{ from, to };
    result.reserve( level_size_upto(lvl) );
    recursion(result, from, to, lvl);
    return result;
}

vector<uint> level(uint from, uint to, int lvl)
{
    if (lvl == 1)
        return {from, to};
    vector<uint> result;
    result.reserve( level_size(lvl) );
    vector<uint> upto = level_upto(from, to, lvl-1);
    sort(upto.begin(), upto.end());
    for (int i=1; i<upto.size(); i++) {
        uint e1 = upto[i-1];
        uint e2 = upto[i];
        if (e2 > e1 + 1) {
            result.emplace_back( (e1+e2)/2 );
        }
    }
    return result;
}

int main()
{
    uint from = 1;
    uint to = 61;

    //expected:
    //[1, 2, 4, 6, 8, 10, 12, 14, 16, 17, 19, 21
    //, 23, 25, 27, 29, 31, 32, 34, 36, 38, 40, 42
    //, 44, 46, 47, 49, 51, 53, 55, 57, 59, 61]
    vector<uint> result = level_upto(from, to, 6);
    sort(result.begin(), result.end());
    cout << "[";
    for (int i=0; i<result.size(); i++) {
        cout << result[i] << ",";
    }
    cout << "]" << endl;

    //expected:
    //[2, 6, 10, 14, 17, 21, 25, 29, 32, 36, 40, 44, 47, 51, 55, 59]
    vector<uint> result2 = level(from, to, 6);
    //sort(result2.begin(), result2.end()); //sort is unnecessary
    cout << "[";
    for (int i=0; i<result2.size(); i++) {
        cout << result2[i] << ",";
    }
    cout << "]";
}
output:
[1,2,4,6,8,10,12,14,16,17,19,21,23,25,27,29,31,32,34,36,38,40,42,44,46,47,49,51,53,55,57,59,61,]
[2,6,10,14,17,21,25,29,32,36,40,44,47,51,55,59,]
The from- and to-points of sides have overlap, at the corners of tiles. Solution: do not consider the from- and to-points of horizontal sides. Sides like this don't have overlap:
This means that the top-left corner of a tile, by definition, always belongs to the left vertical side, and not the top horizontal side.
It also means that the calculation of horizontal sides should start at level 2.
If level 1 of a side has the same iteration count of 1, and level 2 has a different iteration count of 2, it's pointless to split the tile in halves. The 2 resulting tiles would have corners with a different iteration count, so inevitably the tiles would have to be split again. With the known information, this is a better way of splitting:
Remark: The left endpoints of the new sides are not calculated yet. The right endpoints may have been calculated already - that depends on the progress of the right side of the tile. The level of the new horizontal side may therefore not be exactly 0 or 1. 0 means no points at all, and 1 means both endpoints. The same problem exists for the new vertical sides. Some endpoints are calculated, and some are not. This can be handled as a special case.
There is a worse problem with higher levels. This situation can occur:
The algorithm I have in mind does this: the top pixel of the left side of the tile has iteration count 1. The next calculated pixel has iteration count 2, which is different, so the tile must be split between those points. The next calculated pixel has iteration count 1, which is different from 2, so it needs to be split there too. From there, only iteration count 1 occurs, so no more splitting is needed.
The red side that this process creates has 4 points that have already been calculated. This does not correspond to a level. The problem happens with every level, so it's difficult and possibly too slow to handle all of those as special cases. Worse: it's not even necessarily a partially calculated level, as this example shows:
>>> row_to_string(low, high, level_recursive_upto(low, high, 5))
'X__X___X___X___X___X___X___X___X__X___X___X___X___X___X___X___X'
>>> row_to_string(low, high, level_recursive_upto(low, high, 5))[7:-24]
'X___X___X___X___X___X___X__X___X'
>>> substr = row_to_string(low, high, level_recursive_upto(low, high, 5))[7:-24]
>>> len(substr)
32
>>> row_to_string(1, 32, level_recursive_upto(1, 32, 4))
'X__X___X___X___X___X___X___X___X'
The problem is these strings are different:
'X___X___X___X___X___X___X__X___X'
'X__X___X___X___X___X___X___X___X'
This shows an example side of length 63 where the marked points belong to a level up to 5. I take a part of the side of length 32, and then calculate the levels of a side of length 32 up to 4. The same number of points are marked with an X, but the X-es are at different positions, so taking any part of a side doesn't guarantee that the calculated pixels always belong to a level.
A solution for the problem is to split the side more times. It can always be split in halves, and those halves can be split in halves, until (if necessary) only sides without calculated points between their endpoints remain. That avoids the problem, at the cost of increasing the number of sides more quickly.
I don't like that. I thought about what I really want to achieve. I want sides to be calculated in a somewhat random order. The idea of "corners first" is not a requirement. I just want to calculate some pixels on a side to get a first idea of whether the tile should be split.
The problem then is how to do that in such a way that it's possible to remember the progress of a side after splitting. After splitting a side, I want to know for each of the 2 new sides which points have been calculated and which have not. That's why I thought of the level idea. By always choosing midpoints, by definition, if the side is split in halves, the two halves have 1 level lower than the large side.
A level illustration:
// lvl  side             lvl size        size of all lvls up to lvl
// 1    *-------------*  2               2
// 2    -------*-------  1               3
// 3    ---*-------*---  2               5
// 4    -*---*---*---*-  4               9
// 5    ...              8 = 2^(lvl-2)   17 = 2^(lvl-1) + 1
The problem that I ran into with this approach is that a side can only be split at 1 specific point (the midpoint). In order to split a side at another point, it has to be split multiple times.
The ideal level-kind-of method to choose pixels of a side to calculate has the property that the side can be split anywhere.
A possible way to do this: first calculate points with a coordinate that's 0 mod 5, then points that are 1 mod 5, 2 mod 5, ... until all points are calculated. The mod 5-value of a coordinate doesn't depend on the side, just the coordinate, so a side can be split anywhere.
The number 5 is too low. In reality it should be more like 256, because when I render at a resolution of say 1200×800, I don't want 1/5 of the points on the sides of the initial tile to be calculated immediately. On the other hand, 256 is too large for small sides. Calculating points that are 1 mod 256, then 2 mod 256, then 3 mod 256, ... for a side that is itself smaller than 256 is the same as just calculating all the points of the side from small to large.
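A small Python sketch of why the modulus idea allows splitting anywhere. The function names are made up, and the modulus 5 is used only to keep the numbers small. The calculated points of a part of a side are exactly the calculated points of the whole side that fall inside that part.

```python
def points_with_remainder(start, end, remainder, modulus):
    """Coordinates c with start <= c <= end and c % modulus == remainder."""
    first = start + (remainder - start) % modulus
    return list(range(first, end + 1, modulus))


def calculated_points(start, end, remainders_done, modulus):
    """All points of the side [start, end] that belong to a finished remainder."""
    pts = []
    for r in remainders_done:
        pts.extend(points_with_remainder(start, end, r, modulus))
    return sorted(pts)


whole = calculated_points(0, 99, [0, 1], 5)
left = calculated_points(0, 37, [0, 1], 5)     # split between 37 and 38
right = calculated_points(38, 99, [0, 1], 5)
```

Here `left + right` is the same list as `whole`, and that holds for any split position, because membership depends only on the coordinate and not on the endpoints of the side.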
I could solve the problem with 256 being too large by choosing a random order for the moduli 1, 2, 3, .... and store those in a const array. An order similar to that of the levels should work, so I can use the level idea to order the moduli. Here's a Python function that does that:
def generatemoduli():
    res = []
    div = 1
    while div <= 256:
        last = 256 // div  # integer division, also in Python 3
        res.append(last)
        for j in range(div // 2 - 1):
            last += 256 // (div // 2)
            res.append(last)
        div *= 2
    return res
result:
>>> generatemoduli()
[256, 128, 64, 192, 32, 96, 160, 224, 16, 48, 80, 112, 144, 176, 208, 240, 8, 24, 40, 56, 72, 88, 104, 120, 136, 152, 168, 184, 200, 216, 232, 248, 4, 12, 20, 28, 36, 44, 52, 60, 68, 76, 84, 92, 100, 108, 116, 124, 132, 140, 148, 156, 164, 172, 180, 188, 196, 204, 212, 220, 228, 236, 244, 252, 2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50, 54, 58, 62, 66, 70, 74, 78, 82, 86, 90, 94, 98, 102, 106, 110, 114, 118, 122, 126, 130, 134, 138, 142, 146, 150, 154, 158, 162, 166, 170, 174, 178, 182, 186, 190, 194, 198, 202, 206, 210, 214, 218, 222, 226, 230, 234, 238, 242, 246, 250, 254, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129, 131, 133, 135, 137, 139, 141, 143, 145, 147, 149, 151, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173, 175, 177, 179, 181, 183, 185, 187, 189, 191, 193, 195, 197, 199, 201, 203, 205, 207, 209, 211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, 253, 255]
>>> sorted(generatemoduli()) == range(1, 257)
True
When sorted, it's the numbers 1, ..., 256 again.
Another possible way to do it is this. (This is not my own idea.) Use that (Z/257Z)* ≅ Z/256Z, because 257 is prime (a result from ring theory). For a suitable number, such as 27, the numbers 1 through 256 can be generated in a wild order by starting with 1 and multiplying by 27 repeatedly, while calculating modulo 257.
def generatemoduli2():
    n = 1
    res = []
    for i in range(256):
        n *= 27
        n = n % 257
        res.append(n)
    return res
result:
>>> generatemoduli2()
[27, 215, 151, 222, 83, 185, 112, 197, 179, 207, 192, 44, 160, 208, 219, 2, 54, 173, 45, 187, 166, 113, 224, 137, 101, 157, 127, 88, 63, 159, 181, 4, 108, 89, 90, 117, 75, 226, 191, 17, 202, 57, 254, 176, 126, 61, 105, 8, 216, 178, 180, 234, 150, 195, 125, 34, 147, 114, 251, 95, 252, 122, 210, 16, 175, 99, 103, 211, 43, 133, 250, 68, 37, 228, 245, 190, 247, 244, 163, 32, 93, 198, 206, 165, 86, 9, 243, 136, 74, 199, 233, 123, 237, 231, 69, 64, 186, 139, 155, 73, 172, 18, 229, 15, 148, 141, 209, 246, 217, 205, 138, 128, 115, 21, 53, 146, 87, 36, 201, 30, 39, 25, 161, 235, 177, 153, 19, 256, 230, 42, 106, 35, 174, 72, 145, 60, 78, 50, 65, 213, 97, 49, 38, 255, 203, 84, 212, 70, 91, 144, 33, 120, 156, 100, 130, 169, 194, 98, 76, 253, 149, 168, 167, 140, 182, 31, 66, 240, 55, 200, 3, 81, 131, 196, 152, 249, 41, 79, 77, 23, 107, 62, 132, 223, 110, 143, 6, 162, 5, 135, 47, 241, 82, 158, 154, 46, 214, 124, 7, 189, 220, 29, 12, 67, 10, 13, 94, 225, 164, 59, 51, 92, 171, 248, 14, 121, 183, 58, 24, 134, 20, 26, 188, 193, 71, 118, 102, 184, 85, 239, 28, 242, 109, 116, 48, 11, 40, 52, 119, 129, 142, 236, 204, 111, 170, 221, 56, 227, 218, 232, 96, 22, 80, 104, 238, 1]
>>> sorted(generatemoduli2()) == range(1, 257)
True
I now wonder what's the best idea. The second idea is very easy to implement. All that needs to be remembered is the last used modulus. The next one is always (the previous one) * 27 mod 257. Maybe I won't even have to store those values in an array. I also wonder what's the best number to use. 27 works but there are phi(256) = 128 possible numbers that have this effect.
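To look at the other candidates, here is a short Python check that lists all numbers with this property. It uses the fact that the order of any element divides 256, a power of 2, so a number g generates everything exactly when g^128 is not 1 modulo 257. `is_generator` and `cycle` are names made up for this check.

```python
P = 257  # prime, so the nonzero residues form a cyclic group of order 256

def is_generator(g):
    # The order of g divides 256 = 2**8, so g generates the whole group
    # exactly when g**128 is not 1 modulo 257.
    return pow(g, 128, P) != 1


def cycle(g):
    """The sequence g, g^2, g^3, ... modulo 257, 256 terms long."""
    n, res = 1, []
    for _ in range(P - 1):
        n = (n * g) % P
        res.append(n)
    return res


generators = [g for g in range(1, P) if is_generator(g)]
```

`len(generators)` is 128, matching phi(256), and 27 is in the list. Which of the 128 gives the nicest spread of points is a separate question that this check does not answer.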
This function generates remainders (moduli):
constexpr uint generator = 27;
inline constexpr uint nextRemainder(uint remainder) {
    return (remainder * generator) % 257;
}
Sides now remember their next remainder instead of their level:
struct side
{
...
int next_remainder;
int next_tile_split_remainder;
Choosing the next points of a side to calculate works as follows: the next remainder is calculated, and all points having that remainder will be chosen. A remainder is therefore like a level (as in, a set of points in the side). The number of points per remainder may, however, be small or even 0. A small side (smaller than 256 points) will have 0 points that have a certain remainder, for some remainders.
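As a small illustration of why short sides have empty remainders, here is a sketch in Python. It assumes a half-open side [side_from, side_to) and that a point belongs to remainder r when its index equals r modulo 256 (so remainder 256 maps to offset 0). Both conventions are assumptions for this example, not necessarily what the program does.

```python
def next_remainder(r, generator=27):
    return (r * generator) % 257


def points_with_remainder(side_from, side_to, r):
    """Points p with side_from <= p < side_to and p % 256 == r % 256."""
    offset = r % 256  # assumption: remainder 256 is treated as offset 0
    first = side_from + (offset - side_from) % 256
    return list(range(first, side_to, 256))


def count_empty_remainders(side_from, side_to):
    """Number of remainders (out of 256) that select no point of the side."""
    r, empty = 27, 0
    for _ in range(256):
        if not points_with_remainder(side_from, side_to, r):
            empty += 1
        r = next_remainder(r)
    return empty


print(count_empty_remainders(1000, 1100))  # 156: a side of 100 points uses only 100 remainders
print(count_empty_remainders(0, 600))      # 0: every remainder selects 2 or 3 points
```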
When a side is (partially) calculated and not all points have the same iteration count, the tile(s) adjacent to the side must be split, like this:
The tile needs to be split between every pair of consecutive calculated points that have a different iteration count. That means I need to know which points have been calculated, to check for that.
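A minimal sketch of that check, assuming the calculated points are available as (position, iteration count) pairs. The function name `wanted_splits` is hypothetical, and so is the choice to report the later point of each differing pair as the split location.

```python
def wanted_splits(calculated):
    """calculated: (position, iteration_count) pairs in any order.

    Returns, for every pair of consecutive calculated points (ordered by
    position) with different iteration counts, the position of the later point.
    """
    ordered = sorted(calculated)
    return [
        later_pos
        for (_, earlier_count), (later_pos, later_count) in zip(ordered, ordered[1:])
        if earlier_count != later_count
    ]


print(wanted_splits([(10, 5), (2, 5), (6, 7), (14, 7)]))  # [6, 10, 14]
```

The sorting step is exactly the cost that comes up below: the points are generated in remainder order, not in position order.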
The next remainder can be used to regenerate all the calculated points. The next remainder is enough information to find which points of the side have been calculated. It's just a matter of generating the points again, starting with the first remainder (There needs to be a first remainder; I choose 27 here, the same as the generator.)
For the purpose of splitting a tile, I need the calculated points sorted. That can be accomplished in either of two ways:
An idea of how the sorting of remainders can work:
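constexpr array<uint, 256> remainders = []()
{
    array<uint, 256> result{};
    uint last_remainder = generator; //the first remainder is the generator itself
    for (int i=0; i<256; i++) {
        result[i] = last_remainder;
        last_remainder = nextRemainder(last_remainder);
    }
    return result;
}();

//sorts the remainders at indices 0 up to (not including) upto_idx
//the array is modified, so the caller must pass a mutable copy of the constexpr array
void sortRemaindersUptoIndex(array<uint, 256>& remainders, int upto_idx) {
    sort(remainders.begin(), remainders.begin() + upto_idx);
}

//sorts all remainders that come before upto_remainder in the generation order
void sortRemaindersUptoRemainder(array<uint, 256>& remainders, uint upto_remainder) {
    int i = 0;
    while (true) {
        assert(i < 256); //check that this loop doesn't continue forever or read past the end
        if (remainders[i] == upto_remainder) {
            sortRemaindersUptoIndex(remainders, i);
            break;
        }
        i++;
    }
}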
I created that code but I don't think I will use it. I guess the first method is faster. Which is faster depends on:
I wonder if there's a way to avoid the sorting, but first I want to have something that works to find out if this is already fast enough.
Generating points up to a remainder:
// not including to_remainder
vector<uint> calculated_upto(uint from, uint to, int to_remainder)
{
    //remainders generated by nextRemainder are in the range 1 to 256
    assert(to_remainder >= 1);
    assert(to_remainder <= 256);
    vector<uint> result;
    int remainder = first_remainder;
    while (remainder != to_remainder)
    {
        //find the first point at or after from with this remainder (remainder 256 corresponds to offset 0)
        uint to_add = (remainder % 256) + (from/256)*256;
        if (to_add < from) to_add += 256;
        //add all points from there on with this remainder that also belong to the side (less than to)
        while (to_add < to) {
            result.push_back(to_add);
            to_add += 256;
        }
        remainder = nextRemainder(remainder);
    }
    return result;
}
I'm going to do this differently:
The from- and to-points of sides have overlap, at the corners of tiles. Solution: do not consider the from- and to-points of horizontal sides. Sides like this don't have overlap:
This means that the top left corner of a tile, by definition, always belongs to the left vertical side, and not the top horizontal side.
It also means that the calculation of horizontal sides should start at level 2.
The idea still has overlapping points for vertical tiles, unless the vertical sides include the corner point of their tile only some of the time, but not all the time.
If I have to consider the case where the corner point of a tile may belong to a side of another tile, I might as well relax the requirements on sides overall.
Idea: all points should belong to exactly one side. The corner points of tiles belong to an unspecified side. This means that it's (maybe) possible to get something like this:
This is a tile with sides such that none of its corner points are in those sides. That's not a problem. The corner points will belong to some other tiles, so they will be calculated. It also doesn't really affect guessing. If a tile has the same iteration count at its sides, except for the corner points, then it can still be safely guessed.
To split a tile, several things need to happen:
That can be done like this:
First, the orange side is added, then the yellow one. The existing sides of the tile are split, and a choice has to be made: the split location either belongs to the left or right part (for horizontal sides) or the top or bottom part (for vertical sides). Here I chose right and bottom.
This kind of splitting means that the following is impossible, even though it would be allowed by the side datastructure:
This is a tile where one point of the top side of the tile belongs to a vertical side. That's impossible because the way of splitting never changes the direction of the side of a point. Points either don't belong to a side yet, or they do belong to a side, and those existing sides can only be split, retaining the direction. Every side of a tile starts out as 1 piece, so all points in the side of a tile have the same direction.
It's important that this is impossible, because it means that sides are like chains. Multiple horizontal or vertical sides can be in a chain. When a side does not end at the border of the fractal, there is a next side to the right (if horizontal) or below (if vertical).
By keeping the index of the next side in the data of a side, the chain can be followed in a loop, going from one side to the next repeatedly.
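A minimal sketch of following such a chain, in Python for brevity. The `Side` fields and the function `find_side` are hypothetical stand-ins for the real struct, with -1 marking "no next side".

```python
from dataclasses import dataclass


@dataclass
class Side:
    start: int     # first point of the side (inclusive)
    end: int       # last point of the side (inclusive)
    next_idx: int  # index of the next side in the chain, -1 at the end


def find_side(sides, first_idx, position):
    """Follow the chain from sides[first_idx]; return the index of the side
    that contains position, or -1 if the chain ends before reaching it."""
    idx = first_idx
    while idx != -1:
        s = sides[idx]
        if s.start <= position <= s.end:
            return idx
        idx = s.next_idx
    return -1


# A chain of three sides, stored out of order in the list: 0 -> 2 -> 1
sides = [Side(0, 9, 2), Side(30, 49, -1), Side(10, 29, 1)]
print(find_side(sides, 0, 35))  # 1
```

The storage order of the sides does not matter, which is the point of keeping indices: sides created by later splits can simply be appended.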
The linked list structure of sides means that a tile is uniquely defined by only 4 integers: the indices of the sides at its corners:
struct tile
{
// stored indices of sides, from corners:
//
// --->
// | |-----------| |
// v | | v
// | |
// | |
// |-----------|
// --->
//
//
int side_topleft_toright;
int side_topleft_down;
int side_topright_down;
int side_bottomleft_toright;
};
Everything else can be derived. For example, the width and height can be derived from the from- and to points of the sides:
inline uint tileWidth(tile t) {
return
sides[t.side_topright_down].vertical_xpos()
- sides[t.side_topleft_down].vertical_xpos()
+ 1;
}
inline uint tileHeight(tile t) {
return
sides[t.side_bottomleft_toright].horizontal_ypos()
- sides[t.side_topleft_toright].horizontal_ypos()
+ 1;
}
Also, even though only the first sides going out of some corner point are saved, the next sides are stored and can be found by following the chain.
Splitting a tile comes down to:
Step 4 has to be done last because it's impossible to set the tiles of new sides when they are created: the tiles have not been created yet. This is a chicken and egg problem. To solve it, the tiles would have to be created first, but then the sides of those tiles can't be set yet because the sides don't exist yet, which is the same problem again. The only way to solve it is by creating either the tiles or sides in multiple phases.
To find the right existing side to split, the chain must be followed. Let's say we have a location where we want to split the bottom tile in this situation:
The split location is in the third side in the chain, from the topleft corner to the right.
There are also some exceptional situations:
I created a new "fractal type" called split test, which renders a test visualization of the tile splitting. It doesn't calculate any fractal formula. It just shows all the sides.
void renderTileSplitTest()
{
WorkDistribution test(canvas, 0);
test.splitTile(0, direction::vertical, 7);
//set default values
for (uint x=0; x<canvas.P().width_canvas(); x++) {
for (uint y=0; y<canvas.P().height_canvas(); y++) {
canvas.setPixel(x, y, 0, false, false);
}
}
//show sides in a different color to see which sides there are
for (side s : test.sides) {
if (s.status != side_status::neutral) {
continue;
}
for (uint pos = s.from; pos <= s.to; pos++)
{
point p;
if (s.direction == direction::horizontal) {
p.x = pos;
p.y = s.horizontal_ypos();
}
else {
p.x = s.vertical_xpos();
p.y = pos;
}
//if this assert fails, the point was already set, which means it is part of multiple sides. That should not happen.
assert(canvas.getIterationcount(p.x, p.y) == 0);
//color based on the index of the s, gives each side a different (hopefully) color
canvas.setPixel(p.x, p.y, s.idx + 1, false, false);
}
}
//shows the result on the screen
canvas.renderBitmapFull(false, true, canvas.lastBitmapRenderID);
}
Another test:
I first divide the whole tile in 2 parts (left and right), and then split those 2 parts in different ways:
the right part: vertical first, then horizontal
the left part: horizontal first, then vertical
The circles in the screenshot show that the order of splitting affects to which side the corner points of tiles belong. This is the behavior that I designed.
void renderTileSplitTest()
{
WorkDistribution test(canvas, 0);
auto midpoint = [&test](direction direction_, tile t) {
if (direction_ == direction::horizontal) {
return (test.tile_from_x(t) + test.tile_to_x(t)) / 2;
}
else {
return (test.tile_from_y(t) + test.tile_to_y(t)) / 2;
}
};
auto split_midpoint = [&test, midpoint](direction direction_, int tile_idx)
{
tile t = test.tiles[tile_idx];
if (direction_ == direction::horizontal) {
uint midpoint_y = midpoint(direction::vertical, t);
return test.splitTile(tile_idx, direction_, midpoint_y);
}
else {
uint midpoint_x = midpoint(direction::horizontal, t);
return test.splitTile(tile_idx, direction_, midpoint_x);
}
};
int main_tile_idx = 0;
auto indices = split_midpoint(direction::vertical, main_tile_idx);
{
auto indices2 = split_midpoint(direction::horizontal, indices.idx_1);
split_midpoint(direction::vertical, indices2.idx_1);
split_midpoint(direction::vertical, indices2.idx_2);
}
{
auto indices2 = split_midpoint(direction::vertical, indices.idx_2);
split_midpoint(direction::horizontal, indices2.idx_1);
split_midpoint(direction::horizontal, indices2.idx_2);
}
//set default values (all marked as minibrot so that only sides are shown in a color other than black)
for (uint x=0; x<canvas.P().width_canvas(); x++) {
for (uint y=0; y<canvas.P().height_canvas(); y++) {
canvas.setPixel(x, y, 0, false, true);
}
}
//show sides in a different color to see which sides there are
for (side s : test.sides) {
if (s.status != side_status::neutral) {
continue;
}
for (uint pos = s.from; pos <= s.to; pos++)
{
point p;
if (s.direction == direction::horizontal) {
p.x = pos;
p.y = s.horizontal_ypos();
}
else {
p.x = s.vertical_xpos();
p.y = pos;
}
//if this assert fails, the point was already set, which means it is part of multiple sides. That should not happen.
assert(canvas.getIterationcount(p.x, p.y) == 0);
//color based on the index of the s, gives each side a different (hopefully) color
canvas.setPixel(p.x, p.y, s.idx + 1, false, false);
}
}
//shows the result on the screen
canvas.renderBitmapFull(false, true, canvas.lastBitmapRenderID);
}
How to determine the actual split locations of a tile? After a side is partially or fully calculated, differences in iteration count mean that the tile should be split, but I don't want tiles with a width or height smaller than 3. An idea of how to change a collection of wanted split locations into actual, practical, split locations:
def splitrange(n, at):
    r = range(1, n+1)
    first = r[0]
    last = r[-1]
    splitpoints = []
    actual = None
    for wanted in at:
        if wanted > n - 2:
            continue
        if splitpoints != []:
            if wanted <= splitpoints[-1]:
                continue
        diff = None
        if splitpoints != []:
            diff = wanted - splitpoints[-1]
        else:
            diff = wanted - first
        if diff < 2: #difference too low; 2 is configurable
            actual = wanted + (2 - diff)
        else:
            actual = wanted
        if actual > n - 2:
            continue
        else:
            splitpoints.append(actual)
    return splitpoints
>>> splitrange(15, [1, 3,3,3,3,4,5,9,12])
[3, 5, 9, 12]
The 1 is changed to 3, because 1 is too close to the border of the tile (which in this function starts at 1 and ends at n. It could be a horizontal or vertical dimension.)
The repeated 3s are ignored because there was already a split location of 3.
The 4 is too close to 3, so it is moved to 5. The 5 that follows is then ignored because there is already a split location of 5.
9 and 12 are added to the list normally.
After splitting a side, the new sides may still have different iteration counts within them. For example, a side of length 3 won't be split at all:
>>> splitrange(3, [1,2,3])
[]
That makes it difficult to decide what to do when a thread has finished calculating some points of a side. If the points don't have the same iteration count, it's clear what has to be done - the side has to be split, but if the points do have the same iteration count, what to do? Maybe the last calculated batch of points have the same iteration count, but it could still be different from the other iteration counts in the side.
Idea: I store with each side the iterationcount of the first point, to represent the whole side. Struct side will have this member:
nullable<uint> iterationCount_from;
where nullable is defined as
template <typename T>
struct nullable {
T v;
bool isnull = true;
void set(T v) {
this->v = v;
isnull = false;
}
void null() { isnull = true; }
};
The iteration count is unknown if the first point in the side has not been calculated. I set it to null in that case. (Maybe it's better not to use such a nullable struct. I use the value -1 to indicate a nonexistent index in other places, but iteration counts are uints and uint has no value of -1.)
To use this iteration count it must be set as soon as possible. That means the first point of a side must always be chosen first to be calculated, in the work distribution. After a thread working on a side has finished the work, iterationCount_from will always be known, because either it was already known, or it has just been calculated (in which case iterationCount_from should be assigned the just calculated value).
With that, the earlier question ("if the points do have the same iteration count, what to do?") has an answer: compare it to iterationCount_from. If different, the side must be split.
This works under the assumption that sides are split immediately when different iteration counts are detected in a batch of points. Then either of the following is true:
Unfortunately, there's another case, because sometimes a side can not be split, because it would create a tile that's too small. (I wrote about that here: https://github.com/DinkydauSet/ExploreFractals/issues/34#issuecomment-1073428040 ) When that happens, the following can occur:
This case is a problem. It's impossible to rely on iterationCount_from. An obvious alternative is to check every calculated pixel but that's not efficient.
Due to the way the (Python) function splitrange works, I think it's impossible that it results in sides for which the following holds:
If that is indeed impossible, the problematic case is not a problem. The solution for the case would be to calculate the whole tile without further splitting if the sides of the tile are that small.
The plan is something like this:
for (each side that was partially calculated by this thread)
{
    if (side was split in the meantime) {
        set the last calculated level/remainder for each of the parts
        insert the parts into the priority queue
        continue
    }
    if (other threads are using this side) {
        ????
    }
    if (iterationCount_from is null)
        set its value
    if (points calculated thus far have the same iteration count) {
        if (side is finished) {
            update number of unfinished sides for the tile
            if (tile now has 0 unfinished sides) {
                ????
            }
        }
        else {
            set the number of threads using the side to 0
            reinsert the side in the priority queue
        }
    }
    else {
        split the tiles of the side between consecutive calculated points with a different iteration count
    }
}
Questions:
I'm looking for a way to simplify everything because it's too complicated.
Ways to simplify:
Instead of choosing some random points in a side to calculate, to spread progress, just calculate the points in order, until a point is different from the previous one, then split the tile. This is at least better than the current situation, where the calculation of a side is always finished. It's better to stop as early as possible.
Don't create a batch of work for worker threads. Instead, sides are reserved per thread. Worker threads can ask for new points individually, so that per-pixel control is in the work distribution function. This still avoids having to use a mutex per pixel: because sides are reserved, the next point to calculate in a side can be calculated independently of what other threads are doing.
Something like this:
class WorkDistribution {
    class worker_state {
        vector<side> reserved_sides;
    };
    vector<worker_state> worker_states;
    point getWork(int worker_id)
    {
        worker_state& state = worker_states[worker_id];
        if (no work left to do from the reserved sides) {
            use the mutex
            process results of the reserved sides, and unreserve them
            reserve new sides
        }
        choose and return a new point
    }
    ...
}
Continuing the idea of reserving sides
To find the split locations after a thread has done some work, I generate a list of all the calculated points of a side. This list can be generated easily, but it also needs to be sorted, and the generation happens many times, so that's slow. If the worker thread keeps track of the split locations, it's not necessary to generate a list of calculated points. I'm not sure if that's feasible.
Also continuing the idea of reserving sides, and finding split locations of a tile
Even if a worker thread keeps track of split locations, there's still another reason I need a list of calculated points. I need it to determine when a tile should be guessed. A tile can be guessed if all its border points have the same iteration count, so the problem is how to detect that. Sides are calculated independently of each other. If all sides have been fully calculated and all of the points within a side have the same iteration count, that doesn't yet mean that all the sides have the same iteration count. This situation can occur:
The red sides have 1 iteration per point, and the green sides have 2 iterations per point, so within the sides all is ok, but the tile can not be guessed.
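The situation can be made concrete with a small sketch. It assumes each side is given as a list of the iteration counts of its points, and the function names are made up for this example. A tile is only guessable when every side is uniform and all sides agree with each other.

```python
def side_is_uniform(counts):
    """True if all points of one side have the same iteration count."""
    return len(set(counts)) == 1


def can_guess_tile(sides):
    """sides: one list of iteration counts per side of the tile."""
    if not all(side_is_uniform(counts) for counts in sides):
        return False
    # Every side is uniform, so its first point represents the whole side.
    representatives = {counts[0] for counts in sides}
    return len(representatives) == 1


red, green = [1, 1, 1], [2, 2, 2]
print(all(side_is_uniform(s) for s in [red, green, red, green]))  # True
print(can_guess_tile([red, green, red, green]))                   # False
print(can_guess_tile([red, red, red, red]))                       # True
```

Comparing one representative per side is cheap, but it only works once each side is known to be uniform, which is the part that needs the calculated points.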
I want to choose a representative point for each tile and side. The first calculated point of a tile or side must become the representative point, so that subsequent points can be compared against it. If their iteration counts are different, the tile must be split.
This leads to 2 problems:
Remark: it's impossible to choose a specific location in the tile to be used as a representative point, such as the top left corner. To be a representative point, it has to be the first calculated point (every subsequently calculated point must be compared against it). That means that the top left corner must be the first calculated point of each tile. But the top left corner of one tile may be the top right corner of another tile, violating the requirement for the other tile, so this idea can't work.
After splitting a tile, the representative point will be in one of the two new tiles. For the tile that doesn't contain the representative (it can be in both tiles actually if it's a corner point), a new representative must be chosen. This must be a calculated point, so I need to find in an efficient way:
For that it helps if there's an efficient way to find the calculated points of a side.
Even then, it's not very efficient to do this. A tile may have sides that consist of many small parts. The search for a calculated point would have to check each side in the tile for calculated points.
This is a more efficient way to choose points:
First choose all points that are a multiple of 128, then those that are a multiple of 64 but not of 128, then 32 but not of 64, 16 but not of 32, ...
That sequence has the nice property that it's both:
Finding the multiples of 128 in a side is easy: first find the first multiple of 128, then add 128 until the end of the side is reached. Then, finding the multiples of 64 that are not multiples of 128 is also easy: find the first multiple of 64, then add 128 until the end of the side is reached. Adding 128 instead of 64 skips the multiples of 128. After that the process continues in the same way: find the first multiple of 32, and keep adding 64 until the end is reached.
For example:
This side is only 16 points long, so starting with 128 yields 0 points at first.
The pattern is very similar to the levels idea I had before, except that this idea uses the indices of points. Because of that this method has the desired property that a side can be split anywhere, without messing up the choice of points.
The following python code does what I have in mind and handles all the corner cases. It doesn't start with a fixed multiple like 128. It calculates the maximum possible power of 2 of indices of points in the side, and starts with that.
Output for the case in the image:
[16, 8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15]
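A minimal sketch of this ordering (not the original listing), written as a generator. It assumes inclusive side bounds and that index 0, when present, comes first because it is divisible by every power of 2. The name `power_of_two_order` is made up for this example.

```python
def power_of_two_order(side_from, side_to):
    """Yield every index in [side_from, side_to], ordered by the largest
    power of 2 that divides the index, from large to small."""
    if side_from > side_to:
        return
    if side_from == 0:
        yield 0
    # Largest power of 2 that can divide an index in the side.
    multiple = 1
    while multiple * 2 <= side_to:
        multiple *= 2
    while multiple >= 1:
        # First odd multiple of `multiple` that is >= side_from.
        k = max(1, -(-side_from // multiple))  # ceiling division
        if k % 2 == 0:
            k += 1
        point = k * multiple
        while point <= side_to:
            yield point
            point += 2 * multiple  # skip the even multiples, done in an earlier round
        multiple //= 2


print(list(power_of_two_order(1, 16)))
# [16, 8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15]
```

Because the order depends only on the absolute index of a point, splitting a side into [a, m] and [m+1, b] leaves the relative order of the points inside each part unchanged.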
If x is the last calculated point, and the current multiple is 32, then all multiples of 64, and those of 32 up to x, have been calculated. The points can be regenerated in order like this: start generating multiples of 32, and when x is reached, continue generating multiples of 64 until the end of the side is reached. A function to generate those points:
def ordered_upto(state):
    if state["exhausted"]:
        return list(range(state["from"], state["to"] + 1))
    if state["nextpoint"] == 0:
        return []
    elif state["from"] == 0:
        points = [0]
    else:
        points = []
    multiple = state["multiple"]
    start = multiple
    if start < state["from"]:
        distance_to_target_range = (state["from"] - multiple)
        stepsize = multiple
        #integer division (//), rounding up to a whole number of steps
        start += ((distance_to_target_range + stepsize - 1) // stepsize) * stepsize
    point = start
    while point != state["nextpoint"]:
        points.append(point)
        point += multiple
    point += multiple #one multiple was already added before
    multiple = multiple * 2
    while point <= state["to"]:
        points.append(point)
        point += multiple
    return points
This function constructs a list. In the program I would want a function that only generates the next point at each call, to avoid having to create vectors.
I have abandoned the idea of using Mariani-Silver. All work was for nothing.
The reason I tried to keep using Mariani-Silver is that tiles are simple. It turns out combining the algorithm with render progress is not simple at all, so I might as well go for a more difficult algorithm such as path tracing.
However, I have another idea that may work in a beautiful way. The idea is this:
We can calculate 1 point out of each block of 16×16 points, thereby calculating 1/256 of all points.
This already gives some information. You can view the result as a complete render, except the resolution is 256 times lower than desired. That means we can see where pixels could have been guessed.
The idea is to continue sampling more points in areas where the pixels could not have been guessed.
The first calculated points result in something like this:
From this image it's clear that there's a big red part that can be mostly guessed. What's needed is more detailed information about its border, so what I want to do is do another pass. Every block with a neighbor with a different iteration count must be divided into 4 blocks of 8×8 points. After that, those blocks may need to be divided into 4 blocks of 4×4 points, etc.
The render goes in phases. The first phase is a sampling phase. Some random points are sampled to get a first impression. Then, the render is refined until all borders are completely calculated.
This algorithm has two beautiful aspects:
The second statement is true because all conflicting blocks (blocks with different neighbors) are calculated in more and more detail. That's exactly where the iteration bands touch each other, so the borders of iteration bands are found automatically. The algorithm doesn't keep track of borders - it just happens. If a block remains not fully calculated, that means it's part of a larger part of the render with a border of equal points, which means that it can be guessed.
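One refinement pass can be sketched as follows. The sketch assumes the coarse samples are stored as a 2D list with one iteration count per block, and that only the four direct neighbors are compared (no diagonals). The function name `blocks_to_refine` is hypothetical.

```python
def blocks_to_refine(coarse):
    """coarse: rows of iteration counts, one sample per block.

    Returns the set of (row, col) of blocks that have at least one direct
    neighbor with a different iteration count. Those blocks must be
    subdivided; all other blocks remain candidates for guessing."""
    rows, cols = len(coarse), len(coarse[0])
    conflicting = set()
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and coarse[nr][nc] != coarse[r][c]:
                    conflicting.add((r, c))
                    break
    return conflicting


coarse = [
    [5, 5, 5, 5],
    [5, 5, 9, 5],
    [5, 5, 5, 5],
]
print(sorted(blocks_to_refine(coarse)))
# [(0, 2), (1, 1), (1, 2), (1, 3), (2, 2)]
```

In this example 5 of the 12 blocks need more detail. Repeating the same pass on the subdivided blocks concentrates the work around the border of the differing sample.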
When I look at the behavior of Fractal eXtreme I suspect it does something like that. It guesses more pixels than my program (visible in the status window), so it must do something smarter than Mariani-Silver. It also has phased render progress like the algorithm above.
Something else that Fractal eXtreme does is start calculating points in the center. It spirals outwards, but it's not a perfect spiral. I'm not sure what it does. It also appears to use horizontal lines sometimes.
My interpretation/idea is to use a spiral, like this:
The spiral must start in the center.
In general, the size of the FractalCanvas does not have to be a square, so the spiral should work for all rectangles.
Let's say we want to end up in the top left corner. The question is then: where to start spiraling and in which direction? There are several cases:
odd width square starts in the center, upwards
even width square starts in the topright point of the center block of 2×2 points, downwards:
For other square sizes it works the same.
If the rectangle is taller than wide, the starting position must be either lower or higher, depending on the initial direction, which depends on whether the width is odd or even. Other than the shifted starting position, it's the same as with squares:
If the rectangle is wider than tall, the initial direction must also change. If the height is odd, and if it was a square, the initial direction would be up, but in the case where the width is larger than the height, the initial position is shifted and the initial direction is left:
This class implements all the cases:
// The advance function of this class iterates through all points on the canvas in a spiral-shaped order, starting in the center, spiraling outwards.
class spiraler {
public:
enum direction {
left, right, up, down
};
//These values keep track of when the direction should turn to the right.
uint max_up;
uint max_down;
uint max_left;
uint max_right;
point pos;
direction d;
spiraler(uint xmax, uint ymax)
{
//odd height, not taller than wide
//(ymax % 2 == 0 means odd height because point indices start at 0)
if (ymax % 2 == 0 && xmax >= ymax)
{
//first assume square with width and height ymax
pos.x = ymax / 2;
pos.y = ymax / 2;
max_up = pos.y - 1;
max_down = pos.y + 1;
max_left = pos.x - 1;
max_right = pos.x + 1;
d = direction::up;
//if it's not a square after all, make a correction
if (xmax > ymax) {
uint diff = xmax - ymax;
pos.x += diff;
max_right += diff;
max_left += 1; //causes the direction to turn to up where it would start in case xmax == ymax
d = direction::left;
}
}
//even height, not taller than wide
else if (ymax % 2 == 1 && xmax >= ymax)
{
//first assume square with width and height ymax
pos.x = ymax / 2 + 1;
pos.y = ymax / 2;
max_up = pos.y - 1;
max_down = pos.y + 1;
max_left = pos.x - 1;
max_right = pos.x + 1;
d = direction::down;
//if it's not a square after all, make a correction
if (xmax > ymax) {
uint diff = xmax - ymax;
max_right += diff - 1;
d = direction::right;
}
}
//taller than wide, odd width
else if (ymax > xmax && xmax % 2 == 0)
{
//first treat it as a square with width and height xmax
pos.x = xmax / 2;
pos.y = xmax / 2;
max_up = pos.y - 1;
max_down = pos.y + 1;
max_left = pos.x - 1;
max_right = pos.x + 1;
d = direction::up;
//correction (direction remains the same)
uint diff = ymax - xmax;
pos.y += diff;
max_down += diff;
}
//taller than wide, even width
else if (ymax > xmax && xmax % 2 == 1)
{
//first treat it as a square with width and height xmax
pos.x = xmax / 2 + 1;
pos.y = xmax / 2;
max_up = pos.y - 1;
max_down = pos.y + 1;
max_left = pos.x - 1;
max_right = pos.x + 1;
d = direction::down;
//correction
uint diff = ymax - xmax;
max_down += diff;
}
else {
assert(false);
}
//move the starting location one backwards, so that the first call of advance results in pos being the starting location
switch (d) {
break; case direction::up: pos.y += 1;
break; case direction::down: pos.y -= 1;
break; case direction::left: pos.x += 1;
break; case direction::right: pos.x -= 1;
}
}
//The user is responsible not to call this when the end is reached.
void advance() {
assert(pos.x != 0 || pos.y != 0); //check that the end is not reached
switch (d)
{
break; case direction::up: {
pos.y--;
if (pos.y == max_up) {
d = direction::right;
max_up--;
}
}
break; case direction::right: {
pos.x++;
if (pos.x == max_right) {
d = direction::down;
max_right++;
}
}
break; case direction::down: {
pos.y++;
if (pos.y == max_down) {
d = direction::left;
max_down++;
}
}
break; case direction::left: {
pos.x--;
if (pos.x == max_left) {
d = direction::up;
max_left--;
}
}
}
}
};
The easiest way to do phased rendering is: finish a phase completely, THEN start the next phase. But I want to keep threads busy, so I prefer if idle threads already start with the next phase. The spiral order makes that possible. For a next phase, blocks of the previous phase need to be compared to their neighbors, so all the neighbors need to be done. Usually that will be the case. By the time a thread becomes idle, the first few blocks in the center will probably be done, so the next phase can already start there. It's not guaranteed, so there need to be checks, and threads need to wait if needed.
I think the waiting can be implemented with std::condition_variable: https://en.cppreference.com/w/cpp/thread/condition_variable
The other problem is to determine whether all neighbors of a block in the spiral are done. Solution: keep track of the lowest index in the spiral that's not yet calculated. Every time a thread is done with its work, it can report to some coordinating entity that it's done, so that the lowest not calculated index is updated. Then, all neighbors of a block are done if and only if the maximum of the indices of the neighbors is lower than the lowest not calculated index.
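A minimal sketch of such a coordinating entity, in Python for brevity, with `threading.Condition` standing in for std::condition_variable. The class name `SpiralProgress` and its methods are made up for this example, and the real program may need something more elaborate.

```python
import threading


class SpiralProgress:
    """Tracks the lowest spiral index that has not been calculated yet."""

    def __init__(self):
        self._cond = threading.Condition()
        self._finished = set()       # finished indices above the lowest unfinished one
        self._lowest_unfinished = 0

    def report_done(self, index):
        """Called by a worker when the block at `index` is finished."""
        with self._cond:
            self._finished.add(index)
            while self._lowest_unfinished in self._finished:
                self._finished.remove(self._lowest_unfinished)
                self._lowest_unfinished += 1
            self._cond.notify_all()

    def lowest_unfinished(self):
        with self._cond:
            return self._lowest_unfinished

    def wait_until_done(self, index, timeout=None):
        """Block until every index <= `index` is finished.
        Returns False if the timeout expires first."""
        with self._cond:
            return self._cond.wait_for(
                lambda: self._lowest_unfinished > index, timeout)


progress = SpiralProgress()
progress.report_done(1)
progress.report_done(2)
print(progress.lowest_unfinished())  # 0, because index 0 is still missing
progress.report_done(0)
print(progress.lowest_unfinished())  # 3
```

A thread that wants to refine a block would call `wait_until_done` with the largest spiral index among the neighbors of that block.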
This raises the question: how to find the largest index of a neighbor? It depends on the location of the block and the current direction. The farthest away neighbor is "1 rotation away", like this:
The red block is a block of which I want to know which neighbor is the farthest away in the spiral. After one extra rotation (the green part) the green line ends up in the neighbor above. Remark: I don't care about diagonal neighbors now. I think/hope it's not necessary to consider those for correctness.
This Python function finds the number of extra steps to complete one extra rotation in every possible case:
def farthest_spiral_point(x, y, direction, max_down, max_up, max_left, max_right):
    a = 0
    if direction == up:
        print("up", y - max_up)
        a += y - max_up
        print("right", max_right - max_left - 1)
        a += max_right - max_left - 1
        print("down", max_down - max_up)
        a += max_down - max_up
        print("left", max_right - max_left)
        a += max_right - max_left
        print("up again", max_down - y)
        a += max_down - y
        return a
    if direction == down:
        print("down", max_down - y)
        a += max_down - y
        print("left", max_right - max_left - 1)
        a += max_right - max_left - 1
        print("up", max_down - max_up)
        a += max_down - max_up
        print("right", max_right - max_left)
        a += max_right - max_left
        print("down again", y - max_up)
        a += y - max_up
        return a
    if direction == left:
        print("left", x - max_left)
        a += x - max_left
        print("up", max_down - max_up - 1)
        a += max_down - max_up - 1
        print("right", max_right - max_left)
        a += max_right - max_left
        print("down", max_down - max_up)
        a += max_down - max_up
        print("left again", max_right - x)
        a += max_right - x
        return a
    if direction == right:
        print("right", max_right - x)
        a += max_right - x
        print("down", max_down - max_up - 1)
        a += max_down - max_up - 1
        print("left", max_right - max_left)
        a += max_right - max_left
        print("up", max_down - max_up)
        a += max_down - max_up
        print("right again", x - max_left)
        a += x - max_left
        return a
What is apparent from this function is that the calculations for up and down add up to exactly the same total, and so do those for left and right. The arguments max_down, max_up, max_left, max_right are like in the spiraler C++ class.
Actually all cases in the farthest_spiral_point function are equivalent and independent of x and y. It's just this:
```cpp
int farthest_neighbor_steps()
{
	return 2*(max_right - max_left + max_down - max_up) - 1;
}
```
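To double-check that the x and y terms really cancel, here is a small brute-force sketch. It restates the four cases as lists of run lengths (the helper names `extra_rotation_steps` and `formula_holds` are made up for this check, and the directions are plain strings) and compares each sum with the closed form for every start point inside the bounds.

```python
def extra_rotation_steps(x, y, direction, max_down, max_up, max_left, max_right):
    """Add up the five straight runs of one extra clockwise rotation."""
    width = max_right - max_left
    height = max_down - max_up
    runs = {
        "up": [y - max_up, width - 1, height, width, max_down - y],
        "down": [max_down - y, width - 1, height, width, y - max_up],
        "left": [x - max_left, height - 1, width, height, max_right - x],
        "right": [max_right - x, height - 1, width, height, x - max_left],
    }
    return sum(runs[direction])


def farthest_neighbor_steps(max_down, max_up, max_left, max_right):
    """Closed form: the x and y terms cancel in every case."""
    return 2 * (max_right - max_left + max_down - max_up) - 1


def formula_holds(max_down, max_up, max_left, max_right):
    """Compare the case-by-case sum with the closed form for every start point."""
    expected = farthest_neighbor_steps(max_down, max_up, max_left, max_right)
    return all(
        extra_rotation_steps(x, y, d, max_down, max_up, max_left, max_right) == expected
        for d in ("up", "down", "left", "right")
        for x in range(max_left, max_right + 1)
        for y in range(max_up, max_down + 1)
    )


print(formula_holds(max_down=7, max_up=2, max_left=1, max_right=9))
```

In each case the first and last run together always span one full side, which is why the start position drops out.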
Solution: keep track of the lowest index in the spiral that's not yet calculated.
This was easier said than done.
I currently plan to start with a block size of 16×16 and call the corresponding phase phase 16, so there are phases 16, 8, 4, 2 and 1. The first phase is different from the rest.
The first phase calculates 1 point for each block of 16×16 points. The spiraler class generates locations of those 16×16 blocks.
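For illustration, a minimal Python sketch of a generator that produces block locations in spiral order. This is not the spiraler class itself: the name `spiral_blocks`, the center start, and the clockwise up/right/down/left order are assumptions made for the example.

```python
def spiral_blocks(cols, rows):
    """Yield (column, row) of every block in a cols x rows grid, center first."""
    x, y = cols // 2, rows // 2
    dx, dy = 0, -1            # start by moving up (row index decreases)
    run = 1                   # length of the current straight run
    remaining = cols * rows   # blocks inside the grid that are still to be yielded
    while remaining > 0:
        for _ in range(2):    # run lengths go 1, 1, 2, 2, 3, 3, ...
            for _ in range(run):
                if 0 <= x < cols and 0 <= y < rows:
                    yield (x, y)
                    remaining -= 1
                x, y = x + dx, y + dy
            dx, dy = -dy, dx  # turn clockwise: up -> right -> down -> left
        run += 1


# Example: a 64x48 point image has 4x3 blocks of 16x16 points.
order = list(spiral_blocks(4, 3))
```

Positions outside the grid are simply skipped, so the same generator works for non-square images at the cost of some wasted steps.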
The second phase calculates or guesses 1 point for each block of 8×8 points. Whether a block of 8×8 can be guessed depends on the 16×16 blocks around it. When a 16×16 block has equal neighbors, the 8×8 blocks within it can be guessed. This works, as I have implemented it and it gives the expected results in various real renders. Therefore even in phase 8 I still need the spiral of 16×16 blocks, and for each block either guess the 8×8 subblocks or calculate them.
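The guessing rule could be sketched like this in Python. It assumes that "equal neighbors" means the four direct neighbors (not the diagonal ones) have the same iteration count as the block itself, and that a block at the edge with a missing neighbor is never guessed. Both are assumptions for the example, as are the names `can_guess_subblocks` and `samples`.

```python
def can_guess_subblocks(samples, bx, by):
    """Decide whether the 8x8 subblocks of 16x16 block (bx, by) can be guessed.

    samples maps (block_x, block_y) to the iteration count of the one
    calculated point of that 16x16 block.
    """
    center = samples.get((bx, by))
    if center is None:
        return False
    neighbors = [(bx - 1, by), (bx + 1, by), (bx, by - 1), (bx, by + 1)]
    # A missing neighbor returns None, which never equals an iteration count.
    return all(samples.get(n) == center for n in neighbors)


# 3x3 blocks that all have iteration count 5, except one direct neighbor.
uniform = {(x, y): 5 for x in range(3) for y in range(3)}
mixed = dict(uniform)
mixed[(1, 0)] = 6
```

With `uniform` the center block (1, 1) can be guessed; with `mixed` it has to be calculated.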
After the second phase, the same thing happens again: phase n needs the spiral of the previous phase 2n.
phase | iterates over blocks of size |
---|---|
16 | 16×16 |
8 | 16×16 |
4 | 8×8 |
2 | 4×4 |
1 | 2×2 |
The problem is: phase 8 doesn't produce any information about an 8×8 spiral, which the next phase needs. Phase 8 only works because phase 16 is different from the rest in that the block size is the same as the block size it iterates over. It leaves the 16×16 iteration information that phase 8 needs, but after phase 8 my idea doesn't work.
The 8×8 spirals and 16×16 spirals are related. There may be a formula to convert a location in the 16×16 spiral to the 8×8 spiral, like the formula for farthest_neighbor_steps, but I'm tired of finding those formulas. Calculating the next step in a spiral is very fast, so I can instead create an 8×8 spiral that follows the progress of the 16×16 spiral.
Consider these two examples (ignore the diagonal green line):
The tiles are 16×16. The grey spirals iterate over the 16×16 tiles directly. The green spirals iterate over the 8×8 tiles.
The 8×8 iteration roughly follows the 16×16 iteration, except that sometimes it goes in a different direction, completes a full rotation and then goes back in sync with the 16×16 iteration. Therefore I have this idea that will hopefully work:
3.3 is equivalent to moving to the farthest neighbor, so it can be accomplished by advancing 8×8 a certain number of times as given by the formula I derived earlier.
This test succeeds:
The synchronizing spirals idea is too difficult. New idea: I don't need to know exactly how far the next phase can progress without overtaking the current one, I just need to know a lower bound. If I make sure the lower bound is never exceeded, it will work.
A lower bound can be derived from the size of the rectangle that has been visited thus far. Consider this situation where the 16×16 phase (grey) is partially done. The 8×8 phase (green) can progress safely until where it's drawn.
The grey line has just taken a turn but has not finished going down. By ignoring the part from the last turn onward, the grey squares form a rectangle of which the boundary should be easy to calculate. The 8×8 phase can safely continue as long as it stays within the boundary of the rectangle.
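A minimal sketch of the containment test, assuming the finished rectangle is given in 16×16 block indices (inclusive on all sides), that both grids share the same top-left origin, and that 16×16 block (X, Y) splits into the 8×8 blocks (2X, 2Y) to (2X+1, 2Y+1). The name `fine_block_is_safe` is made up for the example.

```python
def fine_block_is_safe(fx, fy, coarse_rect):
    """Return True when 8x8 block (fx, fy) lies inside the finished rectangle.

    coarse_rect is (left, top, right, bottom) in 16x16 block indices,
    inclusive on all sides.
    """
    left, top, right, bottom = coarse_rect
    return 2 * left <= fx <= 2 * right + 1 and 2 * top <= fy <= 2 * bottom + 1


# The 16x16 phase has completed the blocks with indices 1..2 in both directions.
finished = (1, 1, 2, 2)
print(fine_block_is_safe(2, 2, finished), fine_block_is_safe(6, 5, finished))
```

If guessing an 8×8 block also needs the neighbors of its 16×16 parent, the rectangle would have to be shrunk by one block on each side before this test; that is left out here.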
The sampling algorithm is now implemented. There is a lot of work still to do because it doesn't work with oversampling and the resolution must be divisible by 16 (just to name 2 things that are not finished yet).
For complicated renders it's faster because it guesses more pixels. For simple renders it takes longer. For example, unzoomed Mandelbrot takes 2 times longer. The most extreme difference is with the procedure Pure Julia morphings. ExploreFractals 11 takes 0.02 seconds on my computer. The new algorithm takes 0.11 seconds.
I want to improve the performance as much as possible. Possible reasons why it's slow:
I don't know how much these things matter.
Now that the algorithm works it should be adjusted to achieve extra features that I want:
(2) is a new feature which has its own issue ( https://github.com/DinkydauSet/ExploreFractals/issues/39 ) but it's not completely unrelated. Re-using calculated points works well with the sampling algorithm because it can prevent 16×16 blocks from showing up at all. When a render has finished and the user zooms in, the magnification factor is multiplied by 4 (I'm considering changing that to 2), so for every block of 4×4 pixels in the new render there is a point inside that block that has already been calculated. That means the render can skip the 16×16 and 8×8 blocks somehow. The sampling algorithm implementation should be suitable for that idea.
The problem with oversampling is that blocks of points do not correspond nicely to pixels. Consider this situation where the black boxes are 8×8 blocks (of the sampling algorithm) and the red boxes are pixels. The pixels contain 5×5=25 points so this is 5×5 oversampling. The grey pixels are calculated points.
Some pixels overlap with multiple 8×8 blocks. It's not clear when a pixel should be colored. Ideally I want it to be colored immediately after information about its color becomes available. That is, when a point inside the pixel is calculated, it can be colored so that the user doesn't have to wait for all the points in the pixel to be done. That's the whole benefit of the sampling algorithm. In this situation there are some pixels that contain 4 calculated points, others with 2 and some have only 1. Ideally again, the pixels that contain multiple calculated points should have the average color of those points, but that requires a lot of bookkeeping. It's also difficult because of multithreading. Because pixels can overlap with multiple 16×16 blocks of points, multiple threads may be working on the same pixel. When both threads try to set the color it can go wrong.
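The uneven counts can be reproduced with a few lines of Python. This sketch assumes the calculated points lie on a regular grid with a spacing of 4 points, which is my reading of the situation (it is the spacing that gives 4, 2 and 1 with 5×5 oversampling); the function name is made up.

```python
from collections import Counter


def calculated_points_per_pixel(width_px, height_px, oversampling, spacing):
    """Count how many calculated points fall inside each pixel.

    Calculated points are assumed to sit at point coordinates that are
    multiples of `spacing`; each pixel covers oversampling x oversampling points.
    """
    counts = Counter()
    for py in range(0, height_px * oversampling, spacing):
        for px in range(0, width_px * oversampling, spacing):
            counts[(px // oversampling, py // oversampling)] += 1
    return counts


counts = calculated_points_per_pixel(4, 4, oversampling=5, spacing=4)
print(sorted(set(counts.values())))
```

Because 4 and 5 don't divide each other, the number of calculated points per pixel varies along each axis between 1 and 2, which gives 1, 2 or 4 per pixel.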
A solution to the problem above that I have in mind is this: worker threads (that do the Mandelbrot iterations (or other procedures)) should not color any pixels. The refreshthread can do that based on the contiguous progress of the phases. Let's look at an example situation:
In this situation, the spiral in the center shows the largest contiguous part of a phase that is done. By contiguous I mean that there is possibly work done after where this spiral ends, but there is a gap. For example, the real situation could be like this:
The reason I'm interested in the contiguous part is that the part can be safely colored. This only applies to a specific phase. From the perspective of a specific phase, pixels that are completely inside of the contiguous spiral are done. It may be possible for multiple threads to work on the same pixel, but inside of the contiguous spiral all those threads are guaranteed to be done.
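Tracking the contiguous part amounts to keeping the lowest spiral index that is not yet done and moving it forward whenever possible. A minimal single-threaded sketch, where `done` is a hypothetical list of flags indexed by position in the spiral (thread synchronization is left out on purpose):

```python
def advance_contiguous(done, first_not_done):
    """Move the lowest not-yet-done spiral index forward as far as possible.

    done[i] is True when the block at spiral position i is finished.
    Everything before the returned index is finished and safe to color.
    """
    while first_not_done < len(done) and done[first_not_done]:
        first_not_done += 1
    return first_not_done


done = [True, True, False, True, True]
frontier = advance_contiguous(done, 0)         # stops at the gap at index 2
done[2] = True                                 # the slow block finishes
frontier_after = advance_contiguous(done, frontier)
```

The work at indices 3 and 4 only becomes part of the contiguous spiral once the gap at index 2 is closed.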
The refresh thread can keep track of changes in the contiguous spirals of the phases. Every time it is time to update the image on screen (currently 10 times per second) the contiguous spiral may have become larger, for example:
The green part means that that part of the spiral was calculated since the last refresh of the image on screen.
Because calculating colors is done by the function renderBitmapRect which only works on rectangles, the newly added part of the spiral should be divided into rectangular pieces, like this:
Note that piece number 2 contains some work that was done already at the previous refresh. I intend to color a piece of the spiral only when the whole direction is done. In this case that's up: at the point where the black part of the spiral left off, its direction was up. Otherwise the coloring work can't be divided nicely into 4 rectangles.
A downside of this approach is that only contiguous progress is shown. This can be a problem if the center of the render has some pixels that take a very long time to calculate.
How exactly calculated points will be mapped to points in the new render is the subject of the other issue: https://github.com/DinkydauSet/ExploreFractals/issues/39 Regardless of how, the situation for the sampling algorithm is that some points may be calculated before the render even starts. To keep it general I don't want to assume anything about where the calculated points are. There are just some calculated points. Some 16×16 blocks may contain no calculated points, others 1 or more, and they can be anywhere in the 16×16 blocks. For example, a render could start like this, where the grey points are calculated points:
Here I use 8×8 blocks while a render would normally start with 16×16 blocks but that's just to keep the image smaller. The same idea holds for 16×16 blocks.
Currently the sampling algorithm calculates the top-left corner of a block. In the top-left block in the image, the top-left corner is already calculated, so that can be skipped easily. However, the center 8×8 block contains a calculated point somewhere else. I want to use that point as a representative for the block and skip that block too, until it's split (if needed) in the next phase.
It's important to keep track of where the representatives are because a worker thread should be able to check whether a block can be skipped. Also, in the next phase, the center block (if needed) is split into 4 new blocks of which 3 contain no calculated points and one contains the representative point. That block should again be skipped because it contains a representative point.
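A sketch of how such per-block bookkeeping could look, using one grid of booleans per block size. The function name and the layout (a list of rows) are assumptions for the example, and the resolution is assumed to be divisible by the block size.

```python
def representative_flags(calculated_points, width, height, block_size):
    """Return flags[row][col], True when that block contains a calculated point.

    calculated_points is an iterable of (x, y) point coordinates that were
    already calculated before the render starts.
    """
    cols, rows = width // block_size, height // block_size
    flags = [[False] * cols for _ in range(rows)]
    for x, y in calculated_points:
        if 0 <= x < width and 0 <= y < height:
            flags[y // block_size][x // block_size] = True
    return flags


# A 24x24 render with two re-used points: one at a block corner, one not.
points = [(0, 0), (12, 13)]
flags8 = representative_flags(points, 24, 24, block_size=8)
flags4 = representative_flags(points, 24, 24, block_size=4)
```

In `flags8` the center block (1, 1) is marked. In `flags4` only one of its four subblocks, (3, 3), is marked, so the other three still need a calculated or guessed point.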
The solution I have in mind for this is to keep some vectors of bools where, for each block, a bool is stored which indicates whether the block has a representative point. There are 3 steps to do when a render has to start:
When I came up with the solution (which may change again as always) I had in mind other features that I want. The solution works well together with the zoom animation idea ( https://github.com/DinkydauSet/ExploreFractals/issues/33 ) because creating the vectors of bools and the initial guesses can be done during the animation. I expect that work to be so fast that by the time the animation is done, the initial colored bitmap is ready to show immediately. The solution also works well together with my threadpool idea ( https://github.com/DinkydauSet/ExploreFractals/issues/43 ) because the tasks of (for example) doing the zoom animation and creating the initial coloring can be scheduled on the threadpool. It's not a problem if the render already starts while the zoom animation is still happening, so one thread can be reserved for that until it's done, and then it can become a worker thread for the render. And the solution also works well together with the re-using calculated points idea. The generality of the sampling algorithm implementation (re-used points can be anywhere without restrictions) allows for different kinds of re-use, like after applying a Julia morphing or a rotation.
I want something like what Fractal eXtreme does. Fractal eXtreme first renders a very low resolution version of the fractal so that the user has a rough idea of what it looks like. Then it refines the render. It looks like it renders in multiple passes: first it calculates 1 out of every block of, say, 256 pixels so you get something very pixelated. Then it calculates 3 more pixels per block, making the image less pixelated, then more again... until all pixels are calculated.
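The multi-pass order can be sketched in Python as follows. This is only my guess at the general scheme, not how Fractal eXtreme actually does it; the name `refinement_passes` and the choice of the top-left point of each block are assumptions.

```python
def refinement_passes(width, height, first_block=16):
    """Yield (block_size, points) for each pass, coarsest pass first.

    Each pass lists the top-left points of blocks of the current size,
    skipping points that an earlier (coarser) pass already produced.
    """
    size = first_block
    previous = None
    while size >= 1:
        points = [
            (x, y)
            for y in range(0, height, size)
            for x in range(0, width, size)
            # Not yet calculated: not on the grid of the previous pass.
            if previous is None or x % previous or y % previous
        ]
        yield size, points
        previous = size
        size //= 2


passes = list(refinement_passes(32, 32))
sizes = [size for size, _ in passes]
counts = [len(points) for _, points in passes]
```

For a 32×32 image the first pass calculates 1 point per 256, and every later pass adds 3 new points for each point that already exists, until all 1024 points are done.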
For example, first you see something like this:
A couple of passes later you see this:
A challenge is how to combine this with the Mariani-Silver algorithm that I use: https://mrob.com/pub/muency/marianisilveralgorithm.html
The problem is that this algorithm may spend a lot of time on thin lines in one part of the screen before even starting to do work on another part of the screen. I want the progress to be more evenly distributed over the screen.
Unfortunately Fractal eXtreme is not open source, otherwise I would try to figure out how it does this, because the way Fractal eXtreme does it really works well. It's not 100% as efficient as Mariani-Silver but that doesn't matter because the user experience is a lot better.