released in v0.0.1
The search now only looks at adjacent localities that bring the group sum closer to zero.
In trying to maximize the number of localities integrated into groups, wouldn't it be better to first try to maximize "de" within the adjacency parameters, and then look at adjacent localities which bring the group sum closer to zero, starting with the localities with the lowest deficits? This is the method I tried to illustrate in the video, albeit with some mistakes in assessing the optimal path.
In my opinion, the distribution of "de" values is such that we have a few localities with very high deficits and some with very low deficits. Conversely, we have a few localities with very high surpluses and many with low surpluses. If you immediately attempt to make de = 0, you are going to get smaller groups, and it's likely that, at the end, fewer localities will be in groups.
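To make the idea concrete, here is a rough C++ sketch of the two-phase approach as I picture it. Everything here is hand-written for illustration: `Locality`, `growGroup` and the `maxSize` cap are invented names, not anything from the actual program.

```cpp
#include <cmath>
#include <vector>

struct Locality {
    double de;                  // deficit (< 0) or surplus (> 0)
    std::vector<int> adjacent;  // indices of adjacent localities
    bool grouped = false;
};

// Grow one group from a seed in two phases:
//  phase 1: absorb adjacent surpluses, largest first (maximize "de"),
//  phase 2: absorb adjacent deficits, smallest magnitude first,
//           until the sum lands in the acceptable range [-0.5, 0.5].
std::vector<int> growGroup(std::vector<Locality>& locs, int seed, int maxSize)
{
    std::vector<int> group{seed};
    locs[seed].grouped = true;
    double sum = locs[seed].de;

    // best unassigned neighbour of the current group:
    // largest surplus in phase 1, smallest deficit in phase 2
    auto pick = [&](bool wantSurplus) {
        int best = -1;
        for (int g : group)
            for (int n : locs[g].adjacent) {
                if (locs[n].grouped)
                    continue;
                if (wantSurplus ? locs[n].de <= 0 : locs[n].de >= 0)
                    continue;
                if (best == -1 || locs[n].de > locs[best].de)
                    best = n;
            }
        return best;
    };
    auto add = [&](int n) {
        group.push_back(n);
        locs[n].grouped = true;
        sum += locs[n].de;
    };

    // phase 1: pile up surplus while the adjacency allows it
    while ((int)group.size() < maxSize) {
        int n = pick(true);
        if (n == -1)
            break;
        add(n);
    }
    // phase 2: whittle the sum down with the smallest deficits
    while (sum > 0.5 && (int)group.size() < maxSize) {
        int n = pick(false);
        if (n == -1 || sum + locs[n].de < -0.5)
            break;              // nothing usable, or it would overshoot
        add(n);
    }
    return group;
}
```

Because phase 2 takes the smallest deficits first, the sum creeps toward zero instead of jumping there, which is what should keep the groups large.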
This is an important point in the function, because so far we've compared our algorithm versions based on how many localities remain ungrouped at the end. Minimizing the number of ungrouped localities within a given sample is our metric for performance assessment.
In this example, we avoid F29 because integrating it into the group would considerably reduce group size.
These are the groups we can expect using max adjacency = 4 and de = 0 with the method described above. Hand-made for now :) Obviously I'd need to keep making groups with increasingly worse starting locations, but this is an encouraging result in terms of group size.
starting with the localities with the lowest deficits
I assume you mean the localities with the lowest absolute deficits
Currently, the localities are selected for the BFS search in whatever order they are found in the input.
It seems likely to me that starting with the localities that have the smallest absolute deficit might well give you smaller groups, because they have the least 'distance' to go to be in the acceptable range ( -0.5 to 0.5 )
However, this option would be easy enough to implement, so you could try it. If you would like me to do this, please open a new issue and assign it to me.
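For illustration, the seed ordering could be as simple as this sketch ( not the actual implementation; the function name and signature are invented ):

```cpp
#include <algorithm>
#include <cmath>
#include <numeric>
#include <vector>

// Sketch: order the BFS seed localities by smallest absolute deficit,
// rather than by their order in the input file.
// de[i] holds the deficit/surplus value of locality i.
std::vector<int> seedOrder(const std::vector<double>& de)
{
    std::vector<int> order(de.size());
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(),
              [&](int a, int b)
              { return std::abs(de[a]) < std::abs(de[b]); });
    return order;
}
```

Each seed would then be fed to the search in that order, skipping any locality already assigned to a group.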
the number of ungrouped localities within a given sample is our metric for performance assessment.
FYI I have added to the display a count of assigned and total localities
This is fairly impressive. May I ask how long it took you to compute the result? Could we add a condition where minimum group size has to be 5? I should probably stop asking and tweak it myself at this point lol. I will run tests tomorrow for sure. I can't wait to compare your groups to my handmade ones.
May I ask how long it took you to compute the result?
A few seconds for the grouping calculation. Including reading the input file and outputting the results ( which take longer than the grouping calculation ), 35 seconds.
Not sure if this is the best issue to discuss the visualization aspect, but do you suggest visualizing one group at a time? I can't imagine a way to visualize 22000 localities contained in thousands of different groups.
Are you ready to close this issue?
Just to be sure: with this version, the program no longer generates graphs but instead shows a list regardless of the size of the dataset, right?
The layout calculation takes a very long time when there are hundreds of localities. I doubt that the layout display is useful in such cases anyway, so if there are too many localities in the selected region(s), the layout calculation is skipped. See commit https://github.com/JamesBremner/so77307162/commit/81cfac768fc791768e5697d98858153bc493d213
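In outline the change is just a size guard before the layout pass. A simplified sketch ( the threshold, names and messages here are illustrative only; the real logic is in the commit linked above ):

```cpp
#include <iostream>

// Sketch: skip the expensive layout pass for large selections.
void displayRegion(int localityCount)
{
    const int kMaxForLayout = 200;   // hypothetical cut-off
    if (localityCount > kMaxForLayout) {
        // layout calculation skipped: just list the localities
        std::cout << localityCount << " localities: list only\n";
    } else {
        // small selection: the layout calculation is affordable
        std::cout << "calculating layout for " << localityCount
                  << " localities\n";
    }
}
```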
Understood, closing this :)
I have further enhanced the breadth first search.
Previously, the search blindly explored adjacent localities; if it happened upon an acceptable group, that group was added to the output.
The search now only looks at adjacent localities that bring the group sum closer to zero.
This produces better results in the sense that more localities are assigned to groups.
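In outline, the new rule is: a neighbour is only expanded when adding its de moves the running group sum toward zero. A simplified sketch ( the names and the rollback detail are mine, not the production code ):

```cpp
#include <cmath>
#include <queue>
#include <vector>

// Breadth first search from a seed locality, expanding only through
// neighbours whose de value moves the running group sum closer to zero.
// de[i]  : deficit/surplus of locality i
// adj[i] : indices of localities adjacent to i
// grouped: left true for every member when an acceptable group is found
// Returns the group when the sum reaches [-0.5, 0.5], else empty.
std::vector<int> bfsGroup(const std::vector<double>& de,
                          const std::vector<std::vector<int>>& adj,
                          std::vector<bool>& grouped, int seed)
{
    std::vector<int> group{seed};
    double sum = de[seed];
    grouped[seed] = true;               // tentative, rolled back on failure
    std::queue<int> frontier;
    frontier.push(seed);

    while (std::abs(sum) > 0.5 && !frontier.empty()) {
        int cur = frontier.front();
        frontier.pop();
        for (int n : adj[cur]) {
            if (grouped[n])
                continue;
            // pruning rule: skip neighbours that do not improve the sum
            if (std::abs(sum + de[n]) >= std::abs(sum))
                continue;
            group.push_back(n);
            grouped[n] = true;
            sum += de[n];
            frontier.push(n);
            if (std::abs(sum) <= 0.5)
                break;                  // acceptable group found
        }
    }
    if (std::abs(sum) > 0.5) {
        for (int g : group)
            grouped[g] = false;         // roll back: no acceptable group
        group.clear();
    }
    return group;
}
```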
For example: