JamesBremner / so77307162

Group graph vertices according to attributes.

Guided BFS #4

Closed JamesBremner closed 1 year ago

JamesBremner commented 1 year ago

I have further enhanced the breadth first search.

Previously, the search blindly explored adjacent localities. If it happened upon an acceptable group, the group was added to the output.

The search now only looks at adjacent localities which bring the group sum closer to zero.

This produces better results in the sense that more localities are assigned to groups.

For example:

============
2 1.000000
13 0.000000
15 2.000000
28 -1.000000
38 -2.000000
28 -1.000000
============
3 1.000000
4 2.000000
17 -1.000000
21 -3.000000
25 1.000000
26 0.000000
17 -1.000000
============
5 1.000000
6 -3.000000
30 2.000000
30 2.000000
============
7 -3.000000
35 1.000000
36 2.000000
36 2.000000
============
11 2.000000
23 0.000000
24 1.000000
33 -3.000000
24 1.000000
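The pruning rule described above can be sketched roughly as follows. This is an illustrative reconstruction based on this thread, not the actual repo code: the function name, data structures, and the 0.5 acceptance tolerance (from the "-0.5 to 0.5" range mentioned later in this issue) are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <queue>
#include <set>
#include <vector>

// Guided BFS sketch: starting from one locality, only expand to adjacent
// localities whose deficit moves the running group sum closer to zero.
// Stops once the sum lands in the assumed acceptable range [-0.5, 0.5].
std::vector<int> guidedBFS(
    int start,
    const std::map<int, std::vector<int>>& adj,     // locality -> neighbours
    const std::map<int, double>& deficit)           // locality -> de value
{
    std::vector<int> group{start};
    double sum = deficit.at(start);
    std::set<int> visited{start};
    std::queue<int> frontier;
    frontier.push(start);

    while (!frontier.empty() && std::fabs(sum) > 0.5) {
        int v = frontier.front();
        frontier.pop();
        for (int w : adj.at(v)) {
            if (visited.count(w))
                continue;
            double d = deficit.at(w);
            // guided step: only accept neighbours that reduce |sum|
            if (std::fabs(sum + d) < std::fabs(sum)) {
                visited.insert(w);
                group.push_back(w);
                sum += d;
                frontier.push(w);
                if (std::fabs(sum) <= 0.5)
                    break;
            }
        }
    }
    return group;
}
```

With this filter, a neighbour with deficit 2.0 is skipped when the group already sums to 1.0, while a neighbour with deficit -1.0 is accepted, matching the behaviour described above.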

image

JamesBremner commented 1 year ago

released in v0.0.1

remidelattre commented 1 year ago

The search now only looks at adjacent localities which bring the group sum closer to zero.

In trying to maximize the number of localities integrated in groups, wouldn't it be better to first try to maximize "de" within the adjacency parameters and then look at adjacent localities which bring the group sum closer to zero, starting with the localities with the lowest deficits? This is the method I try to illustrate in the video, albeit with some mistakes in assessing the optimal path.

In my opinion, the distribution of "de" values is such that we have a few localities with very high deficits and some with very low deficits. Conversely, we have a few localities with very high surpluses and many with low surpluses. If you immediately attempt to make de = 0, you are going to have smaller groups, and it's likely that at the end, fewer localities will be in groups.

This is an important point in the function because so far we've compared our algorithm versions based on how many localities remain ungrouped at the end. Minimizing the number of ungrouped localities within a given sample is our metric for performance assessment.
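One way to read the heuristic proposed above is: among the candidates adjacent to the current group, try the ones with the smallest absolute deficit first, and add a candidate whenever it does not make |sum| worse, so the group stays large while the sum drifts toward zero. A minimal sketch, purely illustrative and not the repo's code (the function name and signature are assumptions):

```cpp
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

// candidates: (locality id, deficit) pairs adjacent to the current group.
// Sorts candidates by ascending |deficit|, then greedily adds each one
// that does not increase |sum|. Returns the ids actually added; sum is
// updated in place.
std::vector<int> growGroup(double& sum,
                           std::vector<std::pair<int, double>> candidates)
{
    std::sort(candidates.begin(), candidates.end(),
              [](const auto& a, const auto& b) {
                  return std::fabs(a.second) < std::fabs(b.second);
              });
    std::vector<int> added;
    for (const auto& [id, d] : candidates) {
        if (std::fabs(sum + d) <= std::fabs(sum)) { // never make |sum| worse
            sum += d;
            added.push_back(id);
        }
    }
    return added;
}
```

Because the test uses `<=`, zero-deficit localities are always absorbed, which favours group size, in line with the comment above about many localities having low surpluses or deficits.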

remidelattre commented 1 year ago

image

In this example, we avoid F29 because integrating it to the group would considerably reduce group size.

remidelattre commented 1 year ago

image

These are the groups we can expect using max adjacency = 4 and de = 0 with the method described above. Hand-made for now :) Obviously I'd need to keep making groups with increasingly worse starting locations, but this is an encouraging result in terms of group size.

JamesBremner commented 1 year ago

starting with the localities with the lowest deficits

I assume you mean the localities with the lowest absolute deficits

Currently, the localities are selected for the BFS search in whatever order they are found in the input.

It seems likely to me that starting with the localities that have the smallest absolute deficit might well give you smaller groups, because they have the least 'distance' to go to be in the acceptable range (-0.5 to 0.5).

However, this option would be easy enough to implement, so you could try it. If you would like me to do this, please open a new issue and assign it to me.
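The ordering option discussed here would amount to sorting the BFS start localities by ascending absolute deficit instead of taking them in input-file order. A small sketch under those assumptions (function name is hypothetical):

```cpp
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

// Returns locality ids ordered by ascending absolute deficit, so the BFS
// starts from the localities closest to the acceptable range first.
// stable_sort keeps input order among equal |deficit| values.
std::vector<int> startOrder(std::vector<std::pair<int, double>> localities)
{
    std::stable_sort(localities.begin(), localities.end(),
                     [](const auto& a, const auto& b) {
                         return std::fabs(a.second) < std::fabs(b.second);
                     });
    std::vector<int> order;
    order.reserve(localities.size());
    for (const auto& [id, de] : localities)
        order.push_back(id);
    return order;
}
```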

JamesBremner commented 1 year ago

the number of ungrouped localities within a given sample is our metric for performance assessment.

FYI, I have added a count of assigned and total localities to the display.

image

remidelattre commented 1 year ago

This is fairly impressive. May I ask how long it took you to compute the result? Could we add a condition where the minimum group size has to be 5? I should probably stop asking and tweak it myself at this point lol. I will run tests tomorrow for sure. I can't wait to compare your groups to my handmade ones.

JamesBremner commented 1 year ago

May I ask how long it took you to compute the result?

A few seconds for the grouping calculation. Including reading the input file and outputting the results (which take longer than the grouping calculation), 35 seconds.

remidelattre commented 1 year ago

Not sure if this is the best issue to discuss the visualization aspect, but do you suggest visualizing one group at a time? I can't imagine a way to visualize 22000 localities contained in thousands of different groups.

JamesBremner commented 1 year ago

Please look at commit https://github.com/JamesBremner/so77307162/commit/4cb8514ce43ff20384f56b325f2c36a154701f03

JamesBremner commented 1 year ago

Are you ready to close this issue?

remidelattre commented 1 year ago

Just to be sure: with this version, the program no longer generates graphs but instead shows a list regardless of the size of the dataset, right?

JamesBremner commented 1 year ago

The layout calculation takes a very long time when there are hundreds of localities. I doubt that the layout display is useful in such cases anyway, so if there are too many localities in the selected region(s) the layout calculation is skipped. See commit https://github.com/JamesBremner/so77307162/commit/81cfac768fc791768e5697d98858153bc493d213
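The guard described here can be as simple as a count check before the layout pass. The cutoff value below is an assumption for illustration, not the value used in the linked commit:

```cpp
// Assumed cutoff: above this many localities the layout pass is skipped
// and only the list output is produced.
const int MAX_LAYOUT_LOCALITIES = 200;

// Returns true when the selected region is small enough for the
// (expensive) layout calculation to be worthwhile.
bool shouldRunLayout(int localityCount)
{
    return localityCount <= MAX_LAYOUT_LOCALITIES;
}
```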

remidelattre commented 1 year ago

Understood, closing this :)