add OpenNARS-style Distributor in Bag

maxeeem commented 8 months ago

See #83

I added the distributor but looking for your suggestions to see if we can improve the implementation. The issue is that initializing the Distributor object is not exactly quick, there are multiple loops in the __init__ method. To avoid recalculation I added a cache in the form of a static method new but as I am myself new to python, perhaps you guys can think of better ways to optimize the calculations. With this change tests run slower but still less than 1 min.

ccrock4t commented 8 months ago

Hi Maxim, I will need more info about what is the Distributor

maxeeem commented 8 months ago

@ccrock4t you can see the code from OpenNARS.

As the comments suggest (I will add them to this PR shortly)

A pseudo-random number generator, used in Bag.

And

For any number N < range, there is N+1 copies of it in the array, distributed as evenly as possible

This is all following our discussion over email, regarding selection probability in bag being propositional to priority.

Without this change, the result of test_bag_take_task() looks like this (as @bowen-xu pointed out)

Figure_1

After the change it looks like this which I believe is more along the lines of what we want

Figure_2

ccrock4t commented 8 months ago

In terms of speed, it seems like that is the cost of using this algorithm. If we can find out the name of the algorithm, maybe there are discussions online about improving it?

ccrock4t commented 8 months ago

It seems a Distributor with value $N$ takes $O(N^2)$ time for initialization, so that is why the slowdown is occurring.

My initial thoughts: the self.order array is being filled one value at a time (in a loop) using local variables rank, time, and index. Can we simply save the state of those variables as members of the object, then calculate the value for self.order on the fly, whenever pick() is called, using the same logic in __init__?

EDIT: it will probably not be that simple

bowen-xu commented 8 months ago

It seems a Distributor with value N takes O(N2) time for initialization, so that is why the slowdown is occurring.

My initial thoughts: the self.order array is being filled one value at a time (in a loop) using local variables rank, time, and index. Can we simply save the state of those variables as members of the object, then calculate the value for self.order on the fly, whenever pick() is called, using the same logic in __init__?

EDIT: it will probably not be that simple

@ccrock4t Is the slowdown significant? I found that the initialization takes only around 8ms in my laptop.

ccrock4t commented 8 months ago

@ccrock4t Is the slowdown significant? I found that the initialization takes only around 8ms in my laptop.

Not for the current default number of buckets (100), but the time for initialization could become an issue if we need to significantly increase the granularity of Bag (e.g., to 10,000 or more).

Since it is only a 1-time operation, the slowdown would not really impact user experience during NARS operation. Only on startup. However the memory also takes up $O(n^2)$.

ccrock4t commented 8 months ago

@bowen-xu

bowen-xu commented 8 months ago

@ccrock4t I agree that "the time for initialization could become an issue if we need to significantly increase the granularity of Bag (e.g., to 10,000 or more)." However, do we really need 10,000 levels? I think 100 levels is enough, as we don't need such high accuracy for priority. The control mechanism is not accurate.

ccrock4t commented 8 months ago

do we really need 10,000 levels? I think 100 levels is enough

I agree it is OK for now, since we only tend to use 100 buckets.

But, if we want to try more a lot more buckets in the future, we should revisit this.

bowen-xu / PyNARS

add OpenNARS-style Distributor in Bag #84