bowen-xu / PyNARS

[Bug] Variable creation is slow #99

Open maxeeem opened 5 months ago

maxeeem commented 5 months ago

Describe the bug

While debugging the NAL-6 tests, I found that one reason for the increased runtime when variables are present is variable initialization. Creating a Variable takes roughly 2.7x as long as creating a Term.

To Reproduce

Steps to reproduce the behavior:

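import timeit

# Import path assumed from the PyNARS package layout; adjust if it differs.
from pynars.Narsese import Term, Variable, VarPrefix

def test_variables():
    # timeit.timeit calls each lambda 1,000,000 times by default
    # and returns the total elapsed time in seconds.
    print(timeit.timeit(lambda: Term('a')))
    print(timeit.timeit(lambda: Variable(VarPrefix.Independent, 'x')))

test_variables()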

> 1.020983292
> 2.712368792

Expected behavior

As KanrenReasoner performs frequent conversions back and forth between Narsese and Logic, this leads to suboptimal performance on tasks involving variables. We need to find a way to handle variables more efficiently.

bowen-xu commented 5 months ago

This is because of the complex data structure I adopted, which might need to change. The index of each variable is recorded in self._vars_independent:

https://github.com/bowen-xu/PyNARS/blob/52ed0fac4de3ced442685ac435c86f38cd0ebc05/pynars/Narsese/_py/Variable.py#L32-L34
https://github.com/bowen-xu/PyNARS/blob/52ed0fac4de3ced442685ac435c86f38cd0ebc05/pynars/Narsese/_py/Statement.py#L37-L39
https://github.com/bowen-xu/PyNARS/blob/52ed0fac4de3ced442685ac435c86f38cd0ebc05/pynars/Narsese/_py/Compound.py#L55-L55

Through those lines/functions, the ids and indices of variables are computed. For example,

image

The id of each variable is represented as a number. In the example above, <(&&, <#x-->A>, <#x-->B>, <$y-->C>)==><$y-->D>> contains four variable occurrences (#x, #x, $y, $y) but only two distinct variables, so the ids are stored internally as the tuple (0, 0, 1, 1). The indices of the occurrences, i.e. their paths from the root of the term, are [0, 0, 0], [0, 1, 0], [0, 2, 0], and [1, 0]. There are some routines to maintain this structure and these records.
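The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration, not PyNARS's actual implementation: the nested-tuple encoding of terms and the names is_variable and collect_variables are assumptions made for this example.

```python
# Hypothetical encoding (not PyNARS's classes): a statement is
# (copula, subject, predicate), a compound is (connector, *components),
# and an atom or variable is a plain string.

def is_variable(term):
    """A string starting with '$', '#' or '?' is treated as a variable."""
    return isinstance(term, str) and term[:1] in ("$", "#", "?")

def collect_variables(term):
    """Return (ids, indices) for every variable occurrence in `term`.

    ids     -- one number per occurrence, assigned in order of first appearance
    indices -- the path of component positions leading to each occurrence
    """
    names = {}    # variable name -> numeric id
    ids = []
    indices = []

    def walk(t, path):
        if is_variable(t):
            ids.append(names.setdefault(t, len(names)))
            indices.append(list(path))
        elif isinstance(t, tuple):
            # t[0] is the copula or connector; components start at position 1.
            for i, child in enumerate(t[1:]):
                walk(child, path + (i,))

    walk(term, ())
    return tuple(ids), indices

# <(&&, <#x-->A>, <#x-->B>, <$y-->C>)==><$y-->D>>
term = ("==>",
        ("&&", ("-->", "#x", "A"), ("-->", "#x", "B"), ("-->", "$y", "C")),
        ("-->", "$y", "D"))

ids, indices = collect_variables(term)
print(ids)      # (0, 0, 1, 1)
print(indices)  # [[0, 0, 0], [0, 1, 0], [0, 2, 0], [1, 0]]
```

Every term construction has to run a traversal like this (or merge the records of its components), which is one plausible source of the extra cost measured in the benchmark.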

Another example:

image

The reason I use this structure is unification. For example, given two terms with variables, (&&, <$x --> A>, <$y --> B>) and (&&, <$z --> A>, <$w --> B>), their internal representations are equal: both are (&&, <$ --> A>, <$ --> B>) with (0, 1) and [[0, 0], [1, 0]]. Here (0, 1) are the ids of the two variables, and [[0, 0], [1, 0]] are their indices/positions.
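A minimal sketch of that comparison, again using a hypothetical nested-tuple encoding rather than PyNARS's real classes (the function name normalize is made up for this example): variable names are erased, leaving only the prefix, and the ids and positions are kept alongside the skeleton.

```python
def normalize(term):
    """Return (skeleton, ids, indices) with variable names erased.

    Assumed encoding: a tuple is (connector_or_copula, *components),
    and a string starting with '$', '#' or '?' is a variable.
    """
    names, ids, indices = {}, [], []

    def walk(t, path):
        if isinstance(t, str):
            if t[:1] in ("$", "#", "?"):
                ids.append(names.setdefault(t, len(names)))
                indices.append(path)
                return t[0]          # keep only the prefix, drop the name
            return t                 # ordinary atom, unchanged
        # Keep the connector/copula and rebuild the components in order.
        return (t[0],) + tuple(walk(c, path + (i,)) for i, c in enumerate(t[1:]))

    skeleton = walk(term, ())
    return skeleton, tuple(ids), tuple(indices)

a = ("&&", ("-->", "$x", "A"), ("-->", "$y", "B"))
b = ("&&", ("-->", "$z", "A"), ("-->", "$w", "B"))
c = ("&&", ("-->", "$x", "A"), ("-->", "$x", "B"))  # same variable used twice

print(normalize(a) == normalize(b))  # True: differ only in variable names
print(normalize(a) == normalize(c))  # False: ids are (0, 1) vs (0, 0)
```

The point of the design is that equality up to variable renaming becomes a plain equality check on the normalized form, at the price of computing that form whenever a term is built.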

But when using Kanren, there seems to be no need to maintain such a data structure.