Blimpyway / sdr-machine

Exploring SDR Ideas

Curious about your experiences with Triadic Memory #1

Closed · jbmaxwell closed this issue 8 months ago

jbmaxwell commented 8 months ago

Hello!

I found your HTM forum post from a link in Peter Overmann's TriadicMemory repo, which led me here. I was searching around for associative memory models when I found his work. Coincidentally, I actually delved into HTM at the start of my PhD way back in 2009, but had to abandon it when the initial implementation/approach failed to pan out (I quickly switched to a novel, n-gram-based cognitive approach).

I'm just curious to know how far you went with this and whether you applied it anywhere in particular. As I say, my immediate interest is in an associative memory that can be used to recall similar (auto-associative) items and contrasting but related (hetero-associative) items. I would be working with sequences of 32-dimensional (-1, 1) float vectors as inputs, where the sequence length would typically be in the range of 1024 or 2048, but could extend indefinitely. My current plan would be to use a sparsifying autoencoder, but it seems like you've created something similar to a "spatial pooler" from HTM, which would serve a similar function, correct?

Anyway, curious to hear your thoughts (and kind of excited to have HTM make an entrance again after all this time!).

Blimpyway commented 8 months ago

Hi, I'm not sure I follow what your end goal is.

Understanding what your 32-dimensional vectors represent would be helpful too.

Yes, a (more or less) random projection into a higher dimension plus thresholding (to get SDRs out) might be easier than a sparsifying autoencoder and good enough. It depends on your data set, e.g. you may have too few data points to fully train an autoencoder.
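
Roughly, that idea looks like this (a minimal sketch in plain numpy; the sizes here are made up, not anything from this repo): project each 32-float vector through a fixed random matrix into a higher dimension, then keep only the top-k activations as ON bits.

```python
import numpy as np

rng = np.random.default_rng(42)

N_IN, N_SDR, ON_BITS = 32, 1024, 20                # made-up sizes
projection = rng.standard_normal((N_IN, N_SDR))    # fixed random projection matrix

def to_sdr(x):
    # project the dense 32-float vector up, keep only the top-k activations as ON bits
    activations = x @ projection
    return np.sort(np.argsort(activations)[-ON_BITS:])

x = rng.uniform(-1, 1, N_IN)                       # a fake data point
print(to_sdr(x))                                   # 20 sorted ON-bit indices in [0, 1024)
```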

jbmaxwell commented 8 months ago

Thanks for the reply.

Yes, my raw data is compressed audio, as a time series of 32-dimensional columns. So it can be any width (length), but generally it will be > 512. And yes, I would be sparsifying into many more than 32 dimensions. I am basically doing prediction (though maybe not in the usual sense). The real "hitch" is that individual 32-dimensional vectors don't represent meaningful information, so I will have to store some minimum number of columns... which I suppose just means that my sparsified inputs will be very large. Do you have a rough sense of how Triadic Memory works with large vectors? (For my current purposes I could probably further compress the data, but I'd rather avoid that if possible.)

But my main interest right now is just in finding a strong associative memory; partly in the normal sense of being content-addressable, but also more "cognitively", in the sense that both closely related (auto-associative) and more distantly related (hetero-associative) memories can be recalled.

Blimpyway commented 8 months ago

There are quite a few things here. The first thing that raises a question mark is the word "compressed": compression of any kind tends to obscure structure that is relevant to ML algorithms. Whether that is the case here is hard to say without further details.

Then the associative memory: this tends to be an overhyped concept; basically any vector indexing library or vector database can be used to implement one. A second question to clarify is whether you want it implemented at the real (dense) vector level, without conversion to SDRs, or as an SDR memory, which means first converting the 32-scalar vectors to e.g. 500- or 1000-bit SDRs and then using those representations for the associative memory storage.

So the first question to answer is whether you want "associative" storage and query at the dense vector level, or to SDR-fy the vectors first.

  1. Dense vector level: you have lots of mature options to choose from, either 1.1. ANN (approximate nearest neighbor) libraries or 1.2. a full vector database.

  2. SDR format: here there are two experimental options I'm aware of. Both work in memory (you need to handle your own save/load to permanent storage). 2.1. DiadicMemory (a simpler variant of Triadic Memory), which maps vector pairs (X to Y); that is equivalent to the hetero-associative memory you mentioned, and it can be used as an auto-associative memory if you simply map (X to X). An implementation is the DiadicMemory class in the sdrsdm.py file in this repository; the implementations in Peter's TriadicMemory repository might be newer, but they are based on the same principle. 2.2. sdr_id_mem.py (the SDR_MEM class) in this repository, which is more like an ANN cache. I will detail further down how it differs from both "usual" ANN indexing and DiadicMemory; these differences could be important depending on your use case.

Details: 1. ANN libraries are useful for building a near-neighbor search index out of a given data set in a fixed-size dense vector format. Vector databases are usually built on top of ANN indexing libraries. The main difference between the two is that databases are usually slower but provide an online add/remove/update interface, as "normal" databases do, while ANN libraries work better in "batch" mode, when you have a given, relatively large "base" dataset and want to build an index out of it. ANN libraries usually allow adding more vectors to the index, but removing vectors usually involves rebuilding the index with only the vectors you want to keep. Vector databases hide these intricate updating details and the issue of persistence (storage on disk), at a speed penalty. ANN indexes usually live in memory and need explicit load/save operations on the whole index for persistence.

I don't have experience with vector databases (you'll have to do your own research). Regarding ANN libraries, I found pynndescent to be a very decent (performance + flexibility), well-documented nearest-neighbor library for both real-valued and bit-type vectors (SDRs). Its pluses are maturity, speed, and the ability to provide your own distance metric, which makes it easy to use the inverse of SDR overlap as the distance between two SDRs.

What you need to know about ANN search is that it is a fast replacement/approximation of KNN. Your query vector will retrieve the IDs of the K original vectors "closest" to it; you can then e.g. average these to mimic associative memory behavior.
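
A rough sketch of that flow with pynndescent, using a numba-compiled custom metric as its documentation describes (the overlap metric, dataset and sizes here are made-up examples, not code from this repo):

```python
import numpy as np
import numba
from pynndescent import NNDescent

@numba.njit()
def sdr_distance(a, b):
    # inverse of SDR overlap as a distance: 0.0 = same ON bits, 1.0 = no overlap
    overlap, on_a, on_b = 0.0, 0.0, 0.0
    for i in range(a.shape[0]):
        on_a += a[i]
        on_b += b[i]
        overlap += a[i] * b[i]
    denom = max(on_a, on_b)
    return 1.0 if denom == 0.0 else 1.0 - overlap / denom

# made-up dataset: 5000 dense binary SDRs, 1024 bits, ~20 ON bits each
rng = np.random.default_rng(0)
sdrs = (rng.random((5000, 1024)) < 20 / 1024).astype(np.float32)

index = NNDescent(sdrs, metric=sdr_distance)   # build the approximate KNN graph
ids, dists = index.query(sdrs[:1], k=10)       # IDs + distances of the 10 closest SDRs

# crude "associative" recall: average the retrieved stored vectors
recalled = sdrs[ids[0]].mean(axis=0)
```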

  2. DiadicMemory (sdrsdm.py) vs SDR_MEM (sdr_id_mem.py): you need to consider that (see the sketch after this list for the basic DiadicMemory store/query behavior):
    • both have fixed memory size/capacity.
    • they have decent online performance (writes are actually pretty fast) and there is no "index rebuild" needed after every write in order to query/search the memory. Performance is pretty "flat" regardless of how many write operations were performed.
    • DiadicMemory acts more like a "true" associative memory, in the sense that both queries and responses are SDRs, while SDR_MEM acts like an ANN for SDRs: it retrieves several (zero or more) IDs of matching SDRs previously stored.
    • DiadicMemory tends to need a lot more memory (SDR size squared) and its capacity is limited to (SDR length / ON bits) squared entries. E.g. 1000-bit SDRs with 10 ON bits (10/1000) will hold at most (1000/10)^2 ≈ 10k SDRs. So to increase capacity you need larger, sparser SDRs, and things can quickly run out of bounds.
    • DiadicMemory retrieves overlapping entries as unions. E.g. if you store (A -> B) and query with A, it will retrieve B; if you later store (A -> C) and then query for A, it will retrieve a UNION of B and C (aka their overlap). This can be an advantage or a disadvantage; the problem is that if you store many similar vectors as "A", the retrieved results tend to become an overlapping mess.
    • SDR_MEM, on the other hand, retrieves a list of matching individual IDs (from which you can restore a list of the matching B, C, ... "values"), so it doesn't mix the results.
    • Both have limited capacity. What is important to know is that SDR_MEM safely writes over old values when it gets "crowded", so it can be used online indefinitely (much like a cache or a medium-size memory), while DiadicMemory simply breaks apart when its capacity limit is exceeded. This might be fixed in newer implementations, but I doubt it is fixed in the Python one.
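
To make the DiadicMemory behavior above (including the union effect) concrete, here is a stripped-down toy sketch of the principle as I understand it: rows addressed by pairs of X's ON bits, each holding byte counters for Y's bits. It is not the sdrsdm.py implementation or its API, and the sizes are deliberately tiny.

```python
import numpy as np

N, P = 200, 5                      # deliberately tiny; real SDRs are larger and sparser

def pair_row(i, j, n=N):
    # Map an ON-bit pair (i < j) to a row index in [0, n*(n-1)/2)
    return i * n - i * (i + 1) // 2 + (j - i - 1)

class ToyDiadicMemory:
    """Toy illustration of the dyadic SDR memory principle (not sdrsdm.py's API)."""
    def __init__(self, n=N):
        # one row of n byte counters per ON-bit pair => roughly n**3 / 2 bytes
        self.counts = np.zeros((n * (n - 1) // 2, n), dtype=np.uint8)

    def _rows(self, x):
        # all bit-pair row addresses for the ON bits of x (x = sorted ON-bit indices)
        return [pair_row(x[a], x[b]) for a in range(len(x)) for b in range(a + 1, len(x))]

    def store(self, x, y):
        # writing (X -> Y): increment Y's columns in every row addressed by X
        self.counts[np.ix_(self._rows(x), y)] += 1

    def query(self, x, p=P):
        # reading: sum the rows addressed by X, keep the p strongest columns
        sums = self.counts[self._rows(x)].sum(axis=0)
        return np.sort(np.argsort(sums)[-p:])

rng = np.random.default_rng(1)
a, b, c = (np.sort(rng.choice(N, P, replace=False)) for _ in range(3))

mem = ToyDiadicMemory()
mem.store(a, b)
print(mem.query(a))   # recovers b's ON bits
mem.store(a, c)
print(mem.query(a))   # now a mix drawn from the union of b's and c's bits
```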

To summarize:

Blimpyway commented 8 months ago

Correction to the above: DiadicMemory needs about 1/2 × (SDR size) cubed bytes of memory, which at 1000-bit SDRs would be ~0.5 GB, with ~10k vectors capacity at 10 ON bits. This could be quite restrictive. SDR_MEM needs memory of roughly (SDR size) squared multiplied by a fixed "slot size"; increasing the slot size increases capacity at a performance penalty. Having denser SDRs (e.g. 50/1000) affects it too, but to a lesser extent. New writes stochastically erase older values, while DiadicMemory agglomerates new writes on top of older values.

Blimpyway commented 8 months ago

If you want me to be useful beyond long theoretical explanations of what these libraries can or cannot do, I will need a bit more detail about your application: what (and how) your 32-dimensional vectors encode, and how many of them you need to store in the associative memory.
For a large (up to ~1M entries) immutable index, pynndescent would be a safer bet than the experimental SDR libraries I mentioned. But for a smaller, frequently updated memory cache (tens of thousands, or even close to 100k entries), SDR_MEM might work.

AFAIK DiadicMemory works decently with fewer, very sparse SDRs (1-2% sparsity or lower).

jbmaxwell commented 8 months ago

Thanks for your detailed response.

The representations are latents from a pretrained VAE or neural audio codec. As far as associative memory goes, I was thinking more of the cognitive notion than a computational one, though obviously they're very similar. My main interest was in the activation process—again, in the cognitive sense—where "populations" of memories are activated (i.e., your "overlapping mess"), the most highly activated of which become likely targets for selection in a given context.

Thanks for the info on ANN search; it might provide an interesting/useful piece of the puzzle. Thanks also for the SDR capacity details. Super helpful.

Blimpyway commented 8 months ago

OK, embeddings from a VAE should work fine; at least it works with other types of data.

I'm not sure I understand your concept of a "cognitive notion". All I can say is that you can compute the average distance to the K retrieved nearest vectors, which will give you a relative measure of "density" or "agglomeration" around your query vector.

Here is the documentation/example usage for pynndescent: https://pynndescent.readthedocs.io/en/latest/how_to_use_pynndescent.html. 32-dimensional vectors should give pretty decent performance: >1000 vectors/second for index building and roughly 10x that for search.
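
For the dense 32-float case, basic usage might look roughly like this (random data standing in for your latents; sizes are made up). The mean distance to the K neighbours gives the "density" measure mentioned above.

```python
import numpy as np
from pynndescent import NNDescent

rng = np.random.default_rng(0)
latents = rng.uniform(-1, 1, (50_000, 32)).astype(np.float32)  # stand-in for VAE latents

index = NNDescent(latents, metric="euclidean")
index.prepare()                                    # finish building before querying

query = rng.uniform(-1, 1, (1, 32)).astype(np.float32)
ids, dists = index.query(query, k=10)              # the 10 nearest stored vectors

recalled = latents[ids[0]].mean(axis=0)            # crude associative recall
density = dists[0].mean()                          # smaller = denser neighbourhood
```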

jbmaxwell commented 8 months ago

Great, thanks.

By "cognitive notion" I just mean the use of the term from psychology/cognition (where I first became aware of it), not computer science, where it might be closer to content-addressable memory or associative arrays (or associative storage).

Blimpyway commented 8 months ago

If you want, I can give SDR conversion & indexing a go with my library here, for up to ~100k data points. Besides MNIST images I haven't had a chance to test it with any real data set. By "data point" I mean a 32-float vector.

Regards, Cezar

jbmaxwell commented 8 months ago

Thanks, but unless you feel curious yourself I'm not too worried about it for the time being. PyNNDescent has sparked some other thoughts about how I might approach what I have in mind.

Blimpyway commented 8 months ago

That's fine, as you wish. Have fun!