boris-kz / CogAlg

This project is a Computer Vision implementation of the general hierarchical pattern discovery principles introduced in the README:
http://www.cognitivealgorithm.info
MIT License

Buffering the intermediate patterns in a RAM disk, Memory usage and if __name__=='__main__' #20

Closed Twenkid closed 4 years ago

Twenkid commented 5 years ago

Hi Khan,

Since the modules are used as imports, do you plan to include the standard "main" check:

if __name__ == '__main__':
    ...

and a corresponding function that can be called explicitly, not just with argv, but with any input parameter?
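A minimal sketch of what that could look like (file and function names here are illustrative, not the actual CogAlg API):

```python
# frame_blobs.py (sketch): the processing is wrapped in a function that can
# take any input parameter, and argv handling runs only on direct execution.
import sys

def frame_blobs(image_path):
    # placeholder for the real pattern-generation step
    return f"frame_of_blobs for {image_path}"

if __name__ == '__main__':
    # not executed when another module does `import frame_blobs`
    path = sys.argv[1] if len(sys.argv) > 1 else "default_image.bmp"
    print(frame_blobs(path))
```

With this, intra_blob can import and call `frame_blobs()` directly, while the module still works as a standalone script.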

I guess the code is supposed to run in the right order as it is now (when the module frame_blobs is imported in intra_blob, the image is loaded, then the patterns are generated; then it goes on to the next step and computes intra_blob's stuff).

However, if frame_blobs is already reasonably debugged, then when running and debugging intra_blob that processing step could be skipped: store Y, X and frame_of_blobs with pickle, check whether the file exists, and if it does, just reload them in intra_blob_debug.py, ideally from a RAM disk.

I haven't run the code recently and don't know the memory footprint, though. Have you made such estimates? When I was working on a much earlier and simpler version, I was concerned that we were about to run out of memory on a "normal" PC.

How is it now?

Kind regards

https://github.com/khanh93vn/CogAlg/blob/f87b92f2b67826dfea3878826a82860760b6176e/frame_2D_alg/intra_blob_debug.py

https://github.com/boris-kz/CogAlg/blob/7120244ff554ae92e77346d6203a65036b2da85b/frame_2D_alg/frame_blobs_plain.py#L243

Twenkid commented 5 years ago

Also, do you manage to run it and how do you do it?

I tried to run intra_blob_debug from both the 2D_alg and 2D_alg_classes folders, both from PyCharm and from the command line, but there are issues with the imports that I couldn't resolve yet.

"No module named 'frame_2D_alg_classes' " etc.

When adjusting the imports to include the folder (package) name, and adding some missing imports (generic_functions, once), PyCharm sometimes stops complaining. However, Python fails when attempting to run it: modules not found.

boris-kz commented 5 years ago

Not sure if he gets these messages, Todor; I will tell him. Anyway, the classes version is out of date, we are switching back to classless, with named tuples. But it's a work in progress, we are changing it all the time. Speed and memory are the least of our concerns now.

On Tue, Mar 5, 2019 at 4:40 PM Todor Arnaudov notifications@github.com wrote:

Also, do you manage to run it and how do you do it?

I tried to run intra_blob_debug from both the 2D_alg and 2D_alg_classes folders, both from PyCharm and from the command line, but there are issues with the imports that I couldn't resolve yet.

"No module named 'frame_2D_alg_classes' " etc.

When adjusting the imports to include the folder (package) name, and adding some missing imports (generic_functions, once), PyCharm sometimes stops complaining. However, Python fails when attempting to run it: modules not found.


Twenkid commented 5 years ago

OK, thanks. I know it's changing, but there are moments when the project is in a stable state and can be run.

Not sure if he gets these messages

Sorry, I thought I was opening an issue on his branch, but it popped up as #20 here.

khanh93vn commented 5 years ago

Hi Tosh! Sorry I didn't check the email.

About the main check: so that's the appropriate way to make a .py file usable both as a main script and as a module. Thank you for the information!

About the memory: currently it only uses RAM, but I think the main place to store data should be the hard disk for now, because the amount of data would be tremendously large and CogAlg doesn't have to process data in real time yet. The data to be processed could then be loaded in chunks. However, that is only an idea; I haven't planned it specifically yet. Thanks for bringing it up.

About storing processed data from frame_blobs.py: I was thinking about the same thing when beginning to work on the code of intra_blob.py; that would save time running frame_blobs again and again. The idea is to add a version number to it, so when frame_blobs.py is changed, it will be run again to replace the outdated processed data. But that could also be done by hand, deleting the pickle file every time the structure of the data in frame_blobs.py changes. I don't know about RAM disks; I will look into what they do, sorry about this.

I don't update the Classes version anymore, in order to focus on this version, because Boris can work more efficiently with it. I hope this won't negatively affect the overall process.

About intra_blob_debug in frame_2D_alg/: I plan to write the whole thing (including comp_inc_range, comp_comp_inc_deriv, angle_blobs), then debug it all at once, because these are changing very quickly at the moment.

In general, in terms of programming, everything is kind of a mess now. It needs to be redesigned in various respects. But to be honest, I don't clearly see the big picture yet, so the best way is to stick with it for now. However, there are a few things that come to mind:

Anyway, my apologies for the wall of text.

Twenkid commented 5 years ago

Hi Khan,

the wall of text.

NP, the more info - the better. :)

About storing processed data from frame_blobs.py: I was thinking about the same thing when beginning to work on the code of intra_blob.py; that would save time running frame_blobs again and again. The idea is to add a version number to it, so when frame_blobs.py is changed, it will be run again to replace the outdated processed data. But that could also be done by hand, deleting the pickle file every time the structure of the data in frame_blobs.py changes. I don't know about RAM disks; I will look into what they do, sorry about this.

A RAM disk is used just like a normal disk, via file I/O. There's just a program with drivers that transparently manages it in memory, e.g. ImDisk.

  • Next they'll be clustered into 2 categories: less changing and more changing. That is separated by the parameter ave.

That partitioning process reminds me of a binary search tree. BTW, if classification over explicit classes is done, the final comparison is supposed to be a selection from particular patterns, like A or B, or A, B, C.

What about exact matches? (And ==) In the early stages it's partitioning by > <, but in the final stages it might be exact, and the filter (average) could be a single pattern.
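For illustration, the ave-based split quoted above might look like this (the threshold and values are invented, not from the codebase):

```python
# Minimal sketch of the "ave" split: derivatives at or below the average
# filter go to the less-changing cluster, the rest to the more-changing one.
ave = 15
derivatives = [3, 40, 7, 22, 1, 90]
less_changing = [d for d in derivatives if abs(d) <= ave]   # [3, 7, 1]
more_changing = [d for d in derivatives if abs(d) > ave]    # [40, 22, 90]
```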

This is actually more tricky than in 1D comp, where the derivative is a scalar. Currently we're using the magnitude of the gradient as the value. But I think it's better to use a linear transformation to determine the value used for clustering, as that is a more general model. In fact, I think it should be used at the meta-level of cognition.

You mean dot product/matrix multiplication?

  • The clusters of original inputs then serve as data points in a deeper level, with more variables. So the comparison will be performed on one variable; the rest will serve as coordinates in a hyperspace.

So the most important task, as usual, would be designing a data structure that could guarantee easy data access. Then there's the matter of processing order, which would affect the results, because of selective processing.


As of the data access, I call this "addressability of everything" and (incremental) Self-definition.

The processing order would probably affect the exact realization of the "shape bias" of the algorithm, which features/sub-patterns are first selected for classification.

The shapes (3D structure) are quickly found to be more general than texture, color, lighting etc. Initially infants are confused by texture, softness etc. and may take random features as primary classifiers, but eventually the 3D structure and the most general features win. (Unlike current ANNs, with their susceptibility to adversarial attacks.)

https://thescienceofearlylearning.com/shape-bias/ https://artificial-mind.blogspot.com/search?q=shape+bias

the comparison will be performed on one variable, the rest will serve as coordinates in a hyperspace.

I've reached one related central question, too: the decision when/whether a pattern, in a more general meaning a chunk/sequence of numbers/data, should be treated as a coordinate in an existing space (of patterns, a "Context", a space of possible patterns and their relations within a given coordinate/pattern space), or as a different pattern/different context.

That relates to when the change/difference relative to a particular template is big enough to spawn a new pattern, a new level etc. I see this depends on the exact partitioning (the exact algorithm) and the exact "costs", and in some cases perhaps it doesn't matter; it may just draw different possible search trajectories.

As for the singleness, the one-only magnitude: a floating point one (normalized), a maximum magnitude?

That seems good as a simple/general principle; however, introspectively/intuitively, for complex patterns it seems right to me that the comparison is rather an integer number, or a ratio of the number of exact matches to sub-patterns over the number of all possible ("best") matches for the particular class/"context": the number of possible patterns within that particular range of search (spatial/coordinate space, number of combinations given sub-patterns and resolution of comparison, ...).

I realize it's possible that I'm repeating something already said in the write-up, and that each such exact match could be converted to a fraction (that ratio, 0.6, 0.7), but my point is that at higher levels it doesn't feel right to me to arrive at "0.56 cat, 0.35 dog, 0.12 human" etc. like a typical ANN.

There are particular definite sub-patterns which match exactly, at a given level of comparison/abstraction/search/range, with a given experience (templates, structures). Such as the shape of the head (and that there's "a head"), the shape of the tail, the shape of the eyes, the nostrils, the ears, the paws; the sizes of the above; the color of the eyes, the exact pose, the enclosing contour of the figure which covers the biggest surface etc.

Eventually it either matches or doesn't match.

Starting (or ending) with exact matches, as this is also the simplest form of search and comparison.

When it's ambiguous, it's rather a more general pattern, like just "ears" or "head" (no class attached) - these in particular are likely to appear together and to confirm each other - or "an edge/triangle" or "a curve" (for ears). In that case the patterns would belong to an "unknown" species, not to "0.12 dog".
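A toy sketch of this count-of-exact-matches idea, with hypothetical sub-pattern names (not CogAlg code):

```python
# Match-as-ratio: count exact sub-pattern matches against a template,
# rather than emitting a soft probability vector over classes.
def match_ratio(observed, template):
    matches = sum(1 for o, t in zip(observed, template) if o == t)
    return matches, matches / len(template)

matches, ratio = match_ratio(
    ["head", "tail", "round_ears", "whiskers"],
    ["head", "tail", "pointy_ears", "whiskers"],
)
# 3 of 4 sub-patterns match exactly; the ratio is 0.75
```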

Also, I think that the >< kind of comparison is an extension of the exact match (two outcomes if == is attached to one of the directions, or three outcomes).

It is one step more selective, like a binary search tree, and allows changing the direction of scanning. This seems like one reason that animals in the biological world have two sides of the body and a pair of sensory inputs: a basic > < comparison of the input values and of the coordinates, so that a decision can be made and the entire space can be scanned (there is no jump addressing as in computer memory).
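The three-outcome comparison driving a binary-search-style scan can be sketched as follows (a plain illustration, not CogAlg code):

```python
# Three-outcome (<, ==, >) comparison as the primitive of binary search:
# the == outcome is an exact match, the other two steer the scan direction.
def cmp3(a, b):
    return (a > b) - (a < b)   # -1, 0, or 1

def binary_search(sorted_vals, target):
    lo, hi = 0, len(sorted_vals) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        c = cmp3(target, sorted_vals[mid])
        if c == 0:
            return mid        # exact match
        if c < 0:
            hi = mid - 1      # change scan direction: go left
        else:
            lo = mid + 1      # go right
    return -1                 # no exact match in range
```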

khanh93vn commented 5 years ago

A RAM disk is used just like a normal disk, via file I/O. There's just a program with drivers that transparently manages it in memory, e.g. ImDisk.

Thank you!

That partitioning process reminds me of a binary search tree. BTW, if a classification over explicit classes is done, the final comparison is supposed to be selection from particular patterns, like A or B, or A,B,C.

Well, there are only 10 types of people in the world: those who understand binary, and those who don't.

What about exact matches? (And ==) In the early stages it's partitioning by > <, but in the final stages it might be exact, and the filter (average) could be a single pattern.

Yes, that's what I believe is closer to our cognitive process. Even at this level, in some cases, that ave parameter should be at a minimum, to filter out noise. Sorry, it's too hypothetical.

You mean dot product/matrix multiplication?

Yes, every linear mapping can be represented by a matrix that is identical with it. Well, if you are familiar with linear algebra, then this will all make sense. As for why the data should be transformed, I'll get to it later.
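As a toy illustration of the point (the weights below are invented, not from the codebase), here is the current gradient magnitude next to a general linear map of the derivative vector:

```python
# hypot gives the gradient magnitude; a 1x2 weight row is the matrix form
# of a general linear map from (dy, dx) to a single clustering value.
import math

dy, dx = 3.0, 4.0
magnitude = math.hypot(dy, dx)     # current scheme: |gradient| = 5.0

w = (0.6, 0.8)                     # illustrative 1x2 weight matrix
value = w[0] * dy + w[1] * dx      # linear map: 0.6*3 + 0.8*4 = 5.0
```

With `w` proportional to the gradient direction, the linear map reproduces the magnitude; other weights give a different, more general clustering value.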

As of the data access, I call this "addressability of everything" and (incremental) Self-definition.

The processing order would probably affect the exact realization of the "shape bias" of the algorithm, which features/sub-patterns are first selected for classification.

The shapes (3D structure) are quickly found to be more general than the texture, color, lighting etc. Initially the infants are confused by texture, softness etc. and may take random features as primary classifiers, but eventually the 3D structure and the most general features win. (Unlike the current ANN with their susceptibility to be cheated with adversarial attacks).

https://thescienceofearlylearning.com/shape-bias/ https://artificial-mind.blogspot.com/search?q=shape+bias

Thank you for the piece of information! That is interesting indeed. Recognition of shapes and recognition of colors are on different levels of generalization, by my hypothesis. I think classification by shape is to classification by color as judging a person by deeds is to judging a person by physical shape/features.

I've reached one related central question, too: the decision when/whether a pattern, in a more general meaning a chunk/sequence of numbers/data, should be treated as a coordinate in an existing space (of patterns, a "Context", a space of possible patterns and their relations within a given coordinate/pattern space), or as a different pattern/different context. That relates to when the change/difference relative to a particular template is big enough to spawn a new pattern, a new level etc. I see this depends on the exact partitioning (the exact algorithm) and the exact "costs", and in some cases perhaps it doesn't matter; it may just draw different possible search trajectories.

Yes, space/context is specified by higher level, although we don't have a higher level yet. More on that later.

As for the singleness, the one-only magnitude: a floating point one (normalized), a maximum magnitude?

That seems good as a simple/general principle; however, introspectively/intuitively, for complex patterns it seems right to me that the comparison is rather an integer number, or a ratio of the number of exact matches to sub-patterns over the number of all possible ("best") matches for the particular class/"context": the number of possible patterns within that particular range of search (spatial/coordinate space, number of combinations given sub-patterns and resolution of comparison, ...).

I realize it's possible that I'm repeating something already said in the write-up, and that each such exact match could be converted to a fraction (that ratio, 0.6, 0.7), but my point is that at higher levels it doesn't feel right to me to arrive at "0.56 cat, 0.35 dog, 0.12 human" etc. like a typical ANN.

Yes, I believe that is the representation of a probability distribution, as used by machine learning for classification problems; we don't use that. As far as I know, human top-down cognition is deterministic, not probabilistic.

There are particular definite sub-patterns which match exactly, at a given level of comparison/abstraction/search/range, with a given experience (templates, structures). Such as the shape of the head (and that there's "a head"), the shape of the tail, the shape of the eyes, the nostrils, the ears, the paws; the sizes of the above; the color of the eyes, the exact pose, the enclosing contour of the figure which covers the biggest surface etc.

Eventually it either matches or doesn't match.

Starting (or ending) with exact matches, as this is also the simplest form of search and comparison.

When it's ambiguous, it's rather a more general pattern, like just "ears" or "head" (no class attached) - these in particular are likely to appear together and to confirm each other - or "an edge/triangle" or "a curve" (for ears). In that case the patterns would belong to an "unknown" species, not to "0.12 dog".

I believe this is the Unfolding of patterns in Boris' terms.

Also, I think that the >< kind of comparison is an extension of the exact match (two outcomes if == is attached to one of the directions, or three outcomes).

It is one step more selective, like a binary search tree, and allows changing the direction of scanning. This seems like one reason that animals in the biological world have two sides of the body and a pair of sensory inputs: a basic > < comparison of the input values and of the coordinates, so that a decision can be made and the entire space can be scanned (there is no jump addressing as in computer memory).

In machine learning, they have classification problems (discrete outputs) and regression problems (continuous outputs) separately. In CogAlg, classification is part of data compression and regression is for making prediction.

Those partitions are for forming new categories. In one study, researchers compared the words for "mum" and "dad" from various languages. They are all similar in that the pronunciation of "mum" is soft and the pronunciation of "dad" is rougher. Volunteers were then asked to choose one of the proposed shapes for the pronunciation of each word: all of them chose a more pointy or rough shape for "dad" and a more round shape for "mum". So I think that bottom-up categorization is done by binary partitions; each one of them would double the number of categories, so the number of possible categories is high, but only a handful of them would be used at a time. I guess this is similar to the binary strings in the schema theorem.
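The doubling argument can be stated concretely: k binary partitions yield 2**k possible category codes, like fixed-length binary strings in the schema theorem (the feature names below are invented for illustration):

```python
# Each binary partition doubles the number of possible categories.
from itertools import product

partitions = ["soft/rough", "round/pointy", "low/high"]
codes = list(product((0, 1), repeat=len(partitions)))
# 3 binary partitions -> 2**3 = 8 possible category codes
```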

As for the search space, it belongs to the regression problem, which deals with comparison and projection of inputs.

Twenkid commented 5 years ago

You mean dot product/matrix multiplication?

Yes, every linear mapping can be represented by a matrix that is identical with it. Well, if you are familiar with linear algebra, then this will all make sense. As for why the data should be transformed, I'll get to it later.

In order to generalize the calculation of angle and distance (difference) for multidimensional vectors (now you use hypot, arctan)? Also, eventually, for efficiency. It could also be for compression and simplification, such as searching for linearly independent sets of vectors ("variables") within the matrix of the variables (coordinates, as you mention), and thus doing dimensionality reduction: elimination of linearly dependent vectors within the matrices.
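A crude sketch of that elimination, using plain Gaussian elimination as the rank test (toy data and names, not CogAlg code):

```python
# Drop linearly dependent columns: a column is dependent if adding it
# does not increase the matrix rank.
def rank(rows, eps=1e-9):
    m = [row[:] for row in rows]
    r = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if abs(m[i][col]) > eps), None)
        if pivot is None:
            continue                      # no usable pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and abs(m[i][col]) > eps:
                f = m[i][col] / m[r][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def independent_columns(matrix):
    cols = list(zip(*matrix))             # treat columns as vectors
    kept = []
    for j in range(len(cols)):
        cand = [list(cols[k]) for k in kept] + [list(cols[j])]
        if rank(cand) == len(cand):       # column adds a new dimension
            kept.append(j)
    return kept
```

For example, in a 3x3 matrix whose second column is twice the first, only two columns survive.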

As of the data access, I call this "addressability of everything" and

(incremental) Self-definition.

The processing order would probably affect the exact realization of the "shape bias" of the algorithm, which features/sub-patterns are first selected for classification.

The shapes (3D structure) are quickly found to be more general than the texture, color, lighting etc. Initially the infants are confused by texture, softness etc. and may take random features as primary classifiers, but eventually the 3D structure and the most general features win. (Unlike the current ANN with their susceptibility to be cheated with adversarial attacks).

https://thescienceofearlylearning.com/shape-bias/ https://artificial-mind.blogspot.com/search?q=shape+bias

Thank you for the piece of information! That is interesting indeed. Recognition of shapes and recognition of colors are on different levels of generalization, by my hypothesis. I think classification by shape is to classification by color as judging a person by deeds is to judging a person by physical shape/features.

The article at The Science of Early Learning suggests that, in the human case, "easier", lower-generality (color) samples could short-circuit the learning of more general (wider-range) shape/3D patterns.

The Developmental Psychology literature describes a stage when a child can be taught to name colors in her view and answer questions such as "this is green, red, blue". However, she might be unable to interpret more complicated instructions related to objects having that color, such as "give me the red car"; therefore color is not yet attached to the pattern "car", but only to some vaguer representation of the global input at the moment. (It could also be linguistic: she can't parse the sentence with all these mappings.)

I think some pedagogical direction (suggestion, supervision) on which features are important, along with appropriate sets of input, could make a lot of difference in learning (as I mentioned in the earlier discussion session). In current DL they do that by providing plenty of input samples, thus different colors, sizes etc.

I've reached one related central question, too: the decision when/whether a pattern, in a more general meaning a chunk/sequence of numbers/data, should be treated as a coordinate in an existing space (of patterns, a "Context", a space of possible patterns and their relations within a given coordinate/pattern space), or as a different pattern/different context. That relates to when the change/difference relative to a particular template is big enough to spawn a new pattern, a new level etc. I see this depends on the exact partitioning (the exact algorithm) and the exact "costs", and in some cases perhaps it doesn't matter; it may just draw different possible search trajectories.

Yes, space/context is specified by higher level, although we don't have a higher level yet. More on that later.

That's a good point to keep in mind: whenever there's an ambiguity in the lower representation, ask the higher level to clarify/choose a definite context/set of patterns and to decide.

As for the singleness, the one-only magnitude: a floating point one (normalized), a maximum magnitude?

That seems good as a simple/general principle; however, introspectively/intuitively, for complex patterns it seems right to me that the comparison is rather an integer number, or a ratio of the number of exact matches to sub-patterns over the number of all possible ("best") matches for the particular class/"context": the number of possible patterns within that particular range of search (spatial/coordinate space, number of combinations given sub-patterns and resolution of comparison, ...).

I realize it's possible that I'm repeating something already said in the write-up, and that each such exact match could be converted to a fraction (that ratio, 0.6, 0.7), but my point is that at higher levels it doesn't feel right to me to arrive at "0.56 cat, 0.35 dog, 0.12 human" etc. like a typical ANN.

Yes, I believe that is the representation of a probability distribution, as used by machine learning for classification problems; we don't use that. As far as I know, human top-down cognition is deterministic, not probabilistic.

Good.

Also, I think that the >< kind of comparison is an extension of the exact match (two outcomes if == is attached to one of the directions, or three outcomes).

It is one step more selective, like a binary search tree, and allows changing the direction of scanning. This seems like one reason that animals in the biological world have two sides of the body and a pair of sensory inputs: a basic > < comparison of the input values and of the coordinates, so that a decision can be made and the entire space can be scanned (there is no jump addressing as in computer memory).

In machine learning, they have classification problems (discrete outputs) and regression problems (continuous outputs) separately. In CogAlg, classification is part of data compression and regression is for making prediction.

Is this mapped to Boris' "What & Where" notion? Classification is What, regression is Where?

IMO it's flexible, tied to the frames of the representation. Classification is also "where" (in the space of the algorithm's representations), and it is also supposed to be continuous to one degree or another.

One separation could be that it may be supposed to have only a "singleton" representation for the respective pattern. However, from a global view, I think the same/related patterns could be conceptualized by several branches when the initial higher context was different; as when there's ambiguity at the low level, the same single representation maps to many higher-level patterns.

Classification is prediction as well: a part of the features is enough to recognize a pattern, and they suggest the rest of the features and the consequences. The other features are sub-patterns, included in the one with the biggest span (in time, space, or both) which is recognized.

One recent lecture about that suggests several ways to define domain knowledge:

https://www.cs.cmu.edu/~rsalakhu/NY_2019_v3.pdf https://www.youtube.com/watch?v=b8ABJZ7lfXU

It's aimed at mapping them eventually into DL/ANN, but I think the higher-level domain knowledge (which we understand) is supposed to be mirrored in one way or another in any machine learning/AI architecture; some of these representations, especially the shallow ones, are just the most compressed ones given the selected constraints and the way they are stored.

Salakhutdinov's partitioning: Relational, Logical, Scientific.

The last one is about physics and prediction, which is what an AGI is about regarding the "regression" part at the low levels, and it involves (partial) differential equations (as in CogAlg: dx, dy, da, ...).

Those partitions are for forming new categories. Once in a research, they have compared the words for "mum" and "dad" from various different languages. They all similar in that pronunciation for "mum" is soft and pronunciation for "dad" is more rough. Volunteers then asked to choose one of the proposed shape for the pronunciation of one of the words. all of them chose more pointy or rough shape for "dad" and more round shape for "mum". So I think that bottom-up categorization is done by binary partitions, each 1 of them would x2 the number of categories, so number of possible categories is high, but only a handful of them would be used at a time. I guess this is similar to the binary string in schema theorem

I understand the minimum partitioning power of 2; however, I don't think that at higher levels it's necessary or efficient to limit it strictly to two. Wouldn't the hierarchy grow too deep?

As for the roughness, I doubt the universality and objectivity of the vocal rule (and what is "rough" objectively?); maybe it's the intonation. I guess they may mean having consonant sounds like b, d, p, t, tch (e.g. father, baba, papa, bahshtah, tatkoh, ottosan).

That reminds me of another study, on intermodal analogies and generalization: people (I don't know their age and other social data) were given two names, "Kiki" and "Bouba", and asked how they would name two drawn figures. The first was starry and edgy, while the second one was oval-shaped.

They called the edgier one "Kiki", and the smooth one "Bouba". These particular sounds and letters match both visually and auditorily - the letters (K has a sharp edge; o, ou are smooth) and the pronunciation. The sound "K" is also "sharp" - an abrupt noise, a "sharp" sensation in the vocal tract, i.e. shorter than the "smoother" tones. On the other hand, o/ou are produced with an oval/smooth mouth, are tonal (no abrupt noise), and we also visually see the oval shape of the mouth when somebody else is speaking.

As for the search space, it belongs to the regression problem, which deals with comparison and projection of inputs.

Isn't it also about classification? Reducing the space of branches within the hierarchy. IMO classification and regression seem mapped to each other in AGI.

Classification emerges out of comparing inputs at particular coordinates; thus the feature "data points" are like repeating coordinates of the patterns within the input space. A given pattern is the input: more technically, the set of possible detectable inputs, eventually at the lowest level, that would be perceived if the sensory coordinates were set to a particular location or moved through a particular trajectory etc., according to particular frames of reference. At a low level these frames are, or could be represented as, the simplest vector coordinates in some kind of low-dimensional space, where "low" means lower compared to the space of the higher cognitive levels.

khanh93vn commented 5 years ago

Sorry for the late reply.

As of the roughness, I doubt the universality and objectivity of the vocal rule (and what's "rough" objectively), maybe the intonation. I guess they may mean having consonant sounds like b,d,p,t,tch (e.g. father, baba, papa, bahshtah, tatkoh, ottosan).

That reminds me of another research for intermodal analogies and generalization, people - I don't know their age and other social data - were given two names "Kiki" and "Bouba" and they were asked how they would name two drawn figures. The first was starry, edgy, while the second one was oval shaped.

They called the more edgy one "Kiki", and the smooth one - "Bouba". These particular sounds and letters match both visually and auditory - the letters (K has a sharp edge, o,ou is smooth) and by the pronunciation. The sound "K" is also "sharp" - an abrupt noise, a "sharp" sensation in the vocal tract, i.e. shorter than the "smoother", tones. On the other hand the o/ou are produced with an oval/smooth mouth, are tonal (no abrupt noise) and also we visually see the oval shape of the mouth when somebody else is speaking.

That's interesting! And please excuse me for my terrible English.

About Classification and Regression: I think the problem lies in the terms. Sorry for the confusion.

About shape bias: I got your point here. Supervision provides better progress. That's how human civilizations evolve, right?

About binary partitions: I still think it is the most general approach and could reproduce all the others.

It is hard to see with high-level cognition, but not impossible. For example, have you ever noticed, in a particular situation, that something was wrong but didn't know why, then sought the answer, then realized something that you didn't know before? So, at that point you would have formed a new concept that could be used to predict similar situations that you normally wouldn't notice. The generality of the formed concept depends on the process of how you were seeking the answer, but the memories remain the same. Especially painful memories.

Twenkid commented 5 years ago

On Wed, Mar 13, 2019 at 9:40 AM Khanh Nguyen notifications@github.com wrote:

Sorry for the late reply.

NP. I got too "non-technical". :)

As of the roughness, I doubt the universality and objectivity of the vocal rule (and what's "rough" objectively), maybe the intonation. I guess they may mean having consonant sounds like b,d,p,t,tch (e.g. father, baba, papa, bahshtah, tatkoh, ottosan).

That reminds me of another research for intermodal analogies and generalization, people - I don't know their age and other social data - were given two names "Kiki" and "Bouba" and they were asked how they would name two drawn figures. The first was starry, edgy, while the second one was oval shaped.

They called the edgier one "Kiki" and the smoother one "Bouba". These particular sounds and letters match both visually and auditorily: the letter K has a sharp edge while o/ou is smooth, and the same holds for the pronunciation. The sound "K" is also "sharp" - an abrupt noise, a "sharp" sensation in the vocal tract, i.e. shorter than the "smoother" tones. The o/ou, on the other hand, are produced with an oval, smooth mouth, are tonal (no abrupt noise), and we also visually see the oval shape of the mouth when somebody else is speaking.

That's interesting! And please excuse me for my terrible English.

About Classification and Regression. I think the problem lies in the terms. Sorry for the confusion.

Yes, the terms and the mapping to different representations (code and low-level input). I meant that as the concepts get more general, they (their representations) may start to merge.

About shape bias, I got your point here. Supervision provides better progress. That's how human civilizations evolve, right?

Yes. In the example in the link - providing similar objects along the desired dimension/variable at a closer distance (than if there were no such direction), for faster search and for "priming".

About binary partitions. I still think it is the most general and could reproduce all other approaches.

I agree; it's the simplest as well and suggests simple selection criteria, which is an advantage. I've been reading the CogAlg code from time to time, recently trying to spot a simple algorithmic element - a pattern of if()/operation/return etc. - doing some kind of such partitioning, but it's not that clear and simple yet. :)

I realize that one conceptual "mistake" in approaching the creation of an algorithm - algorithm-genesis, or more broadly the factorization of a problem - is thinking in too-big chunks, too many variables and operations at once. That confuses and distracts us.

The generality of the formed concept depends on how you were seeking the answer

Yes - the range, the resolution, the hierarchy/ranges of search, how many and which templates to compare against, when to stop comparing, etc. It also depends on how "generality" itself is defined.

It is hard to see with high-level cognition, but not impossible. For example, have you ever noticed, in a particular situation, that something was wrong without knowing why, then sought an answer and realized something you didn't know before? At that point you would have formed a new concept that could be used to predict similar situations that you normally wouldn't notice. The generality of the formed concept depends on how you were seeking the answer, but the memories remain the same. Especially painful memories.

Sometimes search cannot be exhaustive and deterministic: the search space is discontinuous, or too vast for the resources invested at a given time/cognitive operation (e.g. find a match under given constraints, level, depth...).

There are too many combinations; only a small part of them is visited.

From our human POV, an effective internal way to map and traverse all possible trajectories systematically may be missing as well.

Also, some of the constraints might not be well understood by ourselves, being dependent on subconscious/unreachable rewards. I think the devaluation of the emotional charge of painful memories has a significant non-cognitive element - turning off this supervision signal.

Thus the success of the explorative process has an element of chance under "short term" evaluation - a period/search resource that is too "low" for full coverage given a particular goal and a particular search hierarchy and space.

That's why sometimes it doesn't happen even if we tried - we can't understand or find a solution with the given resources; then it happens abruptly, in a "Eureka moment".

That's what goes on when doing creative work - drawing/saying/writing something, then revisiting and reviewing it, taking it as input again, thinking about different aspects, etc., which updates the goal and the specific patterns being searched for.

As for your example, I map it to my own experience: say, a sentence in an essay doesn't sound "right", but you realize why only after reading the whole text many times and internalizing it. Finally you notice that the sentence's message, or even just a word, repeats something said elsewhere, a few paragraphs earlier. You couldn't notice it exactly at first, because the distance between the usages was too big for your working memory to hold the match with its specific content. (It managed to notice that there was a repetition, but only vaguely - a more general representation may have already existed, but without enough detail.)

After reading the text many times, the representation moves to another form of memory - a longer-term one, with a bigger capacity and a wider span for comparison.

Then you may mark that stylistic fault and the next time intentionally search for such unwanted repetitions, and this becomes a higher level top-down comparison pattern.

A similar phenomenon appears in longer-term re-evaluation. Rereading old literary pieces of mine years later, I have noticed that their style was "wrong": there were redundant words, some sentences could be reworded to make the message clearer, etc. Sometimes when you reread your old stories they sound "awful" to you, and you wonder why you liked them back then. This is how writing skills evolve: earlier writings, or those written while more distracted or with "shallower comparison" effort, have more repetitions - word-level, structural, or message-wise - due to shorter working memory, whether from limited capacity or from distraction, i.e. a shorter and "worse" comparison span, or weaker comparison filters - the criteria for the "goodness" of the text.

As for generality - coverage over possible cases and domains - and the same memories: I think only the lower-generality memories remain the same in the long term. They establish themselves as building blocks.

A successful search may produce, at the least, specific trajectories and coordinates within the existing hierarchy, or new ways of searching, as higher levels or search patterns. I think these are called higher-order patterns in the CogAlg domain. I don't know, but maybe also filters, same-filter spans, etc., which are selected when a particular target has to be located. Something like "context selection" when recognizing or focusing on particular details (patterns), or switching the mind to a particular mode of operation, involving some explicit or implicit "priming" - a preliminary selection of particular sub-hierarchies and sets of patterns, filters, same-filter spans, and resolution.

These higher-order patterns become new memories of their specific type.

You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/boris-kz/CogAlg/issues/20#issuecomment-472313064, or mute the thread https://github.com/notifications/unsubscribe-auth/AWSP2Co4KRk5NWcoVMnFJcPhOd7E4Dmhks5vWKtpgaJpZM4benHC .

khanh93vn commented 5 years ago

I see. Thanks for the info :+1:

Twenkid commented 5 years ago

Hi Khan, Boris,

Regarding that principle from https://github.com/boris-kz/CogAlg/issues/20#issuecomment-470021085:

  • The clusters of original inputs then serve as data points in a deeper level, with more variables. So the comparison will be performed on one variable; the rest will serve as coordinates in a hyperspace.

I think I see it more concretely now and will share my reasoning and speculations - a query for corrections, clarifications and other abstract principles.

It is clear for the basic input - brightness (p). I assume the initial sequence of frame_blobs processing is a bit "different", because it has to combine x and y. It seems logical for the next-order "pixel" to be g (or just summed i and sign?), but then it and other parameters are summed to produce aggregates and a "subspace" within the initial input space (coordinates, lengths, the shape of the blob).

Is the general principle valid for the L, Y, X, I, Dy, Dx, G of the seg - where the compared variable seems to be the sign of the gradient? Also, as the patterns are hierarchical, each lower-level pattern has additional coordinates, starting with y, x, as displayed in:

    for P in seg[2]:
        for y, x, ...

And eventually they could be unfolded by traversing the pattern.

Regarding seg, it seems these could be intermediate representations - that partitioning by >filter< (ave) which is explained in the cited comment, one paragraph before the principle.

Regarding blob, syntactically it seems that Typ, sign, Ly, L, Y, X (in the intra_blob tree) are such coordinates of the blob. Then sub_blobs/layers have additional sub-coordinates with their Derts.

Blobs and (I, Dy, Dx, G)

Then from form_blob, frame[1].append(nt_blob(..., I see that a 4-element pattern structure (I, Dy, Dx, G) is kept as the core one (like the initial comparison (i, dx, dy, g)).

Thus, is [0] supposed to be the general variable for comparison at a given point, with Dy, Dx, G as local coordinates in the derivatives' space for that level/stage of processing? (If such sub-space division is possible at that small a granularity.)

Or rather, and it seems more logical, is the whole 4-component dert that "one" variable that is compared? However, that is a 4-dimensional variable.

Then, in different branches/POVs (types of blobs), different components of that 4-element variable would be compared in various basic ways: difference (distance), angle, different ranges within the current space, and combinations of the above.

Where angle is also a derivative of distance - a function of the ratio of differences in adjacent orthogonal dimensions. BTW, comparisons like that during container iterations - if blob.Derts[-1][-1] > ave_blob * 2: # G > fixed costs of comp_angle - suggest that G's properties guide a traversal trajectory, i.e. that it is used there as "coordinates"?

I see that this basic set and sequence - diff (1D distance) / 2D distance (hypot) / angle (which guides a projection/prediction in 2D) - also seems in line with the old ladder of "difference, ratio, exponent, etc. higher-power comparisons", aligned with 2D.
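For concreteness, the two primitives mentioned here - 2D distance as the hypot of the orthogonal differences, and angle as a function of their ratio - can be sketched in NumPy (an illustrative sketch with made-up dy/dx arrays, not the project's actual code):

```python
import numpy as np

# Made-up per-pixel orthogonal differences, just to illustrate the primitives.
dy = np.array([[1.0, 0.0],
               [3.0, -4.0]])
dx = np.array([[0.0, 2.0],
               [4.0,  3.0]])

g = np.hypot(dy, dx)    # 2D distance: sqrt(dy**2 + dx**2)
a = np.arctan2(dy, dx)  # angle: derived from the ratio of the two differences
```

So for the pixel with dy = 3, dx = 4 the "distance" g is 5, while a is the direction of steepest change - the same pair (magnitude, angle) the ladder above builds on.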

Are the higher powers supposed to be recursive calls to the above, through the branching and reapplication of the basic comparisons?

Therefore, is this supposed to go on in a similar manner for further, deeper branches like the current intra_blob, and for the future wider-range ones like inter_blob and, respectively, "super_inter_blob", etc.?

...

comp_pixel-comp_angle-comp_deriv-comp_range

This is the current set of explicit comparison functions, which map to a 4-component derivative vector: initially (i, dx, dy, g), then (I, DX, DY, G) (sometimes i is p_).

I notice the following general differences:

In comp_pixel, the input is not changed: [0] = [0]

In comp_angle, the computed angle replaces the previous input, [0] = deriv(derts):

    dert__[:, :, 0] = a__
    dert__[1:-1, 1:-1, 1] = day__  
    dert__[1:-1, 1:-1, 2] = dax__
    dert__[1:-1, 1:-1, 3] = ga__

In comp_deriv, the previous gradient becomes the input, as if shifted: [0] = [3]:

    dert__[:, :, 0] = g__
    dert__[1:-1, 1:-1, 1] = dy__ 
    dert__[1:-1, 1:-1, 2] = dx__
    dert__[1:-1, 1:-1, 3] = gg__

In comp_range, the previous input p__ is repeated without comparison (as in comp_pixel): [0] = [0]:

    dert__[:, :, 0] = p__
    dert__[:, :, 1] = dy__
    dert__[:, :, 2] = dx__
    dert__[:, :, 3] = g__
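The channel-shift difference between these branches - carrying the input forward ([0] = [0]) vs. promoting the previous gradient to the new input ([0] = [3]) - can be contrasted on a toy array (a sketch with made-up shapes and values, not the actual CogAlg dert__):

```python
import numpy as np

# Toy dert array of shape (Y, X, 4), channels [input, dy, dx, g].
# Shapes and values are made up for illustration only.
dert = np.arange(2 * 2 * 4, dtype=float).reshape(2, 2, 4)

# comp_pixel / comp_range style: the input channel is carried over, [0] = [0]
d_keep = dert.copy()
d_keep[:, :, 0] = dert[:, :, 0]

# comp_deriv style: the previous gradient becomes the new input, [0] = [3]
d_shift = dert.copy()
d_shift[:, :, 0] = dert[:, :, 3]
```

In comp_angle the pattern is the same as the second one, except that the promoted channel is a newly computed angle rather than an existing channel.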
boris-kz commented 5 years ago

Todor,

On Tue, Mar 26, 2019 at 8:39 AM Todor Arnaudov notifications@github.com wrote:

Hi Khan, Boris,

Regarding that principle from #20 (comment) https://github.com/boris-kz/CogAlg/issues/20#issuecomment-470021085:

  • The clusters of original inputs then serve as data points in a deeper level, with more variables. So the comparison will be performed on one variable; the rest will serve as coordinates in a hyperspace.

That's for intra_blob (sub-recursion); in inter_blob (super-recursion) all parameters will be compared. Coordinates are still x and y, with extended range.

It is clear for the basic input - brightness (p). I assume the initial sequence of frame_blobs processing is a bit "different", because it has to combine x and y. It seems logical for the next-order "pixel" to be g (or just summed i and sign?), but then it and other parameters are summed to produce aggregates and a "subspace" within the initial input space (coordinates, lengths, the shape of the blob).

Comparand is g | ga in comp_deriv, rng-distant p in comp_range, a in comp_angle

Is the general principle valid for the L, Y, X, I, Dy, Dx, G of the seg - where the compared variable seems to be the sign of the gradient?

Segs and Ps are preserved for optional comp_P, otherwise they are probably not needed.

Also, as the patterns are hierarchical, each lower-level pattern has additional coordinates, starting with y, x, as displayed in:

    for P in seg[2]:
        for y, x, ...

And eventually they could be unfolded by traversing the pattern.

In 2D, principal patterns are blobs, sub_blobs, and future super_blobs

Then from form_blob, frame[1].append(nt_blob(..., I see that a 4-element pattern structure (I, Dy, Dx, G) is kept as the core one (like the initial comparison (i, dx, dy, g)).

Thus, is [0] supposed to be the general variable for comparison at a given point, with Dy, Dx, G as local coordinates in the derivatives' space for that level/stage of processing? (If such sub-space division is possible at that small a granularity.)

Or rather, and it seems more logical, is the whole 4-component dert that "one" variable that is compared? However, that is a 4-dimensional variable.

You are thinking of inter_blob, which is not in the code yet. The nearest thing in the code is comp_P_draft; I am working on it now (when not working on intra_blob). There, primary comp is per parameter, and match and miss are then integrated across parameters.

Then, in different branches/POVs (types of blobs), different components of that 4-element variable would be compared in various basic ways: difference (distance), angle, different ranges within the current space, and combinations of the above.

That will be sub-recursion in comp_P and comp_blob, nothing in code yet.

Where angle is also a derivative of distance - a function of the ratio of differences in adjacent orthogonal dimensions. BTW, comparisons like that during container iterations - if blob.Derts[-1][-1] > ave_blob * 2: # G > fixed costs of comp_angle - suggest that G's properties guide a traversal trajectory, i.e. that it is used there as "coordinates"?

It's a selection criterion, but not a coordinate in this space.

I see that this basic set and sequence - diff (1D distance) / 2D distance (hypot) / angle (which guides a projection/prediction in 2D) - also seems in line with the old ladder of "difference, ratio, exponent, etc. higher-power comparisons", aligned with 2D. Are the higher powers supposed to be recursive calls to the above?

It's a difference adapted to 2D. Higher powers won't be used till super-recursion: comp_P and comp_blob

comp_pixel-comp_angle-comp_deriv-comp_range

This is the current set of explicit comparison functions, which map to a 4-component derivative vector: initially (i, dx, dy, g), then (I, DX, DY, G) (sometimes i is p_).

I notice the following general differences:

In comp_pixel, the input is not changed: [0] = [0]

In comp_angle, the computed angle replaces the previous input, [0] = deriv(derts):

    dert__[:, :, 0] = a__
    dert__[1:-1, 1:-1, 1] = day__
    dert__[1:-1, 1:-1, 2] = dax__
    dert__[1:-1, 1:-1, 3] = ga__

In comp_deriv, the previous gradient becomes the input, as if shifted: [0] = [3]:

    dert__[:, :, 0] = g__
    dert__[1:-1, 1:-1, 1] = dy__
    dert__[1:-1, 1:-1, 2] = dx__
    dert__[1:-1, 1:-1, 3] = gg__

Right

In comp_range, the previous input p__ is repeated without comparison (as in comp_pixel): [0] = [0]:

    dert__[:, :, 0] = p__
    dert__[:, :, 1] = dy__
    dert__[:, :, 2] = dx__
    dert__[:, :, 3] = g__

That only forms arrays of rng-distant comparands; the actual comparison follows in horizontal comp, vertical comp, and diagonal comp.

khanh93vn commented 5 years ago

Now that I think about it, scanP() does look like a comparison between Ps' summed x coordinates.

boris-kz commented 5 years ago

Yes, I did this initial comp_x (partial comp_P) -> xD to determine blob orientation and evaluate for flipping the blob and comp_P. Or that was the idea, but now I only use it inside comp_P. This is not settled yet; comp_x may be folded into comp_P.

On Tue, Mar 26, 2019 at 10:42 AM Khanh Nguyen notifications@github.com wrote:

Now that I think about it, scanP() does look like a comparison between Ps' summed x coordinates.


Twenkid commented 5 years ago

Guys, are these speculations about the trends correct, or what's wrong in general?

Two clear directions of development, maybe somewhat mapping to bottom-up and top-down at the macro level (although it all is "bottom-up" and both add "depth"), and to the future feedback dynamics? (Adjustable filters - ave etc. - are yet far from implementation? Possible when the whole input space is "exhausted" with patterns and whole frames start to be compared? The initial same-filter span would be one frame?)

1) Sub-recursion chain: sub-pattern formation, from a larger span of input (patterns), as with blob, to a smaller span - selection of a part of the pattern space and reduction of the span of the produced patterns.

2) Super-recursion chain: combining existing patterns to form a pattern with a bigger span of the coordinate coverage, comp_P ...

...

1) Reduction - compares and focuses on fewer elements (derivatives) than initially given: g, ga, a, p; or keeps the same number of derivatives for the following steps, adding depth.

2) Expansion - compares all current derivatives and/or adds more; extends the number of derivatives: comp_P ... - all parameters are compared. In the preliminary processing, Ps, Seg: adding L, Ly.

The process goes through these steps: expansion, reduction, expansion, reduction...

When the reduction runs out of material (redundancy, elements to traverse, novelty compared to previous levels - e.g. the gradient gets < filter, etc.), an expansion is needed, which brings in new material for comparison.

Etc.

I did this initial comp_x (partial comp_P) -> xD to determine blob orientation and evaluate for flipping the blob and comp_P

Are blob orientation and flipping meant to scan the blob along its longer side?

Regarding flip(): will you scan the blob's representation (I can't see how exactly yet - unfolding down to y, x?)? It won't be rerunning the algorithm from comp_pixel with a primary Y-scan or flipped input?

I thought about the flipping from the ground up, in analogy to the Sobel filter, where x and y gradients are computed and then merged. I don't know if such scanning is needed, but what I thought of was:

  1. Flip the raw input (transpose) in order to use the algorithm without cloning it with coordinate changes.
  2. Apply the normal pattern formation.
  3. When/if the Y-scanned patterns have to be merged with or compared to the default X-scanned ones, reorder the reading of dimension-related variables: y, x, Ly, L to x, y, L, Ly, etc. (I guess there might be caveats here, though.)
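Step 1 of that idea - flipping the raw input by transposition so the unmodified algorithm scans along the other axis - is a one-liner in NumPy (illustrative only; `image` is a made-up stand-in for the frame):

```python
import numpy as np

# Made-up (Y, X) input frame, standing in for the raw image.
image = np.arange(12).reshape(3, 4)

# Step 1: flip by transposition, so the unmodified algorithm scans
# the other dimension; no coordinate changes inside the algorithm.
flipped = image.T

# Step 3 would then read dimension-related variables in swapped order:
# (y, x) -> (x, y), (Ly, L) -> (L, Ly), etc.
```

Since `flipped[x, y] == image[y, x]`, any pattern formed on the transposed frame maps back by swapping the dimension-related variables, which is what step 3 describes.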

On Tue, Mar 26, 2019 at 9:43 PM Boris Kazachenko notifications@github.com wrote:

Yes, I did this initial comp_x (partial comp_P) -> xD to determine blob orientation and evaluate for flipping the blob and comp_P. Or that was the idea, but I only use it now inside comp_P. This is not settled yet, comp_x may be folded in comp_P.

On Tue, Mar 26, 2019 at 10:42 AM Khanh Nguyen notifications@github.com wrote:

Now that I think about it, scanP() does look like a comparison between Ps' summed x coordinates.


boris-kz commented 5 years ago

(Adjustable filters - ave etc. - are yet far from implementation? Possible when the whole input space is "exhausted" with patterns and whole frames start to be compared? The initial same-filter span would be one frame?)

Yes, we need higher levels first

1) Sub-recursion chain: sub-pattern formation, from a larger span of input (patterns), as with blob, to a smaller span - selection of a part of the pattern space and reduction of the span of the produced patterns.

Yes, forming lower layers of patterns / blobs

2) Super-recursion chain: combining existing patterns to form a pattern with a bigger span of the coordinate coverage, comp_P ...

Yes, forming higher levels of search

... 1) Reduction - compares and focuses on fewer elements (derivatives) than initially given: g, ga, a, p; or keeps the same number of derivatives for the following steps, adding depth.

That's expansion, not reduction

2) Expansion - compares all current derivatives and/or adds more; extends the number of derivatives: comp_P ... - all parameters are compared. In the preliminary processing, Ps, Seg: adding L, Ly.

This is composition: a reduction in the number of top-level patterns, but each has greater syntactic complexity.

Are blob orientation and flipping meant to scan the blob along its longer side?

Yes

Regarding flip(), you will scan the blob's representation (can't see how exactly yet, unfolding down to y,x?)?

Right

It won't be running the algorithm from the comp_pixel through a primary Y-scan or flipped input?

It will

  1. When/if the Y-scanned patterns have to be merged with or compared to the default X-scanned ones, reorder the reading of dimension-related variables: y, x, Ly, L to x, y, L, Ly, etc.

Didn't think of that yet, but flip is for comp_P, and the resulting PPs will have orientation and orientation-neutral parameters stored separately, so it shouldn't matter if the input was flipped.

Twenkid commented 5 years ago

In comp_range, is the order of the comparands (the +/- of the shift) correct, in either the code or the comments? Or am I reading the ranges incorrectly? (Sorry if so.)

d__ = p__[comp_rng:, rng:-rng] - p__[:-comp_rng, rng:-rng] 
# bilateral comparison between p at coordinates (x, y + rng) and p at coordinates (x, y - rng)

OK, (between first and second), comment seems to match: +comp_rng: ... :-comp_rng

d__ = p__[rng:-rng, comp_rng:] - p__[rng:-rng, :-comp_rng]  
# bilateral comparison between p at coordinates (x + rng, y) and p at coordinates (x - rng, y)

OK:  ,+comp_rng:], ,:-comp_rng]

However then:

d__ = p__[bi_yd:, bi_xd:] - p__[:-bi_yd, :-bi_xd]   
# comparison between p (x - xd, y - yd) and p (x + xd, y + yd)

y: code: +bi_yd:, :-bi_yd, but the comment: y - yd, y + yd
x: code: +bi_xd:, :-bi_xd, but the comment: x - xd, x + xd

Similarly for the last one, the +/- order in the code differs from the comment:

 d__ = p__[bi_yd:, :-bi_xd] - p__[:-bi_yd, bi_xd:]  
# comparison between p (x + xd, y - yd) and p (x - xd, y + yd)
boris-kz commented 5 years ago

Thanks Todor! Khanh?

On Wed, Mar 27, 2019 at 9:02 AM Todor Arnaudov notifications@github.com wrote:

In comp_range, is the order of the comparands (+- of the shift) correct, either in code or the comments? Or I read the ranges incorrectly? (Sorry if so)

    d__ = p__[comp_rng:, rng:-rng] - p__[:-comp_rng, rng:-rng]

bilateral comparison between p at coordinates (x, y + rng) and p at coordinates (x, y - rng)

OK, (between first and second), comment seems to match: +comp_rng: ... :-comp_rng

    d__ = p__[rng:-rng, comp_rng:] - p__[rng:-rng, :-comp_rng]

bilateral comparison between p at coordinates (x + rng, y) and p at coordinates (x - rng, y)

OK: ,+comp_rng:], ,:-comp_rng]

However then:

    d__ = p__[bi_yd:, bi_xd:] - p__[:-bi_yd, :-bi_xd]

comparison between p (x - xd, y - yd) and p (x + xd, y + yd)

y: code: +bi_yd:, :-bi_yd, but the comment: y - yd, y + yd; x: code: +bi_xd:, :-bi_xd, but the comment: x - xd, x + xd

Similarly with the last one, x order in code is different than in the comment:

    d__ = p__[bi_yd:, :-bi_xd] - p__[:-bi_yd, bi_xd:]

comparison between p (x + xd, y - yd) and p (x - xd, y + yd)


khanh93vn commented 5 years ago

Thank you.

The order in the code is correct. I'll improve the comments in the next push.

d__ records how p changes at particular positions, so making sure the direction of comp is correct is important.

Here we use the common convention that change is measured with respect to the (+) direction.

Notice that in the top-right and bottom-left quadrants, the decomposition coefficient for dx is reversed in sign. That is because of the reversed order of the comparands along the x axis.

In a way, it's nearly identical to partial derivatives in calculus.
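That analogy can be checked on a toy array: with the slicing convention in question, the difference is always (later minus earlier) along the (+) direction, like a forward partial difference in calculus (a sketch with made-up values, not the comp_range code):

```python
import numpy as np

# Made-up input array p of shape (Y, X); values chosen for illustration.
p = np.array([[0., 1., 2.],
              [3., 4., 5.],
              [7., 8., 9.]])

# Change with respect to the (+) direction, per the convention above:
# later comparand minus earlier one, like a forward partial difference.
dy = p[1:, :] - p[:-1, :]   # d p / d y: row y+1 minus row y
dx = p[:, 1:] - p[:, :-1]   # d p / d x: column x+1 minus column x
```

Widening the slice offsets (e.g. `p[rng:, :] - p[:-rng, :]`) gives the rng-distant version of the same forward difference, which is why the sign order of the comparands in the code matters.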

Twenkid commented 5 years ago

On Wednesday, March 27, 2019, Khanh Nguyen notifications@github.com wrote:

Thank you.

The order in the code is correct. I'll make comments better next push.

OK, thanks for checking it.

d__ records how p is changing at particular positions so making sure the direction of comp is correct is important.

Yes, and for consistency.

Here we use a common convention that change is with respect to (+)direction.

It is also the direction of the primary comparison - current minus previous - in both dimensions.

Notice that in the top right + bottom left quadrants, the decomposition coefficient for dx is reversed in sign. That is because of the reversed order of comparands in x axis.

OK, good to mention that explicitly.

In a way, it's nearly identical to partial derivatives in calculus.

I see - it appears so in the notation as well: the change of f() as one variable varies, with the other variable(s) held constant.
