Daniel061 closed this 5 years ago.
Changed the Tokenizer routine in Lobes.h to force all tokens to lower case: tmpToken = tmpToken + (int(tolower(str_Data[y-1])))*PlaceValue; The MemoryCell storage location is now consistent, but somewhere the upper-case characters in the original string are being converted to lower case.
Problem detected! Some word type descriptors are upper case.
In normal program flow, the forced lower-case comparisons and searches seem complete. However, with the addition of the pre- and post-pattern storage used for learning, some characters need to stay upper case. An optional parameter, such as ForceLowerCase = true, will need to be added to the tokenizer so that pattern storage and comparison can switch the lower-casing off.
Proposal to change Tokenizer
Current form:
int Tokenize (string str_Data)
{
    int z;
    int y;
    int PlaceValue;
    int tmpToken;
    z = str_Data.size();
    PlaceValue = 1;
    tmpToken = 0;
    for( y = z; y > 0; y--)
    {
        tmpToken = tmpToken + (int(tolower(str_Data[y-1])))*PlaceValue;
        PlaceValue ++;
    }
    return tmpToken;
}
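Reading the loop above: the last character gets weight 1, the one before it weight 2, and so on. For a string s of length n, with s_i the character code at index i, the current form therefore computes the following (a restatement of the code, with a worked example):

```math
\begin{aligned}
T(s) &= \sum_{i=0}^{n-1} (n - i)\,\operatorname{tolower}(s_i) \\
T(\text{"Ab"}) &= 2 \cdot 97 + 1 \cdot 98 = 292 = T(\text{"ab"})
\end{aligned}
```

Any two strings that differ only in case get the same token, so an upper-case word type descriptor cannot be told apart from its lower-case counterpart once it has been tokenized.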
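Change to:
int Tokenize (string str_Data, bool ForceLowerCase = true)
{
    int z;
    int y;
    int PlaceValue;
    int tmpToken;
    z = str_Data.size();
    PlaceValue = 1;
    tmpToken = 0;
    // Walk from the last character to the first; PlaceValue weights the
    // last character by 1, the one before it by 2, and so on.
    for( y = z; y > 0; y--)
    {
        if (ForceLowerCase) {
            // Cast to unsigned char first: tolower() is undefined for negative char values.
            tmpToken = tmpToken + (int(tolower((unsigned char)str_Data[y-1])))*PlaceValue;
        }
        else {
            // Keep the original case (used for pre/post pattern storage).
            tmpToken = tmpToken + (int(str_Data[y-1]))*PlaceValue;
        }
        PlaceValue ++;
    }
    return tmpToken;
}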
Also change, in c_Lobes.h:
void SavePreAndPostPatternConstruction(string PreConstructionPattern,string PostConstructionPattern){
    // false: keep upper-case word type descriptors distinct from lower case
    int PreToken = Tokenize(PreConstructionPattern, false);
    int PostToken = Tokenize(PostConstructionPattern, false);
    // ... rest of the function unchanged
NOTE: There is a local copy of Tokenizer in c_Lobes.h; this needs to become the only copy. There is also a copy in c_MemoryCells.h; need to check whether it can be deleted.
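One possible way to act on this note, sketched under assumptions: move the proposed function into a single shared header (the file name Tokenizer.h and the TOKENIZER_H guard are hypothetical) that c_Lobes.h, c_MemoryCells.h and c_Language.h would all include instead of keeping their own copies. Marking the function `inline` lets the same definition be included from several translation units without breaking the one-definition rule, and the default argument then lives in exactly one place. Taking the string by const reference and qualifying names with `std::` are additions for a self-contained example; the cast to unsigned char guards against `std::tolower`'s undefined behaviour on negative char values. For plain ASCII input the tokens are the same as those from the proposed form.

```cpp
// Tokenizer.h (hypothetical file name): the single shared copy of Tokenize.
#ifndef TOKENIZER_H
#define TOKENIZER_H

#include <cctype>
#include <string>

// inline: the definition may appear in every file that includes this header
// without violating the one-definition rule.
inline int Tokenize(const std::string& str_Data, bool ForceLowerCase = true)
{
    int PlaceValue = 1;   // last character weighs 1, the one before it 2, ...
    int tmpToken = 0;
    for (std::string::size_type y = str_Data.size(); y > 0; y--)
    {
        unsigned char c = static_cast<unsigned char>(str_Data[y - 1]);
        // Fold case only when requested; pattern storage passes false.
        int code = ForceLowerCase ? std::tolower(c) : c;
        tmpToken += code * PlaceValue;
        PlaceValue++;
    }
    return tmpToken;
}

#endif // TOKENIZER_H
```

With this header, Tokenize("Ab") and Tokenize("ab") both return 292, while Tokenize("Ab", false) returns 228 (2 · 65 + 98), so case-preserving pattern storage keeps the two patterns apart.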
Finally, change in c_Language.h:
if(LeftLobeMemory[Tokenize(CorrectedPattern)].GetpIsSet() == true){ //seen this pattern before
    CorrectedPattern = LeftLobeMemory[LeftLobeMemory[Tokenize(Pattern)].GetPointerToNextPattern()].GetpCellDataString();
    ConfidenceLevel = 100;
}
To:
if(LeftLobeMemory[Tokenize(CorrectedPattern,false)].GetpIsSet() == true){ //seen this pattern before
    CorrectedPattern = LeftLobeMemory[LeftLobeMemory[Tokenize(Pattern,false)].GetPointerToNextPattern()].GetpCellDataString();
    ConfidenceLevel = 100;
}
This should prevent any loss of word type data that uses upper-case char() descriptors.
To complete this change, all other uses of Tokenizer throughout the program can keep the default (true).
If a user types in upper case, MemoryCell storage does not properly keep the original-case and lower-case entries separate.
Perhaps start in Sentence.h to make case handling consistent, and then move on to the routine InstallNewWord (its location is to be added to this document later).