Daniel061 / Sophie6.1

AI from stumbling Baby to Sentient being.
GNU General Public License v3.0

Upper case #3

Closed Daniel061 closed 5 years ago

Daniel061 commented 5 years ago

If a user uses uppercase, the MemoryCell storage does not separate original and lower case storage properly.

Perhaps start in Sentence.h to ensure case handling and then in the routine InstallNewWord. (To be located for this document later)

Daniel061 commented 5 years ago

Changed the Tokenizer routine in Lobes.h to force all tokens to lower case:

    tmpToken = tmpToken + (int(tolower(str_Data[y-1])))*PlaceValue;

The MemoryCell storage location is now consistent, but somewhere the original uppercase is still being converted to lower case in the original string.
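As a rough illustration of the goal described here (a case-insensitive storage location that still keeps the user's original spelling), the sketch below may help. `CellSketch`, `StoreWord` and `LowerCaseToken` are hypothetical names invented for this example, and the `std::map` is only a stand-in for the real MemoryCell storage, whose layout is not shown in this issue.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Lower-case-only token, mirroring the weighted sum quoted above:
// the last character has weight 1, the one before it weight 2, and so on.
int LowerCaseToken(const std::string& str_Data) {
    int tmpToken = 0;
    int PlaceValue = 1;
    for (std::size_t y = str_Data.size(); y > 0; --y) {
        // Cast to unsigned char first; tolower is only defined for such values.
        unsigned char c = static_cast<unsigned char>(str_Data[y - 1]);
        tmpToken += std::tolower(c) * PlaceValue;
        ++PlaceValue;
    }
    return tmpToken;
}

// Hypothetical stand-in for a MemoryCell: both spellings are kept separately.
struct CellSketch {
    std::string Original;   // exactly what the user typed
    std::string LowerCase;  // normalized copy used for comparisons
};

// Store a word under its lower-case token without touching the original text.
void StoreWord(std::map<int, CellSketch>& Cells, const std::string& Word) {
    CellSketch Cell;
    Cell.Original = Word;
    Cell.LowerCase = Word;
    for (char& c : Cell.LowerCase) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    Cells[LowerCaseToken(Word)] = Cell;
}

// Small self-check: any casing finds the cell, and the original case survives.
void CheckStoreWord() {
    std::map<int, CellSketch> Cells;
    StoreWord(Cells, "Hello");
    assert(Cells.at(LowerCaseToken("hello")).Original == "Hello");
    assert(Cells.at(LowerCaseToken("HELLO")).LowerCase == "hello");
}
```

The point of the sketch is only that the lower-casing happens on a copy, so the string that was typed is never overwritten.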

Daniel061 commented 5 years ago

Problem detected! Some word type descriptors are upper case.

In normal program flow, the forced lower case comparisons and searches seem complete. However, with the addition of pre and post pattern storage used for learning, some characters need to stay upper case. An optional parameter such as ForceLowerCase = true will need to be added to the tokenizer, so that pattern storage and comparison can turn the forcing off.

Daniel061 commented 5 years ago

Proposal to change Tokenizer

Current form:

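    int Tokenize (string str_Data)
    {
        int z;
        int y;
        int PlaceValue;
        int tmpToken;

        z = str_Data.size();
        PlaceValue = 1;
        tmpToken = 0;
        // Walk the string from the last character to the first; the last
        // character gets weight 1, the one before it weight 2, and so on.
        for (y = z; y > 0; y--)
        {
            tmpToken = tmpToken + (int(tolower(str_Data[y-1])))*PlaceValue;
            PlaceValue++;
        }
        return tmpToken;
    }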

Change to:

    int Tokenize (string str_Data, bool ForceLowerCase = true)
    {
        int z;
        int y;
        int PlaceValue;
        int tmpToken;

        z = str_Data.size();
        PlaceValue = 1;
        tmpToken = 0;
        // Same weighted sum as before; only the case handling is optional now.
        for (y = z; y > 0; y--)
        {
            if (ForceLowerCase) {
                tmpToken = tmpToken + (int(tolower(str_Data[y-1])))*PlaceValue;
            }
            else {
                tmpToken = tmpToken + (int(str_Data[y-1]))*PlaceValue;
            }
            PlaceValue++;
        }
        return tmpToken;
    }
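To make the effect of the new parameter concrete, here is a self-contained sketch of the proposed function with a small self-check. It follows the proposal above but is not a copy of the project code: the `const std::string&` parameter, the `unsigned char` cast (which can change the token for non-ASCII characters on platforms where `char` is signed) and the `CheckTokenizeModes` helper are assumptions made for this example.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Weighted character sum: the last character has weight 1, the one before
// it weight 2, and so on. Lower-casing is applied only when requested.
int Tokenize(const std::string& str_Data, bool ForceLowerCase = true) {
    int PlaceValue = 1;
    int tmpToken = 0;
    for (std::size_t y = str_Data.size(); y > 0; --y) {
        unsigned char c = static_cast<unsigned char>(str_Data[y - 1]);
        int Value = ForceLowerCase ? std::tolower(c) : static_cast<int>(c);
        tmpToken += Value * PlaceValue;
        ++PlaceValue;
    }
    return tmpToken;
}

// Hypothetical self-check showing the two modes side by side.
void CheckTokenizeModes() {
    // Default mode: case is ignored, so existing callers behave as before.
    assert(Tokenize("Dog") == Tokenize("dog"));
    // Case-preserving mode: 'D' and 'd' now contribute different values.
    assert(Tokenize("Dog", false) != Tokenize("dog", false));
    // An all-lower-case string yields the same token in both modes.
    assert(Tokenize("dog", false) == Tokenize("dog"));
}
```

Because the parameter defaults to true, call sites that pass only the string keep their current behavior; only the pattern code has to pass false explicitly.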

Also change in c_Lobes.h:

    void SavePreAndPostPatternConstruction(string PreConstructionPattern, string PostConstructionPattern){

        int PreToken  = Tokenize(PreConstructionPattern, false);
        int PostToken = Tokenize(PostConstructionPattern, false);

NOTE: There is a local copy of Tokenizer in c_Lobes.h; this needs to become the only copy. There is also a copy in c_MemoryCells.h; need to see if it can be deleted.

Finally, change in c_Language.h:

    if(LeftLobeMemory[Tokenize(CorrectedPattern)].GetpIsSet() == true){    //seen this pattern before
        CorrectedPattern = LeftLobeMemory[LeftLobeMemory[Tokenize(Pattern)].GetPointerToNextPattern()].GetpCellDataString();
        ConfidenceLevel = 100;
    }

To:

    if(LeftLobeMemory[Tokenize(CorrectedPattern, false)].GetpIsSet() == true){    //seen this pattern before
        CorrectedPattern = LeftLobeMemory[LeftLobeMemory[Tokenize(Pattern, false)].GetPointerToNextPattern()].GetpCellDataString();
        ConfidenceLevel = 100;
    }

This should prevent loss of word type data that uses some upper case char() descriptors.
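The store and lookup steps could fit together roughly as sketched below. Everything except the `Tokenize` signature is an assumption: `PatternCellSketch`, `PatternMemory`, `SavePattern` and `LookUpCorrection` are hypothetical names, a `std::map` stands in for `LeftLobeMemory`, and the pattern strings in the self-check are made up rather than taken from the program.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

int Tokenize(const std::string& str_Data, bool ForceLowerCase = true) {
    int PlaceValue = 1;
    int tmpToken = 0;
    for (std::size_t y = str_Data.size(); y > 0; --y) {
        unsigned char c = static_cast<unsigned char>(str_Data[y - 1]);
        tmpToken += (ForceLowerCase ? std::tolower(c) : static_cast<int>(c)) * PlaceValue;
        ++PlaceValue;
    }
    return tmpToken;
}

// Hypothetical stand-in for one LeftLobeMemory cell.
struct PatternCellSketch {
    bool IsSet = false;
    int PointerToNextPattern = 0;
    std::string CellDataString;
};

using PatternMemory = std::map<int, PatternCellSketch>;

// Store a pre pattern that points at its post pattern, preserving case.
void SavePattern(PatternMemory& Memory, const std::string& Pre, const std::string& Post) {
    int PreToken = Tokenize(Pre, false);
    int PostToken = Tokenize(Post, false);
    PatternCellSketch& PostCell = Memory[PostToken];
    PostCell.IsSet = true;
    PostCell.CellDataString = Post;
    PatternCellSketch& PreCell = Memory[PreToken];
    PreCell.IsSet = true;
    PreCell.CellDataString = Pre;
    PreCell.PointerToNextPattern = PostToken;
}

// Return true and fill CorrectedPattern if this exact-case pattern was seen.
bool LookUpCorrection(const PatternMemory& Memory, const std::string& Pattern,
                      std::string& CorrectedPattern) {
    auto Pre = Memory.find(Tokenize(Pattern, false));
    if (Pre == Memory.end() || !Pre->second.IsSet) return false;
    auto Post = Memory.find(Pre->second.PointerToNextPattern);
    if (Post == Memory.end() || !Post->second.IsSet) return false;
    CorrectedPattern = Post->second.CellDataString;
    return true;
}

// Hypothetical self-check with made-up pattern strings.
void CheckPatternLookup() {
    PatternMemory Memory;
    SavePattern(Memory, "aNb", "aXb");
    std::string Corrected;
    assert(LookUpCorrection(Memory, "aNb", Corrected) && Corrected == "aXb");
    assert(!LookUpCorrection(Memory, "anb", Corrected));  // case now matters
}
```

Storing and looking up must use the same setting; if one side tokenized with the default and the other with false, a pattern containing upper case descriptors would be saved under one token and searched for under another.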

Daniel061 commented 5 years ago

To complete this change:

All other uses of Tokenizer throughout the program can default to true.