prep getToken for strstr

dstroy0 / InputHandler

Arduino input handler

https://dstroy0.github.io/InputHandler/

GNU General Public License v3.0

prep getToken for strstr #30

Closed dstroy0 closed 2 years ago

dstroy0 commented 2 years ago

For any delimiter longer than one character, looping strstr should be faster than a bytewise scan.
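
A minimal sketch of what looping strstr over the input could look like. It assumes the input is null-terminated, which strstr requires. `count_delims` is a hypothetical helper for illustration, not part of InputHandler. Whether this actually beats a bytewise scan depends on the libc in use, so it is worth measuring on the target board.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of InputHandler: counts non-overlapping
   occurrences of delim in a null-terminated input by looping strstr. */
size_t count_delims(const char *input, const char *delim)
{
    size_t delim_len = strlen(delim);
    assert(delim_len > 0); /* an empty delimiter would never advance */

    size_t count = 0;
    const char *p = strstr(input, delim);
    while (p != NULL)
    {
        count++;
        p = strstr(p + delim_len, delim); /* resume after the match */
    }
    return count;
}
```

Called as `count_delims("a, b, c", ", ")`, the loop finds two delimiters and skips nothing else in between.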

dstroy0 commented 2 years ago

We can just use memchr and memcmp:

// pseudo
// look for the first char of the delimiter (the interesting "char")
ptr = memchr(input, _delim_[0], input_len)
if ptr != NULL // found interesting "char"
  result = memcmp(_delim_, ptr, _delim_len_) // if result == 0, match

find c-string tokens
find regular tokens
ignore hanging control char

Find a delimiter in the input and add a null to the token string. Then take the distance from (ptr + delimiter length) to the next delimiter or the end of the input, memcpy that many bytes into the token string, and add a null.
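
The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the library's code: `split_tokens` and all of its parameters are hypothetical names, the caller supplies the output buffer, and empty tokens (for example from a hanging delimiter) are skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: splits input on delim and copies each token into
   out followed by '\0'. Returns the number of tokens written, or 0 if
   out is too small (or there were no tokens). */
size_t split_tokens(const char *input, size_t input_len,
                    const char *delim, size_t delim_len,
                    char *out, size_t out_size)
{
    assert(delim_len > 0);

    size_t start = 0;   /* first byte of the current token */
    size_t scan = 0;    /* where the next memchr search begins */
    size_t out_idx = 0; /* next free byte in out */
    size_t count = 0;

    for (;;)
    {
        size_t end = input_len; /* token ends here unless a delimiter is found */
        int found = 0;

        while (scan < input_len)
        {
            const char *hit = memchr(input + scan, delim[0], input_len - scan);
            if (hit == NULL)
            {
                break;
            }
            size_t hit_pos = (size_t)(hit - input);
            if (hit_pos + delim_len <= input_len &&
                memcmp(hit, delim, delim_len) == 0)
            {
                end = hit_pos;
                found = 1;
                break;
            }
            scan = hit_pos + 1; /* only the first char matched, keep looking */
        }

        size_t token_len = end - start;
        if (token_len > 0) /* skip empty tokens */
        {
            if (out_idx + token_len + 1 > out_size)
            {
                return 0; /* no room for the token plus its null */
            }
            memcpy(out + out_idx, input + start, token_len);
            out[out_idx + token_len] = '\0';
            out_idx += token_len + 1;
            count++;
        }

        if (!found)
        {
            return count;
        }
        start = end + delim_len; /* next token begins after the delimiter */
        scan = start;
    }
}
```

The bounds check before memcmp matters when the first delimiter char appears near the end of the input, where a full delimiter can no longer fit.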

dstroy0 commented 2 years ago

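    // strchr stops at '\0', so this assumes data is null terminated
    char *ptr = strchr((char *)data, _delim_[0]);
    size_t pos = 0;
    while (ptr != NULL)
    {
        pos = ptr - (char *)data;
        if (pos + _delim_len_ > len)
        {
            break; // not enough input left to hold a whole delimiter
        }
        if (memcmp(_delim_, ptr, _delim_len_) == 0)
        {
            Serial.print(F("found delim "));
            Serial.println(pos);
            ptr += _delim_len_; // resume after the delimiter
        }
        else
        {
            ptr++; // only the first char matched, step past it
        }
        ptr = strchr(ptr, _delim_[0]);
    }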

This works. We can break it out into its own function and pattern match whatever we want to now.

dstroy0 commented 2 years ago

    char *ptr = (char *)memchr((char *)data, _delim_[0], len);
    size_t pos = 0;
    size_t prev_pos = 0; // index of the first byte of the current token
    size_t token_len = 0;
    while (ptr != NULL)
    {
        pos = ptr - (char *)data;
        if (pos + _delim_len_ > len)
        {
            break; // not enough input left to hold a whole delimiter
        }

        // delim test
        if (memcmp(_delim_, ptr, _delim_len_) == 0)
        {
            Serial.print(F("found delim "));
            Serial.println(pos);
            token_len = pos - prev_pos;
            Serial.print(F("token_len "));
            Serial.println(token_len);
            //memcpy(_token_buffer_ + token_buffer_index, (ptr - token_len), token_len);
            //_data_pointers_[_data_pointers_index_] = &_token_buffer_[token_buffer_index];
            //_data_pointers_index_++;
            //token_buffer_index += token_len + 1U; // null sep
            pos += _delim_len_; // resume after the delimiter
            prev_pos = pos;
        }
        else
        {
            pos++; // only the first char matched, step past it
        }
        // search only the bytes that remain after pos
        ptr = (char *)memchr((char *)data + pos, _delim_[0], (len - pos));
    }

First I will scan for both c-string delimiters and regular delimiters. If regular delimiters come before the c-string, get tokens up to the c-string, then take the whole c-string as one token; otherwise take the c-string first and then the regular tokens. Continue...
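
A rough sketch of the "which comes first" decision described above. It assumes the c-string delimiter is passed in as its own byte sequence (for example a double quote); the thread does not say what it is. `find_seq`, `next_boundary` and the `boundary` enum are hypothetical names, and this version scans the input once per pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum boundary { BOUNDARY_NONE, BOUNDARY_DELIM, BOUNDARY_CSTRING };

/* First occurrence of seq in a length-bounded buffer, or NULL. */
static const char *find_seq(const char *buf, size_t len,
                            const char *seq, size_t seq_len)
{
    size_t i = 0;
    while (seq_len > 0 && i + seq_len <= len)
    {
        /* only search positions where a whole seq still fits */
        const char *hit = memchr(buf + i, seq[0], len - seq_len + 1 - i);
        if (hit == NULL)
        {
            return NULL;
        }
        if (memcmp(hit, seq, seq_len) == 0)
        {
            return hit;
        }
        i = (size_t)(hit - buf) + 1; /* partial match, keep looking */
    }
    return NULL;
}

/* Reports whether a regular delimiter or a c-string delimiter appears
   first in buf, and stores its index in *pos. */
enum boundary next_boundary(const char *buf, size_t len,
                            const char *delim, size_t delim_len,
                            const char *cstr_delim, size_t cstr_delim_len,
                            size_t *pos)
{
    assert(pos != NULL);
    const char *d = find_seq(buf, len, delim, delim_len);
    const char *q = find_seq(buf, len, cstr_delim, cstr_delim_len);

    if (d == NULL && q == NULL)
    {
        return BOUNDARY_NONE;
    }
    if (q == NULL || (d != NULL && d < q))
    {
        *pos = (size_t)(d - buf);
        return BOUNDARY_DELIM; /* regular tokens come before any c-string */
    }
    *pos = (size_t)(q - buf);
    return BOUNDARY_CSTRING; /* take the whole c-string next */
}
```

A caller would loop on `next_boundary`, consuming regular tokens while it returns `BOUNDARY_DELIM` and switching to c-string extraction when it returns `BOUNDARY_CSTRING`.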

dstroy0 commented 2 years ago

I pushed a commit with the beginnings of this

dstroy0 commented 2 years ago

I think that I need to just combine the bytewise scan and memcmp. I want to avoid repeat scans.

dstroy0 commented 2 years ago

Ok, I refactored getToken's behavior with the intention of giving users the option of using it to parse input into a token string, separated as specified, csv or whatever, with pointers to each token.

We scan bytewise until we encounter an interesting character, then memcmp if the interesting character sequence is longer than one character. If there's a delimiter match, we put a separator in the token buffer.
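
As an illustration of that single-pass approach, here is a minimal sketch. `decompose` and its parameters are hypothetical names rather than the actual getToken signature, and the pointers to each token are left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: copies input into token_buffer in one pass,
   replacing every occurrence of delim with the single char sep.
   Returns 1 on success, 0 if token_buffer is too small. */
int decompose(const char *input, size_t input_len,
              const char *delim, size_t delim_len,
              char sep, char *token_buffer, size_t buffer_size)
{
    assert(delim_len > 0);
    assert(buffer_size > 0);

    size_t out = 0;
    size_t i = 0;
    while (i < input_len)
    {
        int is_delim = 0;
        if (input[i] == delim[0]) /* interesting character */
        {
            /* memcmp is only needed when the delimiter is longer than 1 */
            is_delim = (delim_len == 1) ||
                       (i + delim_len <= input_len &&
                        memcmp(input + i, delim, delim_len) == 0);
        }
        if (out + 1 >= buffer_size)
        {
            return 0; /* keep room for the final '\0' */
        }
        if (is_delim)
        {
            token_buffer[out++] = sep;
            i += delim_len; /* skip the whole delimiter */
        }
        else
        {
            token_buffer[out++] = input[i++];
        }
    }
    token_buffer[out] = '\0';
    return 1;
}
```

Because each input byte is visited once and memcmp only runs at candidate positions, there is no repeated scan of the same region.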

dstroy0 commented 2 years ago

It's working: it decomposes the input into the token_buffer with a char separator.