LonnyGomes / hexcurse

Hexcurse is a ncurses-based console hexeditor written in C
Other
180 stars 17 forks source link

faster search #12

Open samozrejmost opened 9 years ago

samozrejmost commented 9 years ago

Hello,

because I sometimes analyze memory dumps with hexcurse, I needed a faster search - so I implemented the Boyer–Moore string search algorithm to the hexSearch() function. It works fine for me, but it's a bit crude now (for example it doesn't work for edited files), but if you are interested, I can send you the full "patch".

Sample code:

// algorithm based on the wikipedia example
off_t hexSearchBM(FILE *fp, int pat[], off_t startfp, size_t patlen)
{
    if (! (pat && patlen > 0)) return -1;

    size_t      i, m = 1;
    int         j = patlen - 1;
    size_t      delta1 [ ALPHABET_LEN ];
    size_t      *delta2 = (size_t *)malloc(patlen * sizeof(size_t));

    char        *buf;
    char        *patt = (char *) malloc(patlen);

    if (posix_memalign((void **)&buf, getpagesize(), BUF_L) != 0)
        return -1;

    if (! (delta2 && patt)) return -1;

    make_delta1(delta1, pat, patlen);
    make_delta2(delta2, pat, patlen);
    for (i = 0; i < patlen; i++) patt[i] = (unsigned char) pat[i];

    size_t  n, rem_bytes = 0, bytes_to_read = BUF_L, full_length;
    off_t   pos1, pos2;

    fseeko(fp, startfp, SEEK_SET);          /* begin from loc     */

    pos1 = pos2 = startfp;
    while (true) {
        n = fread(&buf[rem_bytes], 1, bytes_to_read, fp);
        full_length = n + rem_bytes;
        if (n == 0 || full_length < patlen) break;
        pos2 = pos1 + full_length;

        i = patlen - 1;
        while (i < full_length) {
            j = patlen - 1;
            while (j >= 0 && (buf[i] == patt[j])) {
                --i;
                --j;
            }
            if (j < 0) {
                free(buf); free(patt); free(delta2);
                return (pos1 + i + 1);
            }

            m = max(delta1[(unsigned char) buf[i]], delta2[j]);
            i += m;
        }

        rem_bytes = full_length + m - i - 1;
        if (rem_bytes > full_length) rem_bytes = full_length;

        bytes_to_read = BUF_L - rem_bytes;
        memmove(buf, &buf[full_length - rem_bytes], rem_bytes);
        pos1 = pos2 - rem_bytes;
    }

    free(buf);
    free(patt);
    free(delta2);

    return -1;
}
ghost commented 9 years ago

Very cool, definitely send along the full patch!

samozrejmost commented 9 years ago

Sending the patch (pastebin), I diffed the src directories this way:

diff -uprN -x '*.o' -x 'Makefile*' -x '*.Po' hexcurse-master-org/src hexcurse-master/src

Changelog:

I tested it on my Ubuntu 10.10 desktop (32bit) and Ubuntu 14.04 server (64bit).

I hope it will be useful ;) !

ghost commented 9 years ago

Haven't forgot about this! Plan on integrating things this week!

prso commented 3 years ago

For what it's worth I have integrated this patch with some improvements in the branch improve-search of my own fork.

Also I sent a pull request.