VectorCamp / vectorscan

A portable fork of the high-performance regular expression matching library
https://www.vectorcamp.gr/project/vectorscan/

Rebar Based Bug #304

Closed gtsoul-tech closed 1 week ago

gtsoul-tech commented 4 months ago

Rebar test based bug

imported/leipzig/math-symbols,count,vectorscan,5.4.11 2024-05-22,"count mismatch, expected 69, got 68"
imported/leipzig/bounded-strings-ending-z,count-spans,vectorscan,5.4.11 2024-05-22,failed to run command for 'vectorscan' // buffer overflow detected : terminated
imported/lh3lh3-reb/email,grep,vectorscan,5.4.11 2024-05-22,"count mismatch, expected 15057, got 14843"
imported/lh3lh3-reb/date,grep,vectorscan,5.4.11 2024-05-22,"count mismatch, expected 668, got 659"
imported/lh3lh3-reb/uri-or-email,grep,vectorscan,5.4.11 2024-05-22,"count mismatch, expected 32539, got 32327"
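To see how many matches each failing benchmark loses, the expected and actual counts can be pulled out of the "count mismatch" messages. The following is only a minimal sketch: `parse_mismatch` is a hypothetical helper written for illustration, not part of rebar or vectorscan, and it assumes the message format shown in the records above.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not part of rebar or vectorscan): extracts the
// expected and actual counts from a rebar "count mismatch" record.
// Returns false for records that carry no counts, e.g. a crashed run.
bool parse_mismatch(const std::string &line, long &expected, long &got) {
    const std::string key = "count mismatch, expected ";
    const std::string::size_type pos = line.find(key);
    if (pos == std::string::npos) {
        return false; // not a count-mismatch record
    }
    long e = 0;
    long g = 0;
    // The text after the key is assumed to look like "69, got 68".
    if (std::sscanf(line.c_str() + pos + key.size(), "%ld, got %ld", &e, &g) != 2) {
        return false;
    }
    expected = e;
    got = g;
    return true;
}
```

Subtracting `got` from `expected` then gives the number of missing matches per benchmark; the "failed to run command" record carries no counts and is skipped.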

The default and AVX2 builds work as intended on the v5.4.11 release (https://github.com/VectorCamp/vectorscan/tree/vectorscan/5.4.11).
On develop, all builds get wrong results.
The AVX512 and AVX512VBMI builds get wrong results from #197 onward.

The data file used in the unit test can be found in https://github.com/BurntSushi/rebar

Unit test for the first inconsistency:

TEST(rebar, leipzig_math_symbols_count) {
    hs_database_t *db = nullptr;
    hs_compile_error_t *compile_err = nullptr;
    CallBackContext c;
    const char *expr = "\\p{Sm}";
    const unsigned flag = HS_FLAG_UCP | HS_FLAG_UTF8;
    // Compile the single pattern in block mode for the host platform.
    hs_error_t err = hs_compile(expr, flag, HS_MODE_BLOCK, nullptr, &db,
                                &compile_err);

    ASSERT_EQ(HS_SUCCESS, err);
    ASSERT_TRUE(db != nullptr);

    hs_scratch_t *scratch = nullptr;
    err = hs_alloc_scratch(db, &scratch);
    ASSERT_EQ(HS_SUCCESS, err);
    ASSERT_TRUE(scratch != nullptr);

    std::ifstream file("/unit/hyperscan/datafiles/leipzig-3200.txt");
    // Fail early if the data file is missing, rather than scanning an empty buffer.
    ASSERT_TRUE(file.is_open());
    std::stringstream buffer;
    buffer << file.rdbuf(); // Read the file into the buffer
    std::string data = buffer.str(); // Convert the buffer into a std::string

    c.halt = 0;
    err = hs_scan(db, data.c_str(), data.size(), 0, scratch, record_cb,
                  reinterpret_cast<void *>(&c));
    ASSERT_EQ(HS_SUCCESS, err);
    ASSERT_EQ(69U, c.matches.size());

    hs_free_database(db);
    err = hs_free_scratch(scratch);
    ASSERT_EQ(HS_SUCCESS, err);
}
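
The test above relies on `CallBackContext` and `record_cb`, which come from the unit test utilities and are not shown in this issue. As a rough sketch of what such a match collector might look like, the names `MatchLog` and `record_match` below are hypothetical stand-ins and the real helpers may differ. The callback uses the same parameter list as the `match_event_handler` type that `hs_scan` expects.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for the test suite's CallBackContext; the real
// definition lives in the project's unit test utilities and may differ.
struct MatchLog {
    bool halt = false; // when true, the callback asks the scanner to stop
    // One entry per reported match: (end offset, pattern id).
    std::vector<std::pair<unsigned long long, unsigned int>> matches;
};

// Hypothetical stand-in for record_cb. The parameter list mirrors the
// match_event_handler callback type; a non-zero return value tells the
// scanner to stop scanning.
int record_match(unsigned int id, unsigned long long from,
                 unsigned long long to, unsigned int flags, void *context) {
    (void)from;  // start offset, not used by this sketch
    (void)flags; // not used by this sketch
    assert(context != nullptr);
    MatchLog *log = static_cast<MatchLog *>(context);
    log->matches.emplace_back(to, id);
    return log->halt ? 1 : 0;
}
```

With a collector like this, the final assertion in the test simply compares the number of recorded entries against the expected count of 69.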
markos commented 1 week ago

Forgot to actually close this; the fixes were merged in #305 and #307.