Tessil / hat-trie

C++ implementation of a fast and memory efficient HAT-trie
MIT License
793 stars 114 forks source link

Deserialize failed when the amount of key is larger #58

Closed Chen-Xuming closed 2 months ago

Chen-Xuming commented 3 months ago

I tested the serialize and deserialize example with more key-values (no boost version), but it deserialize failed. const tsl::htrie_map<char, std::int64_t> map = {{"one", 1}, {"two", 2}, {"three", 3}, {"four", 4}};

const tsl::htrie_map<char, std::int64_t> map = { 
        {"one", 1}, 
        {"two", 2},
        {"three", 3}, 
        {"three234", 3},
        {"th324ree", 3},
        {"thr534543ee", 3},
        {"th56ree", 3},
        {"th54ree", 3},
        {"th45676ree", 3},
        {"th6ree", 3},
        {"th56ree", 3},
        {"three", 3},
        {"th4ghj5ree", 3},
        {"th678ree", 3},
        {"thj6h7ree", 3},
        {"thkire89e", 3},
        {"thghryuee", 3},
        {"thjrgyee", 3},
        {"thgjghjhree", 3},
        {"thhgjghhjhree", 3},
        {"thhgjree", 3},
        {"thhgjree", 3},
        {"thrjee", 3},
        {"tjhree", 3},
        {"tghhree", 3},
        {"thghjree", 3},
        {"tghjjghjhree", 3},
        {"thhjrghjee", 3},
        {"thrghjee", 3},
        {"thrghjee", 3},
        {"thrjghjee", 3},
        {"thhgree", 3},
        {"thhgree", 3},
        {"thjkree", 3},
        {"thklhree", 3},
        {"thpqweoip;oiree", 3},
        {"thasderee", 3},
        {"thr4w3ewqaqeree", 3},
        {"thewrsadree", 3},
        {"four", 4} 
};  // more than 30 key-values
Tessil commented 2 months ago

Thank you for the report.

I tried it out with the list of values you provided and it seems to work well.

Do you have a full example? What kind of failure are you encountering?

Chen-Xuming commented 2 months ago

Hi Tessil. I tested the same code on both Windows and Linux systems. On the Linux system, deserialization works correctly, but on Windows, it throws an exception (ios_base::failbit set: iostream stream error). The following is the information about the system and compiler:

I noticed a comment in htrie_hash.h that says 'MSVC < 2017 is not conformant'. But my MSVC version is 2019.

The test code is modified based on the provided example.

First, class serializer and class deserializer are exactly the same as in the example (no boost version). To debug, I added a try-catch block to deserializer::operator().

// serializer.h

#pragma once
#include <fstream>

class serializer {
public:
    serializer(const char* file_name) {
        m_ostream.exceptions(m_ostream.badbit | m_ostream.failbit);
        m_ostream.open(file_name);
    }

    template<class T,
        typename std::enable_if<std::is_arithmetic<T>::value>::type * = nullptr>
        void operator()(const T& value) {
        m_ostream.write(reinterpret_cast<const char*>(&value), sizeof(T));
    }

    void operator()(const char* value, std::size_t value_size) {
        m_ostream.write(value, value_size);
    }

private:
    std::ofstream m_ostream;
};

class deserializer {
public:
    deserializer(const char* file_name) {
        m_istream.exceptions(m_istream.badbit | m_istream.failbit | m_istream.eofbit);
        m_istream.open(file_name);
    }

    template<class T,
        typename std::enable_if<std::is_arithmetic<T>::value>::type * = nullptr>
        T operator()() {
        T value;

        m_istream.read(reinterpret_cast<char*>(&value), sizeof(T));

        return value;
    }

    void operator()(char* value_out, std::size_t value_size) {
        try
        {
            m_istream.read(value_out, value_size);
        }
        catch (const std::ifstream::failure & e)
        {
            std::cerr << "Exception occurred: " << e.what() << std::endl;

            if (m_istream.bad())
            {
                std::cerr << "Stream is in a bad state (badbit set)." << std::endl;
            }
            else if (m_istream.fail())
            {
                std::cerr << "Stream operation failed (failbit set)." << std::endl;
            }
            else if (m_istream.eof())
            {
                std::cerr << "End of file reached (eofbit set)." << std::endl;
            }
        }
    }

private:
    std::ifstream m_istream;
};

Second, The function test_htrie_map creates a map containing n_element key-value pairs, then performs serialization and deserialization.

// main.cpp

#include "htrie/htrie_map.h"
#include "serializer.h"
using namespace tsl;

using mapType = htrie_map<char, int>;
void test_htrie_map(int n_element)
{
    // build a trie
    mapType map;
    for (int i = 0; i < n_element; i++)
    {
        map.insert(std::to_string(i), i);
    }

    // serialize
    const char* file_name = "htrie_map.data";
    {
        serializer serial(file_name);
        map.serialize(serial);
    }

    // deserialize
    {
        deserializer dserial(file_name);
        auto map_deserialized = mapType::deserialize(dserial);

        assert(map == map_deserialized);
    }

    // query from the deserialized trie
    {
        deserializer dserial(file_name);

        const bool hash_compatible = true;
        auto map_deserialized = 
            mapType::deserialize(dserial, hash_compatible);

        assert(map == map_deserialized);

        int count = 0;
        string prefix = "1";
        auto prefix_range = map_deserialized.equal_prefix_range(prefix);
        for(auto it = prefix_range.first; it != prefix_range.second; ++it) {
            count++;
        }
        std::cout << "count = " << count << std::endl;
    }
}

int main()
{
    test_htrie_map(20);     // works well.
    test_htrie_map(30);     // throws an exception  
}

On the Ubuntu system, everything works well. On the Windows system, if n_element is larger than some value, the 'ios_base::failbit set: iostream stream error' will occur when deserializer calls function m_istream.read().

Tessil commented 2 months ago

Can you try to open the file in binary mode with std::ios::binary? I forgot to put that in the non-Boost example.

Chen-Xuming commented 2 months ago

Thank you, the program is now working correctly.