beached / daw_json_link

Fast, convenient JSON serialization and parsing in C++
https://beached.github.io/daw_json_link/
Boost Software License 1.0
460 stars 30 forks source link

daw_json_link's internal iterators behave differently than standard iterators #432

Open pkerichang opened 1 month ago

pkerichang commented 1 month ago

so I'm still trying to get my custom map data structure working as described in Issue 428, I have the modified simple example as below:

#include <daw/daw_ensure.h>
#include <daw/json/daw_json_link.h>

#include <iostream>
#include <iterator>
#include <type_traits>
#include <vector>

template <typename Iter> std::size_t my_distance(Iter first, Iter last) {
    std::size_t count = 0;
    while (first != last) {
        // *first;  // issue 1
        ++first;
        ++count;
        if (count > 1000) {
            throw std::runtime_error("oops! more than 1000 elements?!");
        }
    }
    return count;
}

template <class Key, class T> struct vector_map {
    using value_type = std::pair<Key, T>;
    using vector_type = std::vector<value_type>;
    using mapped_type = T;
    using key_type = Key;
    using reference = typename vector_type::reference;
    using const_reference = typename vector_type::const_reference;
    using iterator = typename vector_type::iterator;
    using const_iterator = typename vector_type::const_iterator;

  private:
    vector_type data_;

  public:
    vector_map() = default;

    template <typename Iter> vector_map(Iter first, Iter last) {
        data_.reserve(my_distance(first, last));  // issue 2
        while (first != last) {
            auto [k, v] = *first;
            emplace(std::move(k), std::move(v));
            ++first;
        }
    }
    template <typename Container,
              std::enable_if_t<not std::is_same_v<vector_map, daw::remove_cvref_t<Container>>,
                               std::nullptr_t> = nullptr>
    vector_map(Container &&values) : vector_map(std::begin(values), std::end(values)) {}

    bool operator==(vector_map const &rhs) const { return data_ == rhs.data_; }

    const_iterator begin() const { return data_.begin(); }

    const_iterator end() const { return data_.end(); }

    template <class... Args> std::pair<iterator, bool> emplace(Args &&...args) {
        data_.emplace_back(std::make_pair(std::forward<Args>(args)...));
        return std::make_pair(data_.end() - 1, true);
    }
};

namespace daw::json {
template <class Key, class T> struct json_data_contract<vector_map<Key, T>> {
    using class_type = vector_map<Key, T>;
    using type = json_type_alias<json_key_value_no_name<class_type>>;
};
} // namespace daw::json

void json_test(std::string_view data_view) {
    auto obj = daw::json::from_json<vector_map<std::string, int>>(data_view);
    auto obj_json = daw::json::to_json(
        obj, daw::json::options::output_flags<daw::json::options::SerializationFormat::Pretty>);

    std::cout << obj_json.c_str() << std::endl;
}

int main() {
    auto v = vector_map<std::string, int>{};
    v.emplace("Hello", 42);
    v.emplace("World", 24);
    auto json_doc = daw::json::to_json(v);
    json_test(json_doc);
    auto v2 = daw::json::from_json<vector_map<std::string, int>>(json_doc);
    daw_ensure(v == v2);
}

The only difference from the example you gave in Issue 428 is that I try to pre-allocate the underlying vector with the correct size, by calling my_distance(first, last), which is a modified dumb version of std::distance().

With this code, my_distance() raises a runtime error, as it goes in an infinite loop because it seems like ++first does nothing unless you evaluate the iterator by calling *first first. This is different than how iterators work by convention. In fact, this is why std::distance() will not work on these iterators.

If I uncomment the *first line, then this code will raise a daw::json::v3_24_0d::json_exception, with the error Unexpected end of data. This implies that even though the iterators are pass-by-value to the my_distance() function, incrementing the iterator copies impacted the original iterator. Again, this is different than how iterators work by convention.

In summary:

Since this is very non-standard, if things have to stay this way, it should be clearly documented. Also, is there anyway I can find out the size of the iterator range, even if it takes linear time to do so?

beached commented 1 month ago

The documentation could be better here, but it has never been an issue either as it's intended to fill containers which will only call distance on forward iterators which they sometimes are. Right now, it allows much cheaper filling. I could add an option to fulfill input fully however, but in the mean time a distance like the following would work

std::ptrdiff_t distance( auto first, auto last ) {
  std::ptrdiff_t count = 0;
  while( first != last ) {
    *first;
    ++count;
    ++first; // so it doesn't look weird
  }
  return count;
}

It will generally not be best to reserve the vector based on the size here though, unless the iterator passed to your constructor is forward. It sometimes is when the size is known because that member was in a different order to that needed. The cost of parsing will probably be less than that of allocation. What is often fruitful is to guess, in the static size case that probably isn't needed though as there is no allocation.

pkerichang commented 1 month ago

Ah, it's been quite a while since I looked at iterators in C++, for some reason I completely forgot about input iterators (iterators which only support a single pass). Sorry about the confusion. So the only difference between daw_json_link's iterator and conventional input iterators is that you must dereference the iterator before incrementing. Probably not a big deal, but also a good idea to mention in the documentation.

Does iterators in daw_json_link support iterator_traits/iterator_category, so I can use it to figure out when the iterator arguments are forward v.s. input, and thus I can preallocate memory if possible? Not a big deal, just curious.

beached commented 1 month ago

The iterator does not fullfill input either because of op++ and equality comparable. They do work in all 3 of the standard implementations however for constructing/inserting into containers. Adding this as an option isn’t much, it just hasn’t been needed.

pkerichang commented 1 month ago

Sounds fine to me, should I close this unless you want to keep track of documentation update?