felixguendling / cista

Cista is a simple, high-performance, zero-copy C++ serialization & reflection library.
https://cista.rocks
MIT License
1.74k stars 110 forks source link

Container with stable iterators for non-(easy)-hashable types #213

Closed ChemistAion closed 3 months ago

ChemistAion commented 3 months ago

I was looking (also in motis project) for a simple container (std::list alike) with stable iterators (for element insertion/erasure). Is there something ready to take off, lying around?

felixguendling commented 3 months ago

Lately, I am using (unsigned) integer indices everywhere (specifically type safe indices based on cista::strong<T, tag>) like here. If that's not an option, we use cista::vector<cista::unique_ptr<T>>. This doesn't give you stable iterators on the vector itself but at least on the elements (what is usually what we need). Since linked lists have bad performance for most of our use cases, there's no implementation in cista yet.

Edit: Regarding the cista::strong<T, tag> indices: they replaced pointers everywhere in our recently written code. So we do not store pointers or iterators anywhere. Only those indices.

ChemistAion commented 3 months ago

@felixguendling Thank you for all the details above :relieved:

I will jump into this on the weekend :construction: ...to replace my dirty PoC with std::list based on your suggestions here. Let me keep this issue open until then.

felixguendling commented 3 months ago

As you were also talking about element erasure: if that's a requirement to free up the space (or at least as good as the allocator you use can really free it) then probably only vector<unique_ptr<T>> helps. Another option (if deletions happen rarely) is to have a cista::bitvec to keep track of which elements have been deleted. So the indices following the deleted element stay the same. I also read an article some time ago where they stored random bits into the index type and stored those random bits together with the data. So if you delete an element, the random bits change and when you want to access with the wrong random bits, you get an exception (so you see "hey! you just wanted to access a deleted element that was reused since then!").

felixguendling commented 3 months ago

If someone wants to contribute something like std::list<T>, feel free to make a PR. However, I would always recommend to try the approaches mentioned above as they tend to give you better performance.

ChemistAion commented 3 months ago

Apologies for not catching up on this yet - I will make an effort to do so during the upcoming May long weekend.

ChemistAion commented 2 months ago

@felixguendling ...almost there with the basic_list PR: https://github.com/ChemistAion/cista/tree/list

felixguendling commented 2 months ago

You can open a PR to check the CI.

It could make sense to add a fuzzing test depending on your use case. The virtual machine approach is quite powerful to find weak points of the implementation. You can test correctness and that the program doesn't crash when deserializing bad data.

ChemistAion commented 2 months ago

Thanks for the hint - yes, I will definitely add fuzzing for it :sparkles: For now, move ctor/operator are missing, but the rest is pretty much complete, with some minor code styling on the way. I will be back with this PoC as a PR during the weekend.