OutpostUniverse / OP2Utility

C++ library for working with Outpost 2 related files and tasks.
MIT License
4 stars 0 forks source link

Stream Read/Write helpers for complex types #68

Closed DanRStevens closed 6 years ago

DanRStevens commented 6 years ago

It is desirable to be able to save complex data using a compact notation such as:

stream.Write(someComplexValue);

However, there is no safe and general way to do this for any data type. In particular, an object with a virtual function table pointer should not be saved or loaded this way. Such saved/loaded objects don't work with relocatable code, and could be a big security concern otherwise. In general, it's of questionable use for data that may contain pointers, or other resource handles. That may be through explicit use of pointers, or implicit use of pointers such as by virtual functions and virtual function tables.

It is also questionable to use a generic method for data types of non-fixed size, as that creates compatibility issues between code compiled for 32-bit and for 64-bit. It may be permissible to allow non-fixed sized types, but perhaps result in a warning.

For simple fixed sized data types, such as int8_t, int16_t, int32_t, int64_t, and the corresponding unsigned types, it should be fine. Similarly, combining elements into a struct or array should also be fine.

The standard references things like Aggregate types, Standard Layout types, and POD types. I think we want something similar, though the exact definition may vary from anything the standard specifies.

We might be able to piggyback off some of the type traits defined by the standard. For example: is_aggregate, is_standard_layout, is_pod, is_trivial, is_scalar, is_pointer. These can be combined with enable_if.

This is getting to the limits of my knowledge of templates, but I believe I've seen code that variously used either using or enable_if to limit template instantiation to safe types. That should prevent unexpected behaviour from undesired or unintended template expansion.


The other side of this is to enable template expansion for more complicated types, but with an alternate definition that has been specialized for the particular type. This could allow easy default behaviour with types such as vector, which otherwise would not result in a safe template expansion with a more generic template. Hence you can do the obvious data dump with:

stream.Write(vectorVariable);

I'm not certain, but the order of template definitions between a generic template and a specialization may be important.

Brett208 commented 6 years ago

I think this is a good idea to explore. It would make for a better experience to receive a compiler warning when attempting to abuse the template instead of dealing with it at runtime. I've already tried feeding it std::vector.data() just to see what it would do.

It looks like is_pod is being deprecated in C++20, so probably worth not including that one.

It is also beyond my current knowledge of the language.

I also feel like the current stream code serves our purposes well. This seems more appropriate if we are looking to polish this and make it more useable outside of OP2Utility?


Also, I verified, you can read a std::array without the size call using our current infrastructure.

As for a std::string template, I think the default use case should be not including the null terminator. This fits more with std::string naturally as std::string.size soes not include the final null terminator in its size. Assuming we want to pursue a default template here. It might be better to leave up to the individual to manually write.

DanRStevens commented 6 years ago

The danger is if you pass vector variables directly, without calling .data():

std::vector<int> v = { 3, 7, 12, 17, 5, 21, 2 };
streamWriter.Write(v);

Currently this will compile without warning or error. It will write data to the file, but not the data you expect. Consider that v contains a pointer to the buffer data rather than the actual buffer data. It also contains fields for the used and reserved size of that buffer. Now consider what sizeof(v) returns. It's not related to the buffer at all. That sort of accidental use is what this would protect against.

You're right though, this is more about polish for use outside of OP2Utility. I can see that happening, though not until after the current work is complete.

This is also a little outside of my current knowledge. I haven't really done a whole lot of template metaprogramming.

Interesting note about is_pod being deprecated. I didn't know that. I kind of wonder why now, as it doesn't seem to be explained anywhere.

Brett208 commented 6 years ago

Some opinions about the deprecation here: https://stackoverflow.com/questions/48225673/why-is-stdis-pod-deprecated-in-c20

I just noticed the deprecaption note on a reference site when clicking through links.

DanRStevens commented 6 years ago

Thanks for the link. That was interesting. It also validates some sneaking suspicions I'd had for years.

This is also why I listed a bunch of those type traits. So we could learn something about them and see which ones were relevant, or worth basing code on. Probably not is_pod by the looks of it. It seems to suggest is_trivial and is_standard_layout may be of greater interest. I'll have to dig into the details at some point.

Brett208 commented 6 years ago

@DanRStevens, when you posted about static_assert in the code review, while I was reading on the subject it occured to me we could possibly use static_assert here to check for incorrect structures when using the Read/Write templates. https://en.cppreference.com/w/cpp/language/static_assert

template <class T>
void swap(T& a, T& b)
{
    static_assert(std::is_copy_constructible<T>::value,
                  "Swap requires copying");
    static_assert(std::is_nothrow_copy_constructible<T>::value
               && std::is_nothrow_copy_assignable<T>::value,
                  "Swap may throw");
    auto c = b;
    b = a;
    a = c;
}
DanRStevens commented 6 years ago

Yes, good point. I think we should try to figure out if static_assert or enable_if is more appropriate here.

From what I understand, static_assert provides a clear error message when you try to use a template improperly, though I don't know how well it plays with template specializations. That is, if the generic template throws an error for vector types, can you still create a specialization to handle vectors. Meanwhile enable_if I don't think provides an error message, it just makes the template not really visible or selected there. It almost certainly will not interfere with template specialization. It also works in a slightly more obscure way. We may need to experiment a bit with these.

DanRStevens commented 6 years ago

I came across a couple of helpful articles on std::enable_if that I wanted to list here: SFINAE and enable_if C++11 Usage of std::enable_if in Function Templates

DanRStevens commented 6 years ago

I was going through some old stuff, and wanted to include the suggested code snippet from the closed Pull Request #66 for a possible vector Write method:

template<typename T, typename A> inline void Write(const std::vector<T, A>& vector) {
        WriteImplementation(vector.data(), vector.size() * sizeof(T));
    }

This code had an attached concern that an include <vector> would be required in the Stream library to support this method.

There was also mention of having a method to Write a string. I would suggest such a method not write the null terminator by default. Most strings stored to binary files use the length prefix approach rather than a null terminator, and that's the common case we're typically dealing with. It would be possible to support writing a null terminator using more explicit code.

DanRStevens commented 6 years ago

I was just looking at the documentation for is_trivially_copyable:

Notes Objects of trivially-copyable types are the only C++ objects that may be safely copied with std::memcpy or serialized to/from binary files with std::ofstream::write()/std::ifstream::read(). In general, a trivially copyable type is any type for which the underlying bytes can be copied to an array of char or unsigned char and into a new object of the same type, and the resulting object would have the same value as the original.

This is exactly what we are doing, and what our concerns are.


I don't think static_assert will prevent inappropriate template expansion. I think that's more for error checking after the fact. I think what we want here is enable_if, combined with is_trivially_copyable.