bloomberg / clang-p2996

Experimental clang support for WG21 P2996 (Reflection).
https://github.com/bloomberg/clang-p2996/tree/p2996/P2996.md

Compile-time reflection hashing #72

Closed nilres closed 3 weeks ago

nilres commented 3 weeks ago

With 97ce63b0ee92ebd3e4b9dc9423245eeaf24b94a8, compile-time hash generation was introduced, which I really liked as a replacement for projects like https://github.com/Manu343726/ctti that provide similar functionality but in an inelegant way. I don't know if the hash implementation is part of any of the current proposals, but personally I would find it quite useful. Maybe you could explain why you removed it a couple of commits later.

Greetings Nils

katzdm commented 3 weeks ago

Hey @nilres , thanks for taking an interest in the project!

As you noticed, we (P2996 authors) were recently considering the introduction of reflection hashing, and from an implementation standpoint it's indeed quite straightforward.

Some necessary context for why we decided against it: For P2996, it will not be possible for a reflection to escape its translation unit (i.e., no module or external linkage for a reflection or anything that can "hold" one). This requirement is necessary because a reflection is ultimately an opaque tagged pointer to some compile-time data structure (usually an AST node of some kind). Such pointers are not expected to be stable between TUs. While we think it will be possible in the future to relax this requirement, doing so requires research into "reconstructing the reflection" on the other side of a module import. In addition to our being too busy polishing P2996 to be ready for '26, such experiments are made more difficult because this experimental implementation (i.e., Clang/P2996) is implemented in a way that breaks modules (fixing this will require some architectural changes to Clang's constant evaluator).

So we're very interested in reflections not escaping TUs (at least for P2996) - which brings us to hashes. While we can write language rules to confine reflections to their "home TUs", hashing gives you a means of producing an int representation, which can indeed escape the TU. The plan was to therefore confine std::hash<std::meta::info> objects to their TUs in the same manner as reflections, and for the contract to be that it was unspecified whether distinct std::hash<std::meta::info> objects give the same hash - so if you exported hash(^int) from TU1, it might not match hash(^int) in TU2. This seemed potentially bug-prone for end-users, but was something we thought we might be able to live with to make unordered_map<info, T> or unordered_set<info> work..

..until we realized that neither unordered_map nor unordered_set are literal types anyway, so they can't be used at compile-time and are thus all but unusable with std::meta::info 😭 With those off the table, it didn't seem worth it to introduce hash<info> at this time, especially since some further research might yield ways to give better cross-TU stability guarantees in the future. It's definitely a thing to circle back on after P2996 - but the name of the paper is "Reflection for C++26", and we're trying our very best to make good on that :)

Hope that helps clear things up! Please follow up with further questions, or close the issue if I've succeeded in addressing them :)

nilres commented 3 weeks ago

Amazing background information. Wouldn't it be possible to use "name_of" in the same way? The name might not be fully unique, making it a bad "hash", but it is still a way to sneak information out across multiple TUs, isn't it?

Not saying this is a good argument to introduce an explicit hash but something to keep in mind.

katzdm commented 3 weeks ago

Yep, it would certainly be possible to build a "mostly unique representation" using the other libraries provided - but the author of that representation will be able to ensure that it has the properties needed for their application. For instance, they might only need hash representations of class types having external linkage, in which case it should suffice to hash the name of the class, and do the same walking up parent_of-edges until ^:: is reached. It might be that such functions are better suited for non-Standard library implementations in '26, and that standardizing something might make sense at a later time.
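
A minimal sketch of that idea in plain C++17, with no reflection involved: the name components are passed in as ordinary string_views, standing in for whatever a user would collect by walking parent_of-edges up to ^::. The helper names (fnv1a, hash_qualified_name) and the choice of FNV-1a are assumptions made for illustration; they are not part of P2996 or of this implementation.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string_view>

// 64-bit FNV-1a. The state parameter lets several strings be chained
// into one running hash. Usable in constant expressions.
constexpr std::uint64_t fnv1a(std::string_view text,
                              std::uint64_t state = 14695981039346656037ull) {
    for (char c : text) {
        state ^= static_cast<unsigned char>(c);
        state *= 1099511628211ull;  // FNV prime; unsigned overflow wraps
    }
    return state;
}

// Hypothetical helper: hashes name components ordered from the innermost
// entity outward (class name first, enclosing namespaces after it).
// In a reflection-based version these strings would come from the names
// of the entities visited while walking up to the global namespace.
template <std::size_t N>
constexpr std::uint64_t hash_qualified_name(
        const std::array<std::string_view, N>& components) {
    std::uint64_t state = 14695981039346656037ull;
    for (std::string_view part : components) {
        state = fnv1a(part, state);
        state = fnv1a("::", state);  // separator between components
    }
    return state;
}

// Example: a class Widget declared in namespace app.
constexpr std::array<std::string_view, 2> widget_name{"Widget", "app"};
constexpr std::uint64_t widget_hash = hash_qualified_name(widget_name);

// Chaining the components is the same as hashing the joined string.
static_assert(widget_hash == fnv1a("Widget::app::"));
```

Because the result depends only on the spelled names, it is the same in every TU, which is the property the pointer-based reflection value cannot offer. It is only as unique as the names themselves, so it suits cases like the one described above (class types with external linkage) rather than arbitrary reflections.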