dangmoody / HLML

Auto-generated maths library for C and C++ based on HLSL/Cg
24 stars 1 forks source link

Swizzling with .wzxy() function call syntax not compatible with HLSL #59

Closed fstrugar closed 1 year ago

fstrugar commented 2 years ago

One of the main issues trying to compile a couple of generic HLSL functions with HLML was swizzling. https://github.com/redorav/hlslpp has a templated solution to this and it looks like a perfect thing for autogeneration: https://github.com/redorav/hlslpp/tree/master/include/swizzle :)

dangmoody commented 2 years ago

Oh I see, you're trying to use HLML as a solution for writing both C++ and HLSL code. I never thought of that.

Do you think it would be possible to generate swizzle formats that weren't templated? The only potential problem I have with going and implementing this is that I'd be auto-generating templated code, which would sort of go against one of the reasons I wrote this library to begin with.

What do you think?

fstrugar commented 2 years ago

I'm sure it could be done without templates! I don't have good ideas yet though :D I don't quite understand the hlslpp approach but what I've seen looks like a perfect match for code generation, at least for C++ version. I'll look into it when I have a bit of time and get back to you!

Oh I see, you're trying to use HLML as a solution for writing both C++ and HLSL code. I never thought of that.

Yeah, I'm finding it super-useful! Here's some use cases:

dangmoody commented 2 years ago

I had a look at this over my lunch break and I think you're right and I think it might just be possible to do this without templates.

I'll take a look. =)

dangmoody commented 2 years ago

I've managed to come up with something in a testbed that works via macros. It's quite rough around the edges and it looks a little ugly but it does work. =)

I'll see if it can be refined and put into a release. I think the most difficult part about this will be getting Clang, GCC, and MSVC to accept one solution (Clang and GCC are throwing up a lot of warnings about declaring anonymous structs within an anonymous union, for instance, but I might able to get past those with some good old inline #pragma directives).

I work full time and I only get to work on this in my spare time after work in the evenings so it might take a hot minute to get this out, but I think it can be done and I'm excited to see this now. =)

fstrugar commented 2 years ago

Oh wow, nice, can't wait to see it & try it out! Yeah, I bet it's not trivial to get it compiling everywhere. I hope macros don't clash with something in a large codebase.

Thanks for doing this! I'm in a similar situation where I'm using it for a personal project, which I don't get lots of time to spend on due to full time work & family so I totally get it :)

Hey, just remembered, would you consider adding an optional support for "namespace hlml {}" or similar? I know in README you mention that's exactly what you want to avoid :D but it's useful if it's optional (I had a clash with another library defining their float2 but thankfully I don't have to include both from the same files).

dangmoody commented 2 years ago

vector_swizzle_test.zip

So I've attached a basic (and VERY incomplete) swizzle test project that contains 2 different ways of implementing this: one by using macros, and one by using templates. Both compile and run. If you download and run this test project then you can toggle the USE_MACRO #define at the top of main.cpp.

There are issues with this though, and they're all quite large IMO.

Apologies if any of what I'm about to go through is stuff you're already really familiar with. I just want to give as thorough a breakdown as I can of the landscape of the problem domain.

Wall of text incoming; apologies in advance!

Issue 1: All of the swizzle solutions seem to run on undefined behaviour?

From what I can tell: Every swizzle solution I've read tonight and last night (that gives the ergonomic effect you're looking for) all run on undefined behaviour to my eyes, which is pretty scary.

All the swizzle solutions that I've seen basically work like so (using float2 as an example here):

template<class VecType, class ScalarType, int X, int Y>
struct vec2_swizzle_t {
    ScalarType v[2];

    VecType operator=( const VecType& vec ) {
        return VecType(
            v[X] = vec.x,
            v[Y] = vec.y
        );
    }

    operator VecType() {
        return VecType( v[X], v[Y] );
    }
};

(Also I'm using the templated solution just for the sake of example here).

Where X and Y denote what index into the swizzle's array you want. So for example, you would define this swizzle like so (inside the vector type's union):

vec2_swizzle_t<float2, float, 1, 0> yx;

This looks fine until you get to the following scenario:

float3 vec = { 10.0f, 20.0f, 30.0f };
float2 vec1 = vec.xz;

The xz swizzle in this case means we're now accessing the member v[2] inside the vec2_swizzle_t but that array is only of size 2, so that looks like undefined behaviour to me. Clang seems to agree and throws a warning if you enable all the warning levels, but I can also just turn that warning off and (on my machine, at least) it then runs fine (but I suppose that's undefined behaviour for you).

I could be wrong, but that's certainly what it looks like to me.

Issue 2: Writing to swizzles with duplicate components

This is explicitly stated as banned in the HLSL spec. and it's not hard to see why. Basically, the following is illegal:

float2 vec = { 10.0f, 20.0f };
vec.xx = float2( 5.0f, 9.0f ); // duplicate component - no!

The same goes for all other operators that do assignment. I think I know of a way around this without bloating the size of vector types. Let me get back to you on that.

Issue 3: Swizzles that are larger than the vector they're taking data from means every vector type becomes 16 bytes big

And the last issue that I found, to top it all off...

So it's totally possible to do the following:

float2 vec = { 10.0f, 20.0f };
float4 vec1 = vec.xxxx;

But to add that swizzle (and others where the swizzle is larger than the size of the vector) would mean populating the vector's union with a type that is bigger than the vector is intended to be. Meaning the size of a float2 (in this case) would become 16 bytes (4 * sizeof( float )). As someone who cares about cache locality, that worries me greatly. The only solution for this that I can see would be to keep those swizzles as return functions like what HLML has currently.

Apologies for the wall of text!

I hate to be the bearer of bad news like this. Let me know what you think. If you have any ideas around this I'd love to hear them. 😅

fstrugar commented 2 years ago

Hi! Thank you so much for doing this - finally had a moment to check it out!

Both macro and template seem nice/similar, can't say I have a preference.

Yeah I see the issues! I'll try to have a go:

Issue 1 - potentially undefined behaviour

Could one do something like this:

#define DEFINE_SWIZZLE_TYPE_3_TO_2( LhsType3, LhsType2, ScalarType, X, Y ) \
    struct LhsType ## _swizzle_3_2_ ## X ## _ ## Y ## _t { \
        ScalarType v[3]; \
\
        inline LhsType2 operator=( const LhsType2 & vec ) { \
            return LhsType2 ( \
                v[X] = vec.x, \
                v[Y] = vec.y \
            ); \
        } \
\
        inline operator LhsType2 () { \
            return LhsType2 ( v[X], v[Y] ); \
        } \
    }

and then used as

        DEFINE_SWIZZLE_TYPE_3_TO_2( float3, float2, float, 0, 0 ) xx;
        DEFINE_SWIZZLE_TYPE_3_TO_2( float3, float2, float, 0, 1 ) xy;
        DEFINE_SWIZZLE_TYPE_3_TO_2( float3, float2, float, 0, 2 ) xz;

so the type size matches all other types in the union? should be fine? unless I'm missing something :)

Issue 2: Writing to swizzles with duplicate components

I've tried a read-only version of macro, for ex:

#define DEFINE_SWIZZLE_TYPE_2_READONLY( LhsType, ScalarType, X, Y ) \
    struct LhsType ## _swizzle_ ## X ## _ ## Y ## _t { \
        ScalarType v[2]; \
\
        operator LhsType () { \
            return LhsType ( v[X], v[Y] ); \
        } \
    }

and then use that for duplicate components:

        DEFINE_SWIZZLE_TYPE_2_READONLY( float2, float, 0, 0 ) xx;
        DEFINE_SWIZZLE_TYPE_2( float2, float, 0, 1 ) xy;
        DEFINE_SWIZZLE_TYPE_2( float2, float, 1, 0 ) yx;
        DEFINE_SWIZZLE_TYPE_2_READONLY( float2, float, 1, 1 ) yy;

seems to work!

Issue 3: Swizzles that are larger than the vector they're taking data from means every vector type becomes 16 bytes big

Yeah, making every type 16 byte long would be bad - that's the only reason I gave up on hlslpp (they have this constraint although for a different reason - see https://github.com/redorav/hlslpp/issues/58) :)

It's not just cache locality & memory size but it breaks C++<->HLSL interop because the memory packing becomes different.

I wonder if a variation on proposed "3_TO_2" in the other direction (3_TO_4)? They'd be read-only by definition since they always have duplicates. Might work?

Also possibly useful would be a float1-like type (supported in HLSL) to allow for float1(1).xxxx?

Anyhow, hope this makes sense and apologies if it doesn't - it's rather late and my brain is fried :)

dangmoody commented 2 years ago

Hey, thanks for the thoughts. There are some great ideas here! =D

Another wall of text incoming I'm afraid, but it's mostly good news:

TLDR

You've basically solved all of the problems that I found, now it's just a question of whether we want to represent this via macros or templates because I think the macro implementation is more difficult to read in its current form (updated sample project attached below). I'll wait to hear your thoughts before I go adding this to the library for the next release.

Addressing the undefined behaviour problem

Oh yeah, that would do it. I feel a little stupid I didn't see that before, now. 😅

Duplicate swizzle components

Yes! Well then, that solves that problem! I can make the auto-generator do checks as to whether I want a writeable vs. non-writeable swizzle struct based on the permutation, so I don't think that will be an issue.

Returning swizzles larger than the vector they're coming from

So you actually solved that problem too!

I'm showing all of the examples in templates but I've got the macro equivalents written in the attached, updated sample project below:

template<class ReturnType, class ScalarType, int X, int Y, int Z>
struct swizzle_2_to_3_t {
    ScalarType v[2];

    inline operator ReturnType () {
        return ReturnType ( v[X], v[Y], v[Z] );
    }
};

// then use in the union like so:
// were just limiting which indices we use - simple!
swizzle_2_to_3_t<float3, float, 0, 0, 0> xxx;
swizzle_2_to_3_t<float3, float, 0, 0, 1> xxy;
swizzle_2_to_3_t<float3, float, 0, 1, 0> xyx;
swizzle_2_to_3_t<float3, float, 0, 1, 1> xyy;
swizzle_2_to_3_t<float3, float, 1, 0, 0> yxx;
swizzle_2_to_3_t<float3, float, 1, 0, 1> yxy;
swizzle_2_to_3_t<float3, float, 1, 1, 0> yyx;
swizzle_2_to_3_t<float3, float, 1, 1, 1> yyy;

// same goes for 2-4 and 3-4...

I've attached the updated version of the sample project showing all this off if you want to take a look (I've not put much effort into correctly making sure that each type is writeable or non-writeable for the example project because I can do checks in the auto-generator to determine which one I want based on the swizzle permutation anyway):

swizzle_fixed.zip

There is a problem with the macro implementation in that I have to write a lot more boilerplate to avoid circular-dependency issues (float2 can return a float3 swizzle, but float3 can also return a float2 swizzle and so on) and the templated solution actually just avoids all of that, which is quite nice.

To be honest I think the templated implementation is actually nicer to read through (as things currently stand). I know the point of this library is to get rid of horrible, unreadable templated code but I think I prefer it to the macro'd alternative for swizzling (given what the macro implementation has become), but if we decide that the macro version is actually better for whatever reason then I'll roll with that.

If you can find any ways to tidy up the macro implementation, I'd greatly appreciate that!

Another thought I had was that if we can get the macro implementation neat enough it's possible I could do swizzles in the C99 API too, which would be a big win!

Conclusion

The good news though is that works with no undefined behaviour and all vector types retain their size so I'd say the problem of actually generating correct swizzles is solved! It's a just question of how we want to represent that in code now (macros vs templates).

I'll wait to hear your thoughts and then I'll get to work on adding this to the library. 😀

Bear in mind that I'm also going to want to generate tests for all of the possible permutations, so this will take a little while to get in, but it will get done and I'll work as quick as I can to get the release out for it.

But anyway hopefully this ends up being useful for you in your project; I certainly hope it will!

dangmoody commented 2 years ago

Wait, what am I thinking? I'll just make the auto-generator generate the code that the macros generating in that example. Duh!

Ok, I'll get to work. =)

fstrugar commented 2 years ago

Awesome, looks like the solution is crystallizing :D

now it's just a question of whether we want to represent this via macros or templates because I think the macro implementation is more difficult to read in its current form (updated sample project attached below)

yeah I looked at the both and you're right, macro gets harder to read for sure, templates look better!

Wait, what am I thinking? I'll just make the auto-generator generate the code that the macros generating in that example. Duh!

Ha! Yes :))) Will that avoid the circular-dependency issues? I wonder what happens in .hlsl if you do a.xzy.zyx - I'll try it out tomorrow when I've got access to the compiler (although can't think of an useful reason to do that)!

dangmoody commented 2 years ago

Ok, so I've currently got HLML generating two versions of the updated swizzles (both are attached below). One is swizzles as templates and the other is swizzles as just straight-up generated code. I think now that there aren't any macros to read through the non-templated version looks better, personally. 😉

They both take roughly the same amount of time to compile in the tests and I didn't notice any glaring differences in runtime performance between the two versions either (though that testing was only done by looking at how long the generated tests took to run for each, so let's call that a superficial microbenchmark).

Let me know what you think!

hlml_new_swizzles_NO_templates.zip hlml_new_swizzles_templates.zip

This was done in a semi-rush to get this out to you for you so you could take a look so I'll want a little longer to get some more tests in (writing to the writable swizzles, for instance), but otherwise I'm thinking this is ready for release. 😄

fstrugar commented 2 years ago

Hi! I tried out both on my small test case and it seems to work good. I like the templated version more because there's a lot fewer files and it's more readable and if there's no downside on compile times then why not :)

Once I have a bit more time I'll expand code coverage to other shaders, I'm sure a lot of stuff will show up that I can report! At the moment I have a local hack for https://github.com/dangmoody/HLML/issues/58 since it seems to happen everywhere in my code - let me know if I can help anyhow :)

dangmoody commented 2 years ago

I really appreciate all the testing and exploration you're doing. Thank you very much for that. =)

I'm working on #58 right now.

Also, I noticed a comment you made earlier in this thread about an optional namespace. I'm going to open up the Discussions section of this repo so I can put some thoughts down about how and why I've made certain decisions about the library (namespaces being one of those things).

fstrugar commented 2 years ago

You're most welcome, I find the library really useful - thank you for creating it :)

Oh supercool, please do! I've actually tried just doing namespace around the #include but it breaks things with external includes but haven't spent any more time on it.

dangmoody commented 2 years ago

Got the first draft of this submitted just now. I still need to add the write tests for the swizzles. I'll do that tomorrow.

Compile times have increased a bit when including every single swizzle format (like the tests are). GCC and MSVC are the worst offenders, but Clang still went up as well. I'll look into that before submitting but if you are able to look into that yourself I'd love to know what's going on there. I'm not comfortable shipping this with compile times being what they are.

It's not as noticeable when only using one or two of swizzles, so if it really is just down to the fact that the compiler is bringing in a shit ton of symbols then I suppose there's nothing I can do about that?

fstrugar commented 2 years ago

Oh uh how bad are the compile times? I can test before and after b04c6dbd727e9b6baff40710bbd3d9f084deb4ab and let you know (it's just going to take a bit of time - probably next weekend). Could the whole swizzle thing be made optional / under #ifdefs so people who don't need it don't have to pay for longer compile times? :)

dangmoody commented 2 years ago

Any investigating you are able to do would be greatly appreciated. I appreciate you're doing this in your free time. =)

The compile times on Clang wasn't affected too badly, but MSVC and GCC compile times just tanked.

I think the #define option could be the way to go if we're not able to reduce it anymore from what we've got atm (but I think we can!)

fstrugar commented 2 years ago

I'll try it out as soon as I can! I know very little about optimizing compile times though :D

dangmoody commented 2 years ago

Hey there, I've made the discussion about namespaces here with my thoughts put down. If you want to give your thoughts/arguments/counter-arguments then please do. =)

fstrugar commented 1 year ago

Hi! Both non-template and template versions work identically and my project compiles and links in 22-23 seconds on both with only the main file that uses hlml changed and clean rebuild is ~65s with both - can't measure any difference; I guess I'm not using enough of HLML to stress the compile times and the rest of the project is too large.

I vote for the template version, mainly because it has fewer files so source control management is easier, but non-template one is just fine too! :)

dangmoody commented 1 year ago

Ok, thanks for profiling this.

Are these new compile times you are seeing an increase from before?

fstrugar commented 1 year ago

Ok, so it's difficult to compare to before swizzling changes because swizzling allowed me to compile a lot of math code to work on .cpp however I compared to before any hlml integration and it's a fairly noticeable hit: whole project rebuild goes from ~50s to ~65s. I'm ok with that because I need it but... I totally understand that it might be a deal breaker for some people. I don't know why it's that bad :( Perhaps make swizzling optional? I don't have any good ideas.

fstrugar commented 1 year ago

I spent some time removing all swizzle uses and got back to around 55-58s for the compile (vs ~65 with swizzles and 50s with no HLML) but this did remove some of the code files too (there were too many swizzle uses in some places, just couldn't refactor everything). I've also tried partially removing swizzle to just what I need, and that might have helped a bit.

Here's a few ideas:

a.) make swizzle for .rgba (on top of .xyzw) optional - cuts swizzling code in half (in fact, one could make the whole .rgba support optional... :) ) b.) make it optional per-type: I don't think I've ever used swizzle on bools for ex. c.) just make it optional in general? "#ifdef ENABLE_UNION_SWIZZLES"

Also I just noticed this library: https://github.com/gwiazdorrr/CxxSwizzle and they seem to have slightly different swizzle implementation, use example: https://github.com/gwiazdorrr/CxxSwizzle/blob/master/include/swizzle/detail/vector_storage.hpp#L94 ...but it's too late at night and my brain's too fried to understand what's going on there and whether it's in any way different :)

dangmoody commented 1 year ago

Hi there, apologies it's taken me so long to respond, but my personal life got in the way for a long time. I'm back now though. =)

I've been looking at this a bit deeper and it looks like the only compiler that actually has a problem with this is GCC. Clang and MSVC are fine.

When I initially added the new swizzle code I had to add a compiler flag to the GCC build scripts to allow for bigger .obj file outputs because I was going over the default GCC limit. I think that is what's causing GCC to have such big compile times.

Simply removing the RGBA swizzle formats didn't make a difference on GCC, but I still have the compiler flag on. Perhaps this is also to do with my build setup for the tests. I'm using unity builds so the final output intermediate binaries are going to be big. I wonder if moving over to smaller source files makes it easier for the compiler. I imagine I'd be trading compile time for link time though.

I'll investigate and get back to you.

Once again, apologies for the delay, but I'm back now and actively working on this again.

fstrugar commented 1 year ago

Hi there! Sorry to hear that and hope all is good now :)

Oh interesting - the compile flag, who would've guessed! Maybe they use a fallback path for large .obj files that is simply slow by design?

All the best and thanks for looking into this, much appreciated!

dangmoody commented 1 year ago

Hi there,

I've been looking into this for a while now and it doesn't matter what I try to get these GCC build times down, it would appear that GCC is just slow. Compile times on it are bad no matter what I do.

I'm going to make a new release very soon with the new swizzles and I'll be including both XYZW and RGBA for all vector types (including bools) since on other compilers it's not slow to compile. This would appear to just be a GCC thing.

I'll start thinking of ways to elegantly let people only include bits of the API that they need so that if they're using HLML they can reduce the compile times on their own projects if they need to.

Sorry this issue has been active for so long, but man, what a journey!

Expect the new swizzles to come out in v2.2.0. I'll close this out when that gets released. 😄

dangmoody commented 1 year ago

Fixed in v2.2.0.

fstrugar commented 1 year ago

Awesome, thank you for the update and the great project - I haven't been doing much (any) coding outside of work but I was just thinking about getting back to it so it's perfect timing to try out the 2.2.0 - I'll let you know if I spot anything useful :)

dangmoody commented 1 year ago

You are most welcome! I'm happy my software is actually helping someone make something useful! Makes me very happy!

And yes! Please do give any and all feedback you have. I can't promise I'll definitely act on it, but I will definitely listen to and take on board any all feedback/criticisms that people give me and definitely not just dismiss it straight away.

❤️