Replaced the separate input structs for each function with an SSE 'equivalent' of the corresponding maths type. For example, there is now a float4_sse_t which holds one __m128 register per component, so each register vector/matrix holds 4 of each component.
Added identity_sse() to set register matrices to identity.
Added cross_sse(), which performs the cross product of two float3_sse_t or two float4_sse_t values.
Upgraded to a (currently) experimental version of Temper which profiles the execution time of each test and prints it to the console.
Separated the scalar tests from their SSE versions, so that (combined with the above change) the console output shows how long the scalar version took vs the SSE version.
Various other minor stability improvements and bug fixes.
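
To illustrate the "one register per component" layout and the four-wide cross product described above, here is a minimal sketch. The struct definition and the name cross_sse_sketch() are assumptions made for illustration, and the library's real float3_sse_t and cross_sse() may be declared differently. Only standard SSE intrinsics from xmmintrin.h are used.

```c
#include <xmmintrin.h> /* SSE: __m128, _mm_mul_ps, _mm_sub_ps */

/* Assumed layout: one register per component, so each struct
 * carries the x, y and z of four separate vectors. */
typedef struct float3_sse_t {
    __m128 x;
    __m128 y;
    __m128 z;
} float3_sse_t;

/* Hypothetical helper, not the library's cross_sse():
 * computes four cross products at once, one per register lane. */
static void cross_sse_sketch(const float3_sse_t* lhs, const float3_sse_t* rhs, float3_sse_t* out)
{
    /* out.x = lhs.y * rhs.z - lhs.z * rhs.y */
    out->x = _mm_sub_ps(_mm_mul_ps(lhs->y, rhs->z), _mm_mul_ps(lhs->z, rhs->y));
    /* out.y = lhs.z * rhs.x - lhs.x * rhs.z */
    out->y = _mm_sub_ps(_mm_mul_ps(lhs->z, rhs->x), _mm_mul_ps(lhs->x, rhs->z));
    /* out.z = lhs.x * rhs.y - lhs.y * rhs.x */
    out->z = _mm_sub_ps(_mm_mul_ps(lhs->x, rhs->y), _mm_mul_ps(lhs->y, rhs->x));
}
```

Because the components are stored in separate registers, the cross product needs no shuffles: each output component is two multiplies and one subtract across all four lanes. A caller would fill each register with _mm_set_ps() or _mm_load_ps() and read the results back with _mm_storeu_ps().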