Open Triang3l opened 10 years ago
By the way, maybe implement Fast4VerticesSSE on Linux?
void *pCurrPos = m_pCurrPosition;
__m128 m1, m2, m3;
m1 = _mm_load_ps((float *)vtx_a);
m2 = _mm_load_ps((float *)vtx_a + 4);
m3 = _mm_load_ps((float *)vtx_a + 8);
_mm_stream_ps((float *)pCurrPos, m1);
_mm_stream_ps((float *)pCurrPos + 4, m2);
_mm_stream_ps((float *)pCurrPos + 8, m3);
m1 = _mm_load_ps((float *)vtx_b);
m2 = _mm_load_ps((float *)vtx_b + 4);
m3 = _mm_load_ps((float *)vtx_b + 8);
_mm_stream_ps((float *)pCurrPos + 12, m1);
_mm_stream_ps((float *)pCurrPos + 16, m2);
_mm_stream_ps((float *)pCurrPos + 20, m3);
m1 = _mm_load_ps((float *)vtx_c);
m2 = _mm_load_ps((float *)vtx_c + 4);
m3 = _mm_load_ps((float *)vtx_c + 8);
_mm_stream_ps((float *)pCurrPos + 24, m1);
_mm_stream_ps((float *)pCurrPos + 28, m2);
_mm_stream_ps((float *)pCurrPos + 32, m3);
m1 = _mm_load_ps((float *)vtx_d);
m2 = _mm_load_ps((float *)vtx_d + 4);
m3 = _mm_load_ps((float *)vtx_d + 8);
_mm_stream_ps((float *)pCurrPos + 36, m1);
_mm_stream_ps((float *)pCurrPos + 40, m2);
_mm_stream_ps((float *)pCurrPos + 44, m3);
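For reference, the same copy can be written as a loop. This is only a minimal sketch, not SDK code: `SketchVertex` and `StreamCopy4Vertices` are hypothetical names, and it assumes what the snippet above implies, namely 48-byte vertices (12 floats) with both source and destination 16-byte aligned, which `_mm_load_ps` and `_mm_stream_ps` require. The trailing `_mm_sfence` is my addition, to order the non-temporal stores before anything else reads the buffer.

```cpp
#include <xmmintrin.h>

// Hypothetical stand-in for the vertex layout implied above:
// 12 floats (48 bytes), 16-byte aligned.
struct alignas(16) SketchVertex
{
    float f[12];
};

// Hypothetical helper: streams four 48-byte vertices into dst.
// dst must be 16-byte aligned and have room for 48 floats.
static void StreamCopy4Vertices(void *dst, const SketchVertex *const src[4])
{
    float *out = static_cast<float *>(dst);
    for (int v = 0; v < 4; ++v)
    {
        const float *in = src[v]->f;
        // Three aligned 16-byte loads and non-temporal stores per vertex.
        for (int i = 0; i < 12; i += 4)
            _mm_stream_ps(out + v * 12 + i, _mm_load_ps(in + i));
    }
    // Make the streaming stores visible before later reads of the buffer.
    _mm_sfence();
}
```

Since this uses intrinsics rather than inline assembly, the compiler tracks the registers itself and no clobber list is involved.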
Oh, well, I guess clobbers are not required because of emms. But still, this needs more research.
In the CVertexBuilder functions FastVertex and FastVertexSSE (each has two overloads, for DX7 and DX8 meshes) in public/materialsystem/imesh.h, the clobbered MMX and SSE registers are not specified in the clobber list.
Since CStudioRender::R_StudioDrawDynamicMesh doesn't seem to do any other x87, MMX or SSE operations, not listing the clobbered registers doesn't cause any problems right now. Still, it would be better to list them properly to avoid possible issues in the future.
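To illustrate what listing the registers would look like, here is a minimal sketch in GCC extended asm. It is not the actual imesh.h code: `Copy16Bytes` is a hypothetical helper, and I am assuming an x86-64 GCC or Clang build (anything else falls back to memcpy). The third colon-separated section is the clobber list: "xmm0" tells the compiler the register is overwritten, and "memory" tells it the asm reads and writes memory through the pointers.

```cpp
#include <cstring>

// Hypothetical helper, not from the SDK: copies 16 bytes through xmm0.
static void Copy16Bytes(void *dst, const void *src)
{
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ __volatile__(
        "movups (%1), %%xmm0\n\t"   // load 16 bytes from src (unaligned is fine)
        "movups %%xmm0, (%0)\n\t"   // store them to dst
        :                           // no outputs
        : "r"(dst), "r"(src)        // inputs: the two pointers
        : "xmm0", "memory");        // clobbers: the register used and memory
#else
    // Assumption: on other targets, a plain copy stands in for the asm.
    std::memcpy(dst, src, 16);
#endif
}
```

Without "xmm0" in the list, the compiler is free to keep a live value in that register across the asm statement, which is the kind of future issue described above.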