OGRECave / ogre

scene-oriented, flexible 3D engine (C++, Python, C#, Java)
https://ogrecave.github.io/ogre/
MIT License

NEON softwareVertexSkinning fails #1193

Closed arkeon7 closed 5 years ago

arkeon7 commented 5 years ago

System Information

Detailed description

The RTSS shader fails to compile on iOS when the object has a skeleton and a simple texture. The result is a "broken" mesh that is split apart along all of its faces.

Ogre.log

Fri May 17 17:11:46 201 : Material scheme switch from ShaderGeneratorDefaultScheme to ShaderGeneratorDefaultScheme on material : Material_#25

Fri May 17 17:11:46 201 : Vertex Program:55cc4c55994686caee7ecb81e6ce6735_VS Fragment Program:4a08102476ab588eaadffaa8fb240f6d_FS
GLSL link result : 
WARNING: Could not find vertex shader attribute 'blendWeights' to match BindAttributeLocation request.
WARNING: Could not find vertex shader attribute 'uv5' to match BindAttributeLocation request.
WARNING: Could not find vertex shader attribute 'secondary_colour' to match BindAttributeLocation request.
WARNING: Could not find vertex shader attribute 'uv1' to match BindAttributeLocation request.
WARNING: Could not find vertex shader attribute 'uv4' to match BindAttributeLocation request.
WARNING: Could not find vertex shader attribute 'blendIndices' to match BindAttributeLocation request.
WARNING: Could not find vertex shader attribute 'uv6' to match BindAttributeLocation request.
WARNING: Could not find vertex shader attribute 'uv2' to match BindAttributeLocation request.
WARNING: Could not find vertex shader attribute 'colour' to match BindAttributeLocation request.
WARNING: Could not find vertex shader attribute 'position' to match BindAttributeLocation request.
WARNING: Could not find vertex shader attribute 'tangent' to match BindAttributeLocation request.
WARNING: Could not find vertex shader attribute 'uv3' to match BindAttributeLocation request.
WARNING: Could not find vertex shader attribute 'uv7' to match BindAttributeLocation request.
WARNING: Could not find vertex shader attribute 'binormal' to match BindAttributeLocation request.


arkeon7 commented 5 years ago

log updated.

paroj commented 5 years ago

these are merely warnings that can be ignored.

Also, the GLES2 RenderSystem currently does not support hardware skinning (via RTSS).

arkeon7 commented 5 years ago

Ok, so maybe it's somewhere else :/ Here is a screenshot of the result: (image: Screen Shot 2019-05-20 at 14 59 32)

paroj commented 5 years ago

looks like a precision issue to me. can you try changing this to highp: https://github.com/OGRECave/ogre/blob/master/RenderSystems/GLES2/src/GLSLES/src/OgreGLSLESProgram.cpp#L171

Alternatively, it might have something to do with not using separable shader objects any more. Here, you could try uncommenting the following lines: https://github.com/OGRECave/ogre/blob/master/RenderSystems/GLES2/src/OgreGLES2RenderSystem.cpp#L411-L412

arkeon7 commented 5 years ago

I tried both solutions without any success :/ I've uncommented lines 411 & 412

paroj commented 5 years ago

then it might be the NEON acceleration. Try removing this code: https://github.com/OGRECave/ogre/blob/master/OgreMain/src/OgreOptimisedUtil.cpp#L377-L383

arkeon7 commented 5 years ago

Yes, that's it ^^ well done! So maybe add a check to skip this optimization on iPhone...

paroj commented 5 years ago

for a fast fix yes, but the code works for e.g. the bundled jaiqua mesh and even for most parts of your mesh. Can you share the mesh you are using? It would be interesting to find out what is going wrong there.

arkeon7 commented 5 years ago

Ok thanks, you can download it here: http://www.arkeon.be/scol/os3d/Tyrannosaurus_Rex.zip

paroj commented 5 years ago

can you test whether this animation is broken on android as well?

arkeon7 commented 5 years ago

No it's working well on Android.

paroj commented 5 years ago

with a 32-bit or a 64-bit build?

arkeon7 commented 5 years ago

I just tested with armeabi-v7a and arm64-v8a; both are working well.

paroj commented 5 years ago

ok, that saves me some time as I can only test ARM code on android. I guess I will have to just disable NEON on iOS for now then.
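A platform guard of the kind discussed here could look roughly like the sketch below. It is only an illustration: `SKETCH_USE_NEON` and `sketchUseNeon()` are hypothetical names, not what Ogre or #1199 actually contains. `__ARM_NEON` is the compiler's own predefined macro and `TARGET_OS_IPHONE` comes from Apple's `TargetConditionals.h`.

```cpp
#include <cassert>

// TargetConditionals.h only exists on Apple platforms.
#if defined(__APPLE__)
#include <TargetConditionals.h>
#endif

// Hypothetical switch: use the NEON path only when the compiler targets NEON
// and the build is not for iOS. On non-Apple platforms TARGET_OS_IPHONE is
// undefined and evaluates to 0 inside #if.
#if defined(__ARM_NEON) && !(defined(__APPLE__) && TARGET_OS_IPHONE)
#define SKETCH_USE_NEON 1
#else
#define SKETCH_USE_NEON 0
#endif

// Lets calling code pick the software skinning path at compile time.
constexpr bool sketchUseNeon() { return SKETCH_USE_NEON == 1; }
```

With a guard like this, an iOS build would fall back to the generic skinning code while Android and other ARM builds keep the NEON path.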

arkeon7 commented 5 years ago

Yes thanks.

paroj commented 5 years ago

can you try whether the workaround in #1199 is sufficient?

paroj commented 5 years ago

I could reproduce this on Android as well

paroj commented 5 years ago

nope.. that was caused by not running make clean/ precompiled headers

paroj commented 5 years ago

interestingly the visual artifacts were similar to your screenshot. Did you do a clean build for iOS? Can you set OGRE_ENABLE_PRECOMPILED_HEADERS=OFF?

arkeon7 commented 5 years ago

I'm rebuilding everything with the #1199 patch. At the same time, the updated Qt doesn't build on OSX... Apple dev happiness... it will take time. I need to make a release tomorrow, so I don't know if I'll have time to run more tests before next week.

arkeon7 commented 5 years ago

Just saw that I have the same problem on Raspberry PI. I will try OGRE_ENABLE_PRECOMPILED_HEADERS=OFF on this

arkeon7 commented 5 years ago

OGRE_ENABLE_PRECOMPILED_HEADERS=OFF does nothing when Ogre is already built. Does it need a full rebuild?

paroj commented 5 years ago

with clang this option has no effect. Otherwise it needs a full rebuild. Ideally do a make clean to remove all generated files as well.

arkeon7 commented 5 years ago

On the Raspberry Pi it is still the same after a full rebuild with OGRE_ENABLE_PRECOMPILED_HEADERS=OFF. The only thing that works is disabling NEON skinning.

paroj commented 5 years ago

this needs more investigation. It does not make sense that the same code runs on Android, but not on the pi. Could you run the test.out generated by this project on the pi? https://github.com/paroj/sse2neon

arkeon7 commented 5 years ago

Here is the result on the Pi 3:

Running Test MM_SETZERO_SI128
Running Test MM_SETZERO_PS
Running Test MM_SET1_PS
Running Test MM_SET_PS1
Running Test MM_SET_PS
Running Test MM_SET1_EPI32
Running Test MM_SET_EPI32
Running Test MM_STORE_PS
Running Test MM_STOREL_PI
Running Test MM_SHUFFLE_PS
Running Test MM_LOAD1_PS
Running Test MM_LOADL_PI
Running Test MM_ANDNOT_PS
Running Test MM_ANDNOT_SI128
Running Test MM_AND_SI128
Running Test MM_AND_PS
Running Test MM_OR_PS
Running Test MM_OR_SI128
Running Test MM_MOVEMASK_PS
Running Test MM_MOVEMASK_EPI8
Running Test MM_SUB_PS
Running Test MM_SUB_EPI32
Running Test MM_ADD_PS
Running Test MM_ADD_EPI32
Running Test MM_MULLO_EPI16
Running Test MM_MUL_PS
Running Test MM_RCP_PS
Running Test MM_MAX_PS
Running Test MM_MIN_PS
Running Test MM_MIN_EPI16
Running Test MM_MULHI_EPI16
Running Test MM_CMPLT_PS
Running Test MM_CMPGT_PS
Running Test MM_CMPGE_PS
Running Test MM_CMPLE_PS
Running Test MM_CMPEQ_PS
Running Test MM_CMPLT_EPI32
Running Test MM_CMPGT_EPI32
Running Test MM_CVTTPS_EPI32
Running Test MM_CVTEPI32_PS
Running Test MM_CVTPS_EPI32
Running Test MM_CVTSS_F32
Running Test MM_SETR_PS
Running Test MM_STOREU_PS
Running Test MM_STORE_SI128
Running Test MM_STORE_SS
Running Test MM_STOREL_EPI64
Running Test MM_LOAD_PS
Running Test MM_LOADU_PS
Running Test MM_LOAD_SS
Running Test MM_CMPNEQ_PS
Running Test MM_XOR_PS
Running Test MM_XOR_SI128
Running Test MM_SHUFFLE_EPI32_DEFAULT
Running Test MM_SHUFFLE_EPI32_FUNCTION
Running Test MM_SHUFFLE_EPI32_SPLAT
Running Test MM_SHUFFLE_EPI32_SINGLE
Running Test MM_SHUFFLEHI_EPI16_FUNCTION
Running Test MM_ADD_SS
Running Test MM_ADD_EPI16
Running Test MM_MULLO_EPI32
Running Test MM_DIV_PS
Running Test MM_DIV_SS
Running Test MM_SQRT_PS
Running Test MM_SQRT_SS
Running Test MM_RSQRT_PS
Running Test MM_MAX_SS
Running Test MM_MIN_SS
Running Test MM_MAX_EPI32
Running Test MM_MIN_EPI32
Running Test MM_HADD_PS
Running Test MM_CMPORD_PS
Running Test MM_COMILT_SS
**FAILURE** SSE2NEONTest MM_COMILT_SS
Running Test MM_COMIGT_SS
Running Test MM_COMILE_SS
**FAILURE** SSE2NEONTest MM_COMILE_SS
Running Test MM_COMIGE_SS
Running Test MM_COMIEQ_SS
**FAILURE** SSE2NEONTest MM_COMIEQ_SS
Running Test MM_COMINEQ_SS
**FAILURE** SSE2NEONTest MM_COMINEQ_SS
Running Test MM_CVTSI128_SI32
Running Test MM_CVTSI32_SI128
Running Test MM_CASTPS_SI128
Running Test MM_CASTSI128_PS
Running Test MM_LOAD_SI128
Running Test MM_PACKS_EPI16
Running Test MM_PACKUS_EPI16
Running Test MM_PACKS_EPI32
Running Test MM_UNPACKLO_EPI8
Running Test MM_UNPACKLO_EPI16
Running Test MM_UNPACKLO_EPI32
Running Test MM_UNPACKLO_PS
Running Test MM_UNPACKHI_PS
Running Test MM_UNPACKHI_EPI8
Running Test MM_UNPACKHI_EPI16
Running Test MM_UNPACKHI_EPI32
Running Test MM_SFENCE
Running Test MM_STREAM_SI128
Running Test MM_CLFLUSH
Running Test MM_SET1_EPI16
Running Test MM_SET_EPI16
Running Test MM_SLLI_EPI16
Running Test MM_SRLI_EPI16
Running Test MM_CMPEQ_EPI16
Running Test MM_SET1_EPI8
Running Test MM_ADDS_EPU8
Running Test MM_SUBS_EPU8
Running Test MM_MAX_EPU8
Running Test MM_CMPEQ_EPI8
Running Test MM_ADDS_EPI16
Running Test MM_MAX_EPI16
Running Test MM_SUBS_EPU16
Running Test MM_CMPGT_EPI16
Running Test MM_LOADU_SI128
Running Test MM_STOREU_SI128
Running Test MM_ADD_EPI8
Running Test MM_CMPGT_EPI8
Running Test MM_CMPLT_EPI8
Running Test MM_SUB_EPI8
Running Test MM_SETR_EPI32
Running Test MM_MIN_EPU8
SSE2NEONTest Complete: Passed 115 tests : Failed 4

paroj commented 5 years ago

the failing instructions are not used by Ogre, so it should run properly..

arkeon7 commented 5 years ago

Is there a way to know whether Ogre has been compiled with NEON on Android?

paroj commented 5 years ago

you can put an #error directive inside an #if __OGRE_HAVE_NEON branch
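A minimal sketch of that check follows. It assumes `__OGRE_HAVE_NEON` only becomes defined once the relevant Ogre headers are included, so compiled on its own it takes the `#else` branch. `kNeonCompiledIn` and `neonBuildStatus()` are hypothetical helper names added for illustration.

```cpp
#include <cassert>
#include <string>

// In a real project, include the Ogre headers before this check so that
// __OGRE_HAVE_NEON is visible. Without them the macro is simply undefined.
#if defined(__OGRE_HAVE_NEON) && __OGRE_HAVE_NEON
// Uncomment the next line to make the build stop when NEON is enabled:
// #error "Ogre is being compiled with NEON"
static const bool kNeonCompiledIn = true;
#else
static const bool kNeonCompiledIn = false;
#endif

// Hypothetical helper: reports which branch was taken, e.g. for a log line.
std::string neonBuildStatus() {
    return kNeonCompiledIn ? "NEON enabled" : "NEON disabled";
}
```

The `#error` variant answers the question at build time; the string variant is a softer option if the build should still finish.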

arkeon7 commented 5 years ago

Well it seems my settings do not enable __OGRE_HAVE_NEON on android

paroj commented 5 years ago

ok, I could finally reproduce this with NEON on Android as well.

paroj commented 5 years ago

caused by a bug in _mm_movehl_ps, will be fixed by updating sse2neon
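For reference, the behaviour the SSE intrinsic is documented to have can be written as a plain scalar function, which is handy when checking a NEON translation against it. This is only a sketch: `Vec4` and `movehl_reference` are hypothetical names, and it says nothing about what the actual bug in sse2neon was.

```cpp
#include <array>
#include <cassert>

// Four float lanes, index 0 being the lowest lane.
using Vec4 = std::array<float, 4>;

// Scalar model of _mm_movehl_ps(a, b):
// the two upper lanes of b become the two lower lanes of the result,
// and the two upper lanes of a stay in the two upper lanes.
constexpr Vec4 movehl_reference(const Vec4& a, const Vec4& b) {
    return Vec4{b[2], b[3], a[2], a[3]};
}
```

For example, with a = {1, 2, 3, 4} and b = {5, 6, 7, 8} the result is {7, 8, 3, 4}.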

arkeon7 commented 5 years ago

great well done!