effekseer / Effekseer

performance bottlenecks in iOS and android #110

Open huachangmiao opened 6 years ago

huachangmiao commented 6 years ago

Is there a better way to optimize?

durswd commented 6 years ago

Program: I have four ideas.

  1. SIMD. Some parts can be improved with NEON.
  2. Multicore. Parallel computing with multiple threads. This requires many changes.
  3. Reduce calculations. If parameters are at their defaults, skip the calculations.
  4. GPU. Move some calculations from the CPU to the GPU. This requires many changes.
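
As a rough illustration of the "skip calculations when parameters are default" idea, the sketch below avoids per-particle scaling math when the scale is the default value of 1. The `Particle` struct and `applyScale` function are hypothetical names invented for this example; they are not Effekseer's actual code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical particle type, for illustration only.
struct Particle {
    float x, y, z;
};

// Scales every particle position in place. Returns the number of
// multiplications performed so that the saving is observable.
std::size_t applyScale(std::vector<Particle>& particles, float scale) {
    assert(std::isfinite(scale));
    // Default parameter: scaling by 1 changes nothing, so skip the loop.
    if (scale == 1.0f) {
        return 0;
    }
    std::size_t multiplications = 0;
    for (Particle& p : particles) {
        p.x *= scale;
        p.y *= scale;
        p.z *= scale;
        multiplications += 3;
    }
    return multiplications;
}
```

The same early-out pattern could apply to any per-particle parameter whose default leaves the result unchanged, such as a zero rotation.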

durswd commented 6 years ago

I saw cocos2d-x form. I think it is strange.

huachangmiao commented 6 years ago

sorry. what does form mean?

durswd commented 6 years ago

Cocos2d-x Forums. Sorry for the typo.

huachangmiao commented 6 years ago

i think so too. the performance increases a lot when i mark the drawElement function.

huachangmiao commented 6 years ago

maybe something wrong in rendering

durswd commented 6 years ago

I think the effects call drawElement 4 or 5 times per frame. If not, it is a bug in Effekseer for cocos2d-x or in Effekseer 1.4 (which is a beta version). I'll check it in a few days.

durswd commented 6 years ago

First, I checked how many times drawElement is called per frame. It is called 3 or 5 times, and there are not many vertices. Is it the same in your environment?

I think the bottleneck is the time spent sending vertex data between the CPU and the GPU.

Next, I will check it on a smartphone and look for bottlenecks.

Homing_Laser01: 3 calls (image)

Simple_Distortion: 5 calls (image)

Sword_Ember is only for PC.

durswd commented 6 years ago

And by "i mark", do you mean commenting it out?

huachangmiao commented 6 years ago

yes.

durswd commented 6 years ago

I'm very sorry, something is wrong with my iPhone's connector. I changed the Lightning cable, but the phone is still not recognized. So I will buy a new smartphone. Please wait a few days.

durswd commented 6 years ago

I bought an iPhone SE, which has the same specs as the iPhone 6s. I played the homing and distortion effects, but it maintains 60 fps. Would you please send me your project?

huachangmiao commented 6 years ago

If you add a few more, how many can it play while maintaining 60 fps?

durswd commented 6 years ago

I'll try it.

durswd commented 6 years ago

I'm going to sleep. It is 1 o'clock in Japan. I'll report the result tomorrow.

huachangmiao commented 6 years ago

have a good dream

durswd commented 6 years ago

Three homings drop the frame rate below 60 fps (40 to 60 fps). But I got a good hint for improving it. I'll try to optimize it this weekend.

huachangmiao commented 6 years ago

Looking forward to your good news.

durswd commented 6 years ago

Good news. I managed to improve performance on iOS. It is at least 2 times faster.

This version has not been tested on Android yet, so it is on the optimized branch.

https://github.com/effekseer/EffekseerForCocos2d-x/tree/optimized

And you need to change your code as follows.

manager = efk::EffectManager::create(rsize, 8000);

8000 is the maximum number of sprites generated in an application.

huachangmiao commented 6 years ago

How is the performance of the Unity version on iOS? There are more than 100 roles in my game's battle scene. I think playing 10 homings while maintaining 60 fps would be OK. Now it is still 3. T _ T

durswd commented 6 years ago

This is my personal opinion.

Effekseer is being optimized and will be updated in a few days. But even after Effekseer for Unity is updated, I think 4 or 5 homings is the maximum that can maintain 60 fps.

This is not specific to effects. It is harder to maintain 60 fps in Unity than in cocos2d-x because of C# and its GC (C++ is very fast).

I think achieving 100 roles and rich effects at the same time is very difficult on current smartphones. You will need to reduce the number of effect sprites and find ways to lighten the roles.

huachangmiao commented 6 years ago

Still, thank you for everything. This is my game, which is already released: http://yxwd.qq.com This time we want to make a 3D game. Let's optimize together.

durswd commented 6 years ago

Looks very good. I will help you.

I have some ideas for optimizing it.

  1. Homing is not a light effect. It is a sample for PC, so it is not optimized. It is heavier than it looks.

     We are implementing Effekseer 1.4, a multi-platform version that will show draw calls in the editor. That will make effects easier to optimize.

  2. Culling. Effekseer has a culling system that hides effects outside the view, but the current Effekseer for cocos2d-x does not use it. If this function is enabled, only the effects in view are shown.

  3. Multithreading. I need some time to implement it.

  4. OpenGL ES 3.0. If we can use OpenGL ES 3.0, it may improve performance.
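
To make the culling idea above concrete, here is a minimal standalone sketch that keeps only effects whose bounding sphere may overlap the visible volume. It does not use Effekseer's real culling API; `Vec3`, `EffectInstance`, `ViewBox`, `isVisible`, and `collectVisible` are hypothetical names, and an axis-aligned box is assumed as a simple stand-in for the camera frustum.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types, for illustration only.
struct Vec3 {
    float x, y, z;
};

struct EffectInstance {
    Vec3 center;   // center of the effect's bounding sphere
    float radius;  // radius of the bounding sphere
};

// Axis-aligned box standing in for the camera's visible volume (assumption).
struct ViewBox {
    Vec3 lo;
    Vec3 hi;
};

// Conservative test: true if the sphere may overlap the box.
// It can report a few false positives near box corners, never false negatives.
bool isVisible(const EffectInstance& e, const ViewBox& view) {
    assert(e.radius >= 0.0f);
    const Vec3& c = e.center;
    const float r = e.radius;
    return c.x + r >= view.lo.x && c.x - r <= view.hi.x &&
           c.y + r >= view.lo.y && c.y - r <= view.hi.y &&
           c.z + r >= view.lo.z && c.z - r <= view.hi.z;
}

// Returns the indices of effects that should be updated and drawn.
std::vector<std::size_t> collectVisible(const std::vector<EffectInstance>& effects,
                                        const ViewBox& view) {
    std::vector<std::size_t> visible;
    for (std::size_t i = 0; i < effects.size(); ++i) {
        if (isVisible(effects[i], view)) {
            visible.push_back(i);
        }
    }
    return visible;
}
```

Effects that fail the test can skip both simulation and drawing, which is where the saving would come from in a scene with many off-screen effects.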

I have questions.

  1. Is your 3D game TPS, FPS or isometric?

  2. Are many large effects like Homing shown at the same time? Or 2 or 3 large effects plus many hit effects?

If you cannot share this information publicly, please send me an email.

huachangmiao commented 6 years ago

Just like this game, http://yxwd.qq.com, but with a 3D scene, 3D roles, 3D effects, and more roles.

huachangmiao commented 6 years ago

When will version 1.4 be released? There are still many Android users on OpenGL ES 2.0, so I have to use OpenGL ES 2.0.

durswd commented 6 years ago

Thank you. I'll play your game and check it.

I plan to release the 1.4 beta on GitHub in early May.

huachangmiao commented 6 years ago

thx. I will follow it.

huachangmiao commented 6 years ago

It can play 50 homings while maintaining 60 fps. You just need to build the iOS project in release mode. orz. That's great.

huachangmiao commented 6 years ago

When the number of vertices exceeds a certain value, the display goes wrong, maybe when vertices > 65535. You can try playing 50 Laser03.efk. You wrote that "8000 is the maximum number of sprites generated in an application." May I set this parameter higher than 8000?

durswd commented 6 years ago

  1. Performance: I'll try it. (I tested in release mode, but only with up to 6 effects, because they fill the display.)

  2. vertices > 65535: This happens because vertex indices are stored as 16-bit shorts. I need to split the rendering.

You can set this parameter higher than 8000. But if it is larger than 65535 / 6 (6 is the number of indices per sprite), rendering may be wrong or invalid.

Thank you for the information. I can fix these bugs.
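
The 65535 / 6 bound mentioned in this comment can be worked out as a small compile-time check. This is only arithmetic on the numbers quoted in the thread; the constant and function names are made up for the example, and the note that the 6 indices form two triangles is an assumption about how a sprite quad is drawn.

```cpp
#include <cstdint>

// Largest value a 16-bit unsigned index can hold.
constexpr std::uint32_t kMaxIndexValue = 65535;

// Per the comment in this thread: 6 indices per sprite
// (assumed here to be two triangles forming a quad).
constexpr std::uint32_t kIndicesPerSprite = 6;

// Largest sprite count within the stated bound: 65535 / 6, rounded down.
constexpr std::uint32_t kMaxSafeSprites = kMaxIndexValue / kIndicesPerSprite;

// True if a sprite budget stays within the stated bound.
constexpr bool isSafeSpriteCount(std::uint32_t spriteCount) {
    return spriteCount <= kMaxSafeSprites;
}

static_assert(kMaxSafeSprites == 10922, "65535 / 6 rounds down to 10922");
```

Under this bound, the value 8000 passed to `efk::EffectManager::create(rsize, 8000)` leaves some headroom, while anything above 10922 would fall into the range described as possibly wrong or invalid.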

durswd commented 6 years ago

@huachangmiao

Current progress.

Triple buffering

I implemented triple buffering with glMapBuffer. https://github.com/effekseer/EffekseerForCocos2d-x/tree/optimized

This may improve performance on some smartphones.
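
For readers unfamiliar with triple buffering, the sketch below shows only the buffer rotation, using plain CPU vectors in place of GPU vertex buffers. It is not the code from the optimized branch: the `TripleBuffer` class is hypothetical, and the actual glMapBuffer calls are deliberately left out so the example runs anywhere.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kBufferCount = 3;

// Hypothetical CPU-side stand-in for three GPU vertex buffers.
class TripleBuffer {
public:
    // Advances to the next buffer and returns it, cleared for writing.
    // The other two buffers are left untouched, standing in for buffers
    // the GPU might still be reading from earlier frames.
    std::vector<float>& beginFrame() {
        current_ = (current_ + 1) % kBufferCount;
        buffers_[current_].clear();
        return buffers_[current_];
    }

    std::size_t currentIndex() const { return current_; }

    const std::vector<float>& buffer(std::size_t index) const {
        assert(index < kBufferCount);
        return buffers_[index];
    }

private:
    std::array<std::vector<float>, kBufferCount> buffers_;
    std::size_t current_ = kBufferCount - 1;  // so the first frame uses buffer 0
};
```

Because each frame writes to a buffer that was last used three frames ago, the CPU is less likely to stall waiting for the GPU to finish with it.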

OpenGL ES3 version

I modified cocos2d-x to support OpenGL ES 3, on Android only (ES 2 on other platforms).

https://github.com/durswd/cocos2d-x-es3

If you use this cocos2d-x and edit this line,

https://github.com/effekseer/EffekseerForCocos2d-x/blob/optimized/Players/Cocos2d-x_v3/EffekseerRendererNative.h#L6

rendering is optimized in the same way as on iOS (with glMapBufferRange).

Unfortunately, I found that glMapBufferRange is slow on Huawei smartphones.

https://stackoverflow.com/questions/35174046/glmapbufferrange-is-slow-and-memcpy-of-the-mapped-data-is-also-slow-on-andro

Next

I will try to reduce the number of calls to glMapBuffer, glMapBufferRange, and glBufferSubData. Could you wait a few days for the result?

I think that result will be the best performance these particle effects can reach on smartphones. If it is not enough for your game, you will need to use GPU particles.
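
One common way to reduce the number of buffer-upload calls is to merge several small batches into one staging array and upload it once. The sketch below illustrates that under stated assumptions: `CountingUploader` is a hypothetical stand-in that only counts calls where a real renderer would call something like glBufferSubData, and it is not how the optimized branch is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical uploader that only counts calls and data volume.
struct CountingUploader {
    std::size_t uploadCalls = 0;
    std::size_t floatsUploaded = 0;

    void upload(const std::vector<float>& data) {
        ++uploadCalls;
        floatsUploaded += data.size();
    }
};

// Naive path: one upload call per sprite batch.
void uploadEach(const std::vector<std::vector<float>>& batches, CountingUploader& gpu) {
    for (const std::vector<float>& batch : batches) {
        gpu.upload(batch);
    }
}

// Merged path: concatenate all batches, then upload once.
void uploadMerged(const std::vector<std::vector<float>>& batches, CountingUploader& gpu) {
    std::size_t total = 0;
    for (const std::vector<float>& batch : batches) {
        total += batch.size();
    }
    std::vector<float> staging;
    staging.reserve(total);
    for (const std::vector<float>& batch : batches) {
        staging.insert(staging.end(), batch.begin(), batch.end());
    }
    assert(staging.size() == total);
    if (!staging.empty()) {
        gpu.upload(staging);
    }
}
```

Both paths move the same amount of data; the merged path trades an extra CPU copy for fewer driver calls, which tends to matter when each call has a high fixed cost.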

huachangmiao commented 6 years ago

thank you. I will try and test it on android. I'm willing to wait. : )

durswd commented 6 years ago

I updated the optimized branch. Because of my mistake, NEON was not enabled on Android.

Please edit Application.mk to enable NEON:

APP_CPPFLAGS := -frtti -DCC_ENABLE_CHIPMUNK_INTEGRATION=1 -std=c++11 -fsigned-char -mfpu=neon -ftree-vectorize -march=armv7-a -mfloat-abi=softfp

APP_ABI := armeabi-v7a

It will improve performance.

I will continue to optimize performance.
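
To show what the NEON flags make possible, here is a minimal sketch of a loop that uses NEON intrinsics when the compiler targets NEON and falls back to scalar code otherwise. The `addArrays` function is a hypothetical example, not code from Effekseer; the intrinsics used (`vld1q_f32`, `vaddq_f32`, `vst1q_f32`) come from the standard `arm_neon.h` header.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// Adds a and b element-wise into out. count is the number of floats.
void addArrays(const float* a, const float* b, float* out, std::size_t count) {
    assert(a != nullptr && b != nullptr && out != nullptr);
    std::size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    // NEON path: process 4 floats per iteration.
    for (; i + 4 <= count; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vaddq_f32(va, vb));
    }
#endif
    // Scalar path: handles the remainder, or everything when NEON is unavailable.
    for (; i < count; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

The compile-time check only reflects the build flags. It does not detect at run time whether the device actually supports NEON.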

huachangmiao commented 6 years ago

Got it. Thx.

I implemented a new feature in my project: it adjusts the resolution in real time to improve rendering performance.
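
The thread does not show how this feature is implemented. As one possible shape for it, the sketch below lowers the render scale when a frame takes longer than the target and raises it again when there is headroom. The `ResolutionScaler` class, the 10% step, the 50% floor, and the 0.8 headroom factor are all assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Illustrative limits (assumptions, not values from the thread).
constexpr int kMinScalePercent = 50;
constexpr int kMaxScalePercent = 100;
constexpr int kStepPercent = 10;

// Hypothetical controller that adapts render resolution to frame time.
class ResolutionScaler {
public:
    explicit ResolutionScaler(double targetFrameMs) : targetFrameMs_(targetFrameMs) {
        assert(targetFrameMs > 0.0);
    }

    // Feed the last frame's duration; returns the new scale in percent.
    int update(double frameMs) {
        if (frameMs > targetFrameMs_) {
            scalePercent_ -= kStepPercent;  // too slow: render fewer pixels
        } else if (frameMs < targetFrameMs_ * 0.8) {
            scalePercent_ += kStepPercent;  // clear headroom: restore quality
        }
        scalePercent_ = std::max(kMinScalePercent,
                                 std::min(kMaxScalePercent, scalePercent_));
        return scalePercent_;
    }

    // Applies the current scale to one dimension of the render target.
    int scaledSize(int fullSize) const { return fullSize * scalePercent_ / 100; }

private:
    double targetFrameMs_;
    int scalePercent_ = kMaxScalePercent;
};
```

The gap between the two thresholds keeps the scale from oscillating every frame when the frame time sits close to the target.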

So I have a new idea. (image)

Can the vertex count be set for the Track and the Ribbon, just like for the Ring?

durswd commented 6 years ago

I think it can be implemented easily. I'll try to implement it. Please wait a few days.

huachangmiao commented 6 years ago

I found the conditions for using NEON.

+ifeq ($(TARGET_ARCH_ABI),armeabi-v7a)
+MATHNEONFILE := math/MathUtil.cpp.neon
+else
+MATHNEONFILE := math/MathUtil.cpp
+endif
+

https://github.com/cocos2d/cocos2d-x/pull/10157

Should I change the following lines?

../external/effekseer/Effekseer.cpp \
../external/effekseer/EffekseerNative.cpp \
../external/effekseer/EffekseerRendererNative.cpp \

to

../external/effekseer/Effekseer.cpp.neon \
../external/effekseer/EffekseerNative.cpp.neon \
../external/effekseer/EffekseerRendererNative.cpp.neon \

And there is a possible problem here.

Crash on Android devices with arm-v7a, but without NEON #9968 https://github.com/cocos2d/cocos2d-x/issues/9968

https://github.com/cocos2d/cocos2d-x/pull/16274

durswd commented 6 years ago

I'm sorry. I don't know about it. At least, in my case, I edited Application.mk.

I'll try to read these issues.

durswd commented 6 years ago

I implemented the vertex count setting. This is a test version. https://github.com/effekseer/Effekseer/releases/tag/PruneDev

Is it enough for your game, or do I need to optimize more? I can optimize it further, but it needs more time.

huachangmiao commented 6 years ago

It's enough for my game. thx.

durswd commented 6 years ago

OK. But the vertex count feature is not cool yet. I'll try to optimize it for 1.4.

huachangmiao commented 6 years ago

ok. thank you.

huachangmiao commented 6 years ago

I have almost completed the battle scene of my project. I tested the performance on some devices.

iPhone 6 / iPhone 6s / iPhone 7 / iPhone 8 / iPhone X: perfect performance.
Huawei CUN-AL00 (GPU: none): acceptable.
Huawei Honor V10 (荣耀V10, GPU: Mali): very bad.

Very strange. The Honor V10 has better hardware than the CUN-AL00, but its performance is worse.

Is this article helpful? https://community.arm.com/graphics/f/discussions/6657/how-to-gain-performance-through-pbo-pixel-buffer-object-on-mali-t-880

durswd commented 6 years ago

I find it very strange too. I read it.

I have questions.

  1. What is the FPS on the CUN-AL00 and the Honor V10?
  2. The Honor V10's display resolution is 4 times larger than the CUN-AL00's. Do you decrease the frame buffer's resolution?
  3. Do you use the maximum vertex count?

I am working on parallel computing now. I hope this optimization helps you.

huachangmiao commented 6 years ago

  1. I locked the FPS to 30. iOS: 30 fps. CUN-AL00: 20-25 fps. Honor V10: 12 fps.

  2. I draw all objects into a 1136 * 640 renderTexture, then draw the renderTexture scaled to the device's display resolution.

  3. I set the maximum vertex count to 8000.

By the way, I only play 2-3 effects at the same time.
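
To get a feel for how much fill work a fixed 1136 * 640 renderTexture saves, the sketch below compares its pixel count with that of a device display. The 2160 x 1080 display size is an assumed example value, not a figure given in this thread, and the function names are made up for the illustration.

```cpp
#include <cstdint>

// Number of pixels in a width x height surface.
constexpr std::uint64_t pixelCount(std::uint32_t width, std::uint32_t height) {
    return static_cast<std::uint64_t>(width) * height;
}

// Percentage (rounded down) of the display's pixels that the offscreen
// render target contains.
constexpr std::uint64_t offscreenPercent(std::uint32_t offWidth, std::uint32_t offHeight,
                                         std::uint32_t displayWidth,
                                         std::uint32_t displayHeight) {
    return pixelCount(offWidth, offHeight) * 100 /
           pixelCount(displayWidth, displayHeight);
}

// Render target size from the comment above.
constexpr std::uint32_t kRenderTextureWidth = 1136;
constexpr std::uint32_t kRenderTextureHeight = 640;

// Assumed example display size (hypothetical, not from the thread).
constexpr std::uint32_t kAssumedDisplayWidth = 2160;
constexpr std::uint32_t kAssumedDisplayHeight = 1080;

static_assert(pixelCount(kRenderTextureWidth, kRenderTextureHeight) == 727040,
              "1136 * 640 = 727040 pixels");
```

With these example numbers the effects are shaded at roughly a third of the display's pixel count, which suggests that fill rate alone may not explain the slowdown and that other costs are worth profiling too.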

durswd commented 6 years ago

Can you send me this effect?

huachangmiao commented 6 years ago

skills.zip

durswd commented 6 years ago

Thank you. I will check it. I'm going to sleep.

huachangmiao commented 6 years ago

Thank you. ; )

durswd commented 6 years ago

I will try to use it. https://developer.arm.com/products/software-development-tools/graphics-development-tools/mali-graphics-debugger

huachangmiao commented 6 years ago

I turned off glMapBuffer and glMapBufferRange. It's still very slow on the Huawei Honor V10.