element-hq / element-android

A Matrix collaboration client for Android.
https://element.io/
Apache License 2.0

Create a workgroup to analyze and optimize performance, CPU and battery usage of Element-Android #3884

Open MurzNN opened 3 years ago

MurzNN commented 3 years ago

Is your feature request related to a problem? Please describe.

There is a lot of feedback and many issues about performance problems in Element-Android: a slow interface, very high CPU usage (even in the background), battery drain, high memory usage, and so on.

Because Android does not provide regular users with any easy tool to debug or gather performance statistics, it is hard to write a useful performance bug report. Regular and even tech-savvy users find that they cannot describe the problem any better than "it is sometimes very slow and eats my battery", and decide not to submit such useless reports. As a result, developers do not see the scale of the problem and think that it affects only a few individual users.

So we now have a lot of bug reports and individual user feedback in various chat rooms about the slow performance and battery drain of the Element-Android app, but most of it is ignored by Element developers because it contains no information useful for debugging and analysis.

As a result, we now have an Element-Android app that actively eats battery and CPU on many users' devices, yet the problem is totally ignored: no improvements or even analysis seem to be done on the developer side, and there is no hope that this will change in the future.

Users are very upset not only with Element-Android but with the whole Matrix infrastructure, thinking that Matrix itself is slow and resource-hungry compared to other modern chat apps like Telegram, Skype, WhatsApp, etc. These people then tell others that Element and Matrix are too slow.

Describe the solution you'd like.

I think the optimal solution is to create a separate workgroup of developers that sets aside time to do professional analysis of the app's performance, adds metrics and other measurements to the app to gather statistics from real users on their devices, and researches ways to solve the problems detected.

Also, create a deployment infrastructure that automatically runs performance tests on each new release and records how the results change, so that new performance problems are detected quickly, even before the release.
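
To make the idea of an automated per-release check more concrete, here is a rough sketch of how such a comparison might look. Everything in it is an assumption for illustration: the metric names, the sample values, the 10% tolerance, and the `find_regressions` helper are hypothetical and are not measurements or tooling from Element-Android.

```python
from statistics import median


def find_regressions(baseline, candidate, tolerance=0.10):
    """Return metrics whose median got worse by more than `tolerance`.

    Both arguments map a metric name to a list of samples,
    where a lower value is better (for example, milliseconds).
    """
    regressions = {}
    for name, base_samples in baseline.items():
        if name not in candidate:
            continue  # metric was not measured for the new build
        base = median(base_samples)
        new = median(candidate[name])
        if base > 0 and (new - base) / base > tolerance:
            regressions[name] = (base, new)
    return regressions


# Hypothetical numbers, NOT real measurements of Element-Android.
baseline = {
    "cold_start_ms": [800, 820, 810],
    "initial_sync_ms": [2000, 2100, 2050],
}
candidate = {
    "cold_start_ms": [805, 815, 830],
    "initial_sync_ms": [2600, 2500, 2700],
}

report = find_regressions(baseline, candidate)
for name, (old, new) in report.items():
    print(f"{name}: median went from {old} to {new}")
```

Using the median of several runs instead of a single run is a deliberate choice here, because timings on real devices are noisy and a single outlier should not fail a release.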

For example, I (and many other Matrix fans) am ready to enable any analytics and the deepest telemetry on my device if it gathers statistics that help developers solve performance problems.

Describe alternatives you've considered.

The alternative is what we have now: a lot of issues, individual feedback, and rageshakes about app performance and battery drain, which are ignored for a long time without any deeper analysis.

progserega commented 3 years ago

This problem was described in #2733, and maybe also in #536.

Images of CPU usage

When my phone began to freeze, I opened a console and ran "top". For 1-2 minutes it showed Element at the top of CPU usage, with 12-40% of CPU:

![Screenshot_20210824-141004](https://user-images.githubusercontent.com/1297163/130572347-817d3e35-0230-49de-9e83-81a419d06cab.png)

MurzNN commented 3 years ago

To connect frontend (Android app) telemetry with backend (Synapse) logs, we can use the great open-source tool Grafana Tempo. It can collect traces from real user devices and link them to backend traces, so you can build a detailed trace of a background operation in the Android app (with useful metadata added directly to that trace) that includes detailed traces of all the corresponding server-side operations, along with server metadata and direct links to server log lines.
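
As a rough illustration of how a client trace and a server trace can be tied together, here is a small sketch based on the W3C Trace Context `traceparent` header format (`version-traceid-parentid-flags`). This is only an assumption about how the linking could be done; nothing in this thread establishes that Element-Android or Synapse propagate this header, and the helper names `new_traceparent` and `child_traceparent` are made up for the example.

```python
import re
import secrets

# W3C Trace Context, version 00: 32 hex chars of trace id,
# 16 hex chars of parent (span) id, 2 hex chars of flags.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled=True):
    """Client side: start a new trace for one operation (e.g. a sync)."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex chars
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def child_traceparent(parent):
    """Server side: keep the trace id, create a new span id."""
    match = TRACEPARENT_RE.match(parent)
    if match is None:
        raise ValueError("malformed traceparent header")
    trace_id, _parent_span_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


# The client would send this value as an HTTP header with its request,
# and the server would derive its own span from it.
client_header = new_traceparent()
server_header = child_traceparent(client_header)
print(client_header)
print(server_header)
```

Because both sides share the same trace id, a tracing backend can show the client-side span and the server-side spans as one trace.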

progserega commented 2 years ago

By the way, my guess is that the situation is approximately as follows:

  1. Let's say you have an account with a lot of rooms.
  2. Each new message (or a bunch of them within a short period of time) generates a push event.
  3. The push event arrives on the phone, and Google services wake up element-android, which makes a sync to the server for each new push.
  4. If there are a lot of pushes, there are a lot of syncs.

There are a couple of nuances:

  1. With a large number of rooms, events will arrive almost every second.
  2. One would expect that setting a room to "no notifications" mode stops pushes for that room. I tried switching all rooms to quiet mode, and it did not help. My feeling is that this is a purely local option of the Android client, and that the sync-per-push behavior does not change.
  3. There is also a "notify on mention" mode, which would seem to offer a balance between quiet mode and notifying only when needed. Most group rooms are usually set to this mode. But it is unlikely that the server monitors these keywords (in an encrypted room this is, in principle, impossible for the server). If it does not, it must send every event in the room to the client, and the client, having synced for each event, checks the event text for keywords and notifies if any are found.

As a result, we get a bottleneck, and it is not clear what to do about it (leave all large rooms?). There are, of course, a couple of options:

  1. If the user has switched a room to "no notifications" mode, send them only one push. Until the client syncs (which, for this room, happens only when the user opens it in the client), do not send further pushes for this room. The client marks the room as having "some messages" and otherwise forgets about it.
  2. Or the second option: the server keeps track of the client's "timeline" in each room and sends a push about events in a room only when the user has read everything in it. The user then gets one notification (new messages in room such-and-such), and the server sends no more pushes for this room. Once the user swipes the notification away or reads the messages, the "timeline" is considered updated and the user must be notified again when new messages arrive. In such a scheme (you could call it "economical" mode and make it selectable in the settings), if the user has 10 rooms and does not touch the phone, they receive at most 10 push notifications per day and the client makes 10 syncs. After that, both the server and the client wait for a reaction from the user.
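
To put rough numbers on the difference, here is a small simulation of the "one push per room until the user reads it" idea from the second option, compared with sending a push for every message. It is only a toy model under my own assumptions: the event format, the `count_pushes` function, and the traffic (10 rooms with 50 messages each) are hypothetical, and it assumes, as guessed above, that every push causes exactly one sync.

```python
def count_pushes(events, one_push_until_read):
    """Count the pushes sent for an ordered stream of events.

    `events` is a list of ("message", room) or ("read", room) tuples.
    With `one_push_until_read`, a room that already has an outstanding
    push gets no further pushes until the user reads that room.
    """
    pending = set()  # rooms with a push the user has not reacted to yet
    pushes = 0
    for kind, room in events:
        if kind == "read":
            pending.discard(room)  # user caught up, room may notify again
        elif kind == "message":
            if one_push_until_read and room in pending:
                continue  # suppressed: a push for this room is outstanding
            pending.add(room)
            pushes += 1
    return pushes


# Hypothetical traffic: 10 rooms, 50 messages each, phone never touched.
events = [("message", f"room{r}") for _ in range(50) for r in range(10)]

naive = count_pushes(events, one_push_until_read=False)
economical = count_pushes(events, one_push_until_read=True)
print(naive, economical)  # 500 10
```

If each push triggers one sync, the same two numbers are also the number of syncs the client makes in each mode.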
MurzNN commented 2 years ago

https://firefox-source-docs.mozilla.org/devtools/tests/performance-tests-damp.html is a good example of how performance monitoring is implemented in Mozilla Firefox. Maybe we can produce something similar?