ente-io / ente

Fully open source, End to End Encrypted alternative to Google Photos and Apple Photos
https://ente.io
GNU Affero General Public License v3.0

High CPU usage during inference for Magic search #645

Open s38b35M5 opened 9 months ago

s38b35M5 commented 9 months ago

The upload process is necessarily CPU-heavy, since all files are encrypted before upload.

This should be clearly spelled out in every user-facing doc/how-to/FAQ and perhaps even in the upload procedure in the apps. This would remind the user that they may experience slowdowns (on slower hardware, for instance), and in the case of a Google Takeout upload of many GB, could experience uploads that are CPU-limited and slower than a simple upload of unencrypted data.
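
One way to check whether a step is CPU-limited rather than network-limited is to compare the CPU time it consumes with the wall-clock time it takes. The following is a minimal sketch, not Ente's code: SHA-256 hashing stands in for encryption (it is not the cipher Ente uses), and the function names `cpu_fraction`, `hash_chunks`, and `simulated_wait` are hypothetical.

```python
import hashlib
import time


def cpu_fraction(task):
    """Run task() and return CPU time divided by wall-clock time.

    A value near 1.0 means the task kept one core busy;
    a value near 0.0 means it mostly waited (disk, network, sleep).
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    task()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used if wall_used > 0 else 0.0


def hash_chunks(total_bytes=32 * 1024 * 1024, chunk_size=1024 * 1024):
    """Stand-in for per-chunk encryption: hash the data chunk by chunk."""
    chunk = bytes(chunk_size)  # a zero-filled buffer, reused for every chunk
    digest = hashlib.sha256()
    for _ in range(total_bytes // chunk_size):
        digest.update(chunk)
    return digest.hexdigest()


def simulated_wait(seconds=0.2):
    """Stand-in for waiting on the network."""
    time.sleep(seconds)


if __name__ == "__main__":
    print(f"hashing: {cpu_fraction(hash_chunks):.2f}")
    print(f"waiting: {cpu_fraction(simulated_wait):.2f}")
```

If the fraction for the real upload step is close to 1.0 on a dual-core machine, the upload is likely bounded by the CPU rather than by bandwidth.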

vishnukvmd commented 9 months ago

Hey, could you please confirm whether the CPU usage is any lower when you're uploading unzipped folders?

We'd like to understand if it's parsing the ZIP file that's causing the CPU overhead.

Thank you!

s38b35M5 commented 9 months ago

I extracted a single 350MB folder from my next 10GB Takeout file and used the Upload > Folder option. My CPU is still pegged, and MP3 playback of a song I had running in the background started stuttering while waiting for CPU time. It seems that (at least on my machine) the load is not related to the ZIP file, but to encryption, and possibly to the indexing (not ML, I have that disabled for now) that goes on at the same time?

Here, I am uploading the 350MB of media files: image

Here, the upload is complete, and indexing is occurring: image

It isn't until Pending equals zero that CPU usage subsides.

Below is my system info, and I can provide ente logs through another channel if needed.


System:    Kernel: 5.10.0-26-amd64 [5.10.197-1] x86_64 bits: 64 compiler: gcc v: 10.2.1 
           parameters: BOOT_IMAGE=/vmlinuz-5.10.0-26-amd64 root=UUID=<filter> ro quiet 
           init=/lib/systemd/systemd 
           Desktop: Xfce 4.18.1 tk: Gtk 3.24.24 info: xfce4-panel wm: xfwm 4.18.0 vt: 7 
           dm: LightDM 1.26.0 Distro: MX-21.3_x64 Wildflower January 15  2023 
           base: Debian GNU/Linux 11 (bullseye) 
Machine:   Type: Laptop System: Hewlett-Packard product: HP EliteBook 840 G2 v: A3008D510B03 
           serial: <filter> Chassis: type: 10 serial: <filter> 
           Mobo: Hewlett-Packard model: 2216 v: KBC Version 96.5B serial: <filter> 
           BIOS: Hewlett-Packard v: M71 Ver. 01.31 date: 02/24/2020 
Battery:   ID-1: BAT0 charge: 0.2 Wh (100.0%) condition:poor (100.0%) volts: 12.6 min: 11.4 
           model: Hewlett-Packard Primary type: Li-ion serial: <filter> status: Unknown 
CPU:       Info: Dual Core model: Intel Core i5-5300U bits: 64 type: MT MCP arch: Broadwell 
           family: 6 model-id: 3D (61) stepping: 4 microcode: 2F cache: L2: 3 MiB 
           flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 18359 
           Speed: 798 MHz min/max: 500/2900 MHz Core speeds (MHz): 1: 798 2: 798 3: 798 4: 800 
           Vulnerabilities: Type: gather_data_sampling status: Not affected 
           Type: itlb_multihit status: KVM: VMX disabled 
           Type: l1tf mitigation: PTE Inversion; VMX: conditional cache flushes, SMT vulnerable 
           Type: mds mitigation: Clear CPU buffers; SMT vulnerable 
           Type: meltdown mitigation: PTI 
           Type: mmio_stale_data status: Unknown: No mitigations 
           Type: retbleed status: Not affected 
           Type: spec_rstack_overflow status: Not affected 
           Type: spec_store_bypass 
           mitigation: Speculative Store Bypass disabled via prctl and seccomp 
           Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer sanitization 
           Type: spectre_v2 mitigation: Retpolines, IBPB: conditional, IBRS_FW, STIBP: 
           conditional, RSB filling, PBRSB-eIBRS: Not affected 
           Type: srbds mitigation: Microcode 
           Type: tsx_async_abort mitigation: Clear CPU buffers; SMT vulnerable 
Graphics:  Device-1: Intel HD Graphics 5500 
           vendor: Hewlett-Packard ZBook 15u G2 Mobile Workstation driver: i915 v: kernel 
           bus-ID: 00:02.0 chip-ID: 8086:1616 class-ID: 0300 
           Device-2: Chicony HP HD Webcam type: USB driver: uvcvideo bus-ID: 2-7:5 
           chip-ID: 04f2:b477 class-ID: 0e02 serial: <filter> 
           Display: x11 server: X.Org 1.20.11 compositor: xfwm4 v: 4.18.0 driver: 
           loaded: modesetting unloaded: fbdev,vesa display-ID: :0.0 screens: 1 
           Screen-1: 0 s-res: 1366x768 s-dpi: 96 s-size: 361x203mm (14.2x8.0") 
           s-diag: 414mm (16.3") 
           Monitor-1: eDP-1 res: 1366x768 hz: 60 dpi: 112 size: 309x174mm (12.2x6.9") 
           diag: 355mm (14") 
           OpenGL: renderer: Mesa Intel HD Graphics 5500 (BDW GT2) v: 4.6 Mesa 20.3.5 
           direct render: Yes 
Audio:     Device-1: Intel Broadwell-U Audio vendor: Hewlett-Packard driver: snd_hda_intel 
           v: kernel bus-ID: 00:03.0 chip-ID: 8086:160c class-ID: 0403 
           Device-2: Intel Wildcat Point-LP High Definition Audio vendor: Hewlett-Packard 
           driver: snd_hda_intel v: kernel bus-ID: 00:1b.0 chip-ID: 8086:9ca0 class-ID: 0403 
           Sound Server-1: ALSA v: k5.10.0-26-amd64 running: yes 
           Sound Server-2: PulseAudio v: 14.2 running: yes 
Network:   Device-1: Intel Ethernet I218-LM vendor: Hewlett-Packard driver: e1000e v: kernel 
           port: 3080 bus-ID: 00:19.0 chip-ID: 8086:15a2 class-ID: 0200 
           IF: eth0 state: up speed: 1000 Mbps duplex: full mac: <filter> 
           Device-2: Intel Wireless 7265 driver: iwlwifi v: kernel modules: wl port: ef80 
           bus-ID: 02:00.0 chip-ID: 8086:095a class-ID: 0280 
           IF: wlan0 state: down mac: <filter> 
Bluetooth: Device-1: Intel Bluetooth wireless interface type: USB driver: btusb v: 0.8 
           bus-ID: 2-4:4 chip-ID: 8087:0a2a class-ID: e001 
           Report: hciconfig ID: hci0 rfk-id: 1 state: up address: <filter> bt-v: 2.1 lmp-v: 4.0 
           sub-v: 1000 hci-v: 4.0 rev: 1000 
           Info: acl-mtu: 1021:5 sco-mtu: 96:6 link-policy: rswitch hold sniff 
           link-mode: slave accept service-classes: rendering, capturing, object transfer, audio 
Drives:    Local Storage: total: 238.47 GiB used: 204.47 GiB (85.7%) 
           SMART Message: Unable to run smartctl. Root privileges required. 
           ID-1: /dev/sda maj-min: 8:0 vendor: Micron model: MTFDDAK256TBN-1AR1ZABHA 
           size: 238.47 GiB block-size: physical: 4096 B logical: 512 B speed: 6.0 Gb/s type: SSD 
           serial: <filter> rev: 0012 scheme: MBR 
Partition: ID-1: / raw-size: 237.46 GiB size: 232.67 GiB (97.98%) used: 204.36 GiB (87.8%) 
           fs: ext4 dev: /dev/dm-0 maj-min: 253:0 mapped: root.fsm 
           ID-2: /boot raw-size: 1024 MiB size: 973.4 MiB (95.06%) used: 105.1 MiB (10.8%) 
           fs: ext4 dev: /dev/sda1 maj-min: 8:1 
Swap:      Alert: No swap data was found. 
Sensors:   System Temperatures: cpu: 48.0 C mobo: 0.0 C 
           Fan Speeds (RPM): N/A 
Repos:     Packages: 2259 note: see --pkg apt: 2218 lib: 1104 flatpak: 41 
           No active apt repos in: /etc/apt/sources.list 
           Active apt repos in: /etc/apt/sources.list.d/brave-browser-release.list 
           1: deb [arch=amd64] https://brave-browser-apt-release.s3.brave.com/ bullseye main
           Active apt repos in: /etc/apt/sources.list.d/debian-stable-updates.list 
           1: deb http://atl.mirrors.clouvider.net/debian bullseye-updates main contrib non-free
           Active apt repos in: /etc/apt/sources.list.d/debian.list 
           1: deb http://deb.debian.org/debian bullseye main contrib non-free
           2: deb http://security.debian.org/debian-security bullseye-security main contrib non-free
           Active apt repos in: /etc/apt/sources.list.d/google-earth-pro.list 
           1: deb [arch=amd64] http://dl.google.com/linux/earth/deb/ stable main
           No active apt repos in: /etc/apt/sources.list.d/librewolf.list 
           Active apt repos in: /etc/apt/sources.list.d/mx.list 
           1: deb http://mirror.cogentco.com/pub/linux/mxlinux/mx/repo/ bullseye main non-free
           Active apt repos in: /etc/apt/sources.list.d/nordvpn.list 
           1: deb https://repo.nordvpn.com/deb/nordvpn/debian stable main
           Active apt repos in: /etc/apt/sources.list.d/signal-xenial-added-by-mxpi.list 
           1: deb [arch=amd64] https://updates.signal.org/desktop/apt xenial main
Info:      Processes: 237 Uptime: 47m wakeups: 1 Memory: 15.04 GiB used: 2.84 GiB (18.9%) 
           Init: systemd v: 247 runlevel: 5 default: 5 tool: systemctl Compilers: gcc: N/A alt: 10 
           Client: shell wrapper v: 5.1.4-release inxi: 3.3.06 
Boot Mode: BIOS (legacy, CSM, MBR)

vishnukvmd commented 9 months ago

Hey, thanks a bunch for sharing the details!

It looks like the process we're running on device for machine learning is the culprit. We're currently exploring ways to leverage your GPU instead.

I'll update the issue description and keep this open until we've switched the implementation.

s38b35M5 commented 9 months ago

You are most welcome!

Do you mean to say that, even with the ML disabled (or rather, left at the default opted-out setting), ML is still going on?

Perhaps a feature request to optionally disable any indexing/learning during the initial upload process would be worthwhile, as I suspect my Google Takeout library (125GB) is smaller than most.
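
A minimal sketch of what such an option could look like, assuming a scheduler that queues files for indexing and only drains the queue once no uploads are in flight. The class `DeferredIndexer`, its methods, and the `defer_during_upload` flag are hypothetical and do not correspond to anything in the Ente codebase.

```python
from collections import deque


class DeferredIndexer:
    """Hypothetical scheduler: hold indexing work while uploads are active."""

    def __init__(self, index_fn, defer_during_upload=True):
        self.index_fn = index_fn                  # called once per file to index
        self.defer_during_upload = defer_during_upload
        self.uploads_in_progress = 0
        self.pending = deque()                    # uploaded files awaiting indexing

    def upload_started(self):
        self.uploads_in_progress += 1

    def upload_finished(self, file_id):
        self.uploads_in_progress -= 1
        self.pending.append(file_id)
        self._drain()

    def _drain(self):
        # With the option on, do nothing until the last upload has finished.
        if self.defer_during_upload and self.uploads_in_progress > 0:
            return
        while self.pending:
            self.index_fn(self.pending.popleft())


if __name__ == "__main__":
    indexed = []
    indexer = DeferredIndexer(indexed.append)
    indexer.upload_started()
    indexer.upload_started()
    indexer.upload_finished("a.jpg")
    print(indexed)  # still empty: one upload is in flight
    indexer.upload_finished("b.jpg")
    print(indexed)  # both files indexed after the last upload
```

With `defer_during_upload=False` the same class indexes each file as soon as its upload finishes, which matches the behaviour described above, where indexing competes with the upload for CPU time.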

vishnukvmd commented 9 months ago

Right, we'll figure out which is the quicker path forward - adding an option to disable / leveraging the GPU.

s38b35M5 commented 9 months ago

Part of my initial issue creation revolved around updating docs to let users know that uploading can be resource heavy for reasons that might not be immediately clear without explanation. Don't want to lose that aspect of this issue.

vishnukvmd commented 9 months ago

Got it. We should be able to figure out a way to reduce the resource utilisation so that a warning isn't necessary 🤞
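
One generic way to reduce the load of a background job is to pause after each unit of work so that busy time stays near a chosen share of wall-clock time. The sketch below illustrates that duty-cycle idea only; it is not what Ente implemented, and `run_throttled` and `max_cpu_share` are hypothetical names. The clock and sleep functions are parameters so the behaviour can be checked without real delays.

```python
import time


def run_throttled(tasks, max_cpu_share=0.5, clock=time.perf_counter, sleep=time.sleep):
    """Run each task, then sleep so busy time stays near max_cpu_share of wall time.

    After a task that took `busy` seconds, sleeping for
    busy * (1 - share) / share makes busy / (busy + pause) equal to share.
    """
    if not 0 < max_cpu_share <= 1:
        raise ValueError("max_cpu_share must be in (0, 1]")
    results = []
    for task in tasks:
        start = clock()
        results.append(task())
        busy = clock() - start
        pause = busy * (1 - max_cpu_share) / max_cpu_share
        if pause > 0:
            sleep(pause)
    return results


if __name__ == "__main__":
    # Each "task" here is a stand-in for indexing one file.
    jobs = [lambda n=n: sum(range(n)) for n in (100_000, 200_000)]
    print(run_throttled(jobs, max_cpu_share=0.25))
```

This trades total indexing time for responsiveness: at a 25% share the job takes roughly four times as long, but other applications get most of the core in between.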

s38b35M5 commented 9 months ago

I'm sure you will find the best path forward. Thanks for being responsive!

s38b35M5 commented 9 months ago

Now that I am more familiar with the app, I wanted to mention that the status of indexing isn't obvious, and at least on my low-spec laptop, I can't do anything else while Ente desktop is indexing, as my CPU is maxed out. Perhaps a dismissible notification explaining that this process is going on would help, so that the user is aware of why CPU usage spikes when the app is open.

Use case: I have uploaded about 40% of my 125GB so far, and for hours after each chunk of data, the app indexes, spiking CPU to the point where my laptop is unusable for most tasks. I only know it's happening if I click Hamburger Menu > Preferences > Advanced and scroll down to see indexed vs pending items.

EDIT: I just realized indexing is only one thing that uses lots of CPU. ML also does, and there is no indicator that it is in progress. It is slightly odd to have an app monopolizing the CPU without any apparent indicator of why.
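
A minimal sketch of the kind of indicator described above, assuming the app can report done and pending counts for each background job. The `BackgroundTask` type, the `status_line` function, and the task labels are illustrative and not taken from Ente.

```python
from dataclasses import dataclass


@dataclass
class BackgroundTask:
    name: str     # illustrative label, e.g. "Indexing" or "Machine learning"
    done: int     # items already processed
    pending: int  # items still waiting


def status_line(tasks):
    """Return a one-line summary of tasks with pending work, or None if idle."""
    active = [t for t in tasks if t.pending > 0]
    if not active:
        return None
    parts = [f"{t.name}: {t.done}/{t.done + t.pending}" for t in active]
    return "Background work in progress (CPU usage may be high): " + ", ".join(parts)


if __name__ == "__main__":
    print(status_line([
        BackgroundTask("Indexing", done=40, pending=60),
        BackgroundTask("Machine learning", done=10, pending=0),
    ]))
```

Returning `None` when nothing is pending lets the caller hide the notification entirely once the CPU-heavy work has finished.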

vishnukvmd commented 9 months ago

Thanks for the feedback, we'll surface information about the CPU-intensive processes that are running. This might take us a while, since we are first prioritizing ways to lower the CPU usage, but we will figure this out!