GrapheneOS / os-issue-tracker

Issue tracker for GrapheneOS Android Open Source Project hardening work. Standalone projects like Auditor, AttestationServer and hardened_malloc have their own dedicated trackers.
https://grapheneos.org/
357 stars 21 forks source link

bluetooth audio streaming issue due to memory corruption bug(s) uncovered by hardened_malloc #137

Closed thestinger closed 4 years ago

thestinger commented 5 years ago

Update: this has been narrowed down somewhat and is likely caused by upstream memory corruption bugs uncovered by hardened_malloc.

chuckb7 commented 5 years ago

Pixel 3 is having this issue with QP1A.190711.020.2019.09.23.19 and QP1A.190711.020.C3.2019.09.25.00. Worked fine with this phone and BT device on PQ3A.190801.002.2019.08.25.15 and earlier releases.

Phone shows BT connected. Setting the media output to phone or the BT device plays audio through the phone speakers. The BT controls (car radio buttons) are able to control playback (play/pause, skip, etc). Voice call audio is routed through the BT device.

n1m1 commented 5 years ago

Pixel 3a with QP1A.190711.020.C3.2019.09.25.00.

Initially I had the same exact issue reported by @chuckb7. After a reboot it vanished and now everything is working fine. Also, I tried to switch and route the audio between two different bluetooth devices and I had no problem at all.

If something changes I will report here.

thestinger commented 5 years ago

@chuckb7 Have you rebooted since the upgrade?

xorbital commented 5 years ago

I've had the same exact issue on a Pixel 3a XL after the upgrade, rebooting didn't work. However this worked for now, I'll keep track of it:

  1. Remove all paired devices
  2. Reboot
  3. Re-Pair your devices
thnkg commented 5 years ago

Bluetooth audio works for me using the above workaround on my Pixel 3, however it's using AAC as the codec instead of aptX with my car radio. It was working fine in the latest pie release. Manually setting the codec to aptX in developer options doesn't seem to do anything.

ghost commented 5 years ago

Pixel 3a with QP1A.190711.020.C3.2019.09.25.00

Issue persists after removing paired devices, reboot, and re-pair.

Phone and bluetooth device appear to be connected. BT device gives an audio tone after phone shows connected, however, audio continues to play through the phone, not the BT device.

Additionally, BT device only appears as available if "Show BT devices without names (MAC address only" is toggled on in Developer Options.

chuckb7 commented 5 years ago

A few additional updates for my Pixel 3. I have three BT devices having a variety of issues since the update.

1) Vehicle: Audio through phone, voice call through vehicle. This issue persisted across reboots and enable/disable BT. I decided to try forgetting the phone/car and attempting to re-pair. I am unable to pair the phone and car now. I see part of the mac address for the car, but not the name. I timeout trying to pair to the car and never get asked to confirm the PIN displayed by the car.

2) BT Headset: Audio through phone, voice call through headset. This issue has persisted across reboots or enable/disable BT. I have not done any additional troubleshooting.

3) Stereo adapter. Working perfectly. No issues since the upgrade.

Based on additional experiences over the weekend, it looks like my issues are related to BT devices that can stream audio AND handle voice calls. Those devices have issues directing the audio stream, and I have been unable to re-pair one of the devices to even get voice calls working. My audio only device does not seem to be having any issues.

Based on @xorbital's suggestion, I will try removing ALL BT devices from the phone. So far I have only removed one device.

As a separate issue (I know it probably shouldn't be tracked here), my screen goes black whenever I'm on a voice call. This makes it impossible to use the phone for anything that requires entering touch tones...

chuckb7 commented 5 years ago

No luck with any of the workarounds. I have removed all devices, rebooted, BT on, BT off, moved my keys and all my change to my left pocket, and balanced on my right foot. No luck.

I still can't re-pair with my vehicle and pairing with my BT headset results in the same situation (audio through device, voice calls through headset).

elmerys commented 5 years ago

I managed to find a work around that includes activating/deactivating NFC. I though this could help pin point the problem. Here is a description of what I did to have bluetooth work again:

1st instance of the problem. The phone does not connect to the bluetooth receiver on my audio system:

After that I managed to connect other bluetooth systems and everything worked fine. Then I tried to connect my car system. Things broke again.

2nd instance of the problem. The sound comes out of the phone speakers again after pairing with car bluetooth.

nutts0 commented 5 years ago

Damn I wasn't sure that this could have caused it. I had the same problem with my car. Played around with nfc, totally forgot about it and voila car plays audio.

stfnhh commented 5 years ago

Running on a Pixel 3a, had the same Bluetooth issues as described here but was able to "fix" it by disabling contact syncing and voice only leaving audio streaming. The most recent update has completely broken bluetooth.

Update: Thanks @elmerys - a combination of playing with the NFC toggles, rebooting and repairing fixed it.

k3anton commented 5 years ago

Running QP1A.191005.007.2019.10.07.21 on Pixel 3a.

Experience similar issues but no work around has worked for me so far. Connects fine and can read battery level fine on my headphones, but refuses to stream sound using the headphones.

chuckb7 commented 5 years ago

I have had mixed results. I've tried the NFC toggles, reboots, and removing/adding BT devices. I have had no consistent success. My headset worked, my car didn't, and my previously working stereo adapter stopped streaming audio.

For my vehicle, I was never able to pair from the phone. I was able to pair from the car, but then had the same audio issues.

The stereo adapter re-paired fine, but the audio came through the phone instead of the stereo. This has been working fine before forgetting the device and all the other changes.

I'm not sure any of these workarounds are a long term fix. I think they might be helpful to get a single device working at a time with some number of reboots, but nowhere near the stability I had prior to the Android 10 upgrade.

I'm happy to help collect additional information if someone can provide pointers.

thestinger commented 5 years ago

In order for progress to be made on this issue, people will need to start doing work on narrowing it down and debugging it. A good place to start would be determining whether this happens with AOSP without the GrapheneOS hardening changes. If it's caused by the hardening changes, the specific cause needs to be narrowed down, followed by debugging the problem to figure out why it happens so that a fix or workaround can be implemented. It's unclear what causes it and whether it's an issue with AOSP or only GrapheneOS. If it's an issue with AOSP too, then work will need to be done on narrowing down and fixing the problem there. It would also need to be compared against the stock OS as part of doing that.

chuckb7 commented 5 years ago

I am using GOS as my daily driver and don't have a second device to baseline against.

Short of re-flashing my device, I'm happy to do any testing with the four or five BT devices I have and document the test cases and results. Is there additional information that can be collected from my device that would help?

Or does this issue really just need someone willing to re-flash?

thestinger commented 5 years ago

It needs a developer able to replicate the issue to step up and work on it, using a device that they're able to flash with their builds. Collecting information is useful and may be helpful to a developer that eventually steps up to work on this, but it's important to note that no progress will be made until there's someone working on resolving it rather than only people figuring out the symptoms of the problem. Figuring out the symptoms of the problem is useful, but limited in what it can accomplish.

ghost commented 5 years ago

I was also experiencing Bluetooth issues and none of the provided workarounds helped. Then, today, my audio suddenly decided to route to my AirPods. I'll test other Bluetooth devices when I get home.

I didn't do anything of note to make it work. I just paired them again today to call someone and noticed my volume slider had the Bluetooth symbol. Sure enough, music is playing perfectly. Very bizarre indeed.

sama-boop commented 5 years ago

As a workaround what worked for me was activating Disable Bluetooth A2DP hardware offload under Settings -> Developer Options. (After which the phone is forced to reboot)

Phone: Pixel 3a XL

schneebonus commented 5 years ago

As a workaround what worked for me was activating Disable Bluetooth A2DP hardware offload under Settings -> Developer Options. (After which the phone is forced to reboot)

Phone: Pixel 3a XL

Worked for me. I did not re-pair my headset and did not change nfc settings. Just disabled the A2DP hardware offload and rebooted.

Sortova commented 5 years ago

It looks the the A2DP hardware offload is indeed busted.

When I paired my phone to the car it displayed the currently playing media but sound came out of the handset. Followed @sama-boop suggestion and disabled A2DP and it worked as expected. Thanks for the tip.

nutts0 commented 4 years ago

Can confirm disabling A2DP helped with Pixel 3 XL.

ghost commented 4 years ago

Pixel 3a: The problems occurred after updating to QP1A.191105.003.2019.11.05.23.

Bluetooth headphones or Bluetooth speakers were displayed as connected, but the sound came from the phone's speakers.

Fix: Forget the connected media devices, restarting the phone and reconnecting the devices solved the problem for me. The problem just reappeared. Activating 'Disable Bluetooth A2DP hardware offload' helps (as a workaround).

beerisgood commented 4 years ago

Can confirm the problem after latest update and that the fix solve it (pixel 3a)

Sadly after disable dev options the problem start again cause the setting go back to enabled.

Reddit thread: https://reddit.com/comments/dsdms7

c0ldplasma commented 4 years ago

Had this issue after software update to QQ1A.200205.002.02.04.01. Can also confirm disabling A2DP helped with Pixel 3a XL.

svcraig commented 4 years ago

Latest GrapheneOS self/local build requires the A2DP disabling workaround for me to stream BT audio to my car or portable speaker.

Confirmed working (i.e. no workaround needed) on vanilla AOSP (fresh release build for sargo at the latest March tag) with the vendor binaries and whatnot downloaded from Google.

Next step will be a rebuild using the android-prepare-vendor project instead of downloading and extracting the vendor files directly from Google.

If it still works, I'll start introducing GrapheneOS changes one repo at a time until I find a problem, at which I'll either debug it or binary chop through the commits, depending on how familiar I am with the problem area.

I bought a second Pixel 3a to use as a dev device for this project so I won't be hindered by any limitations of testing with a hip device.

thestinger commented 4 years ago

Happy to hear that someone is working on this.

svcraig commented 4 years ago

Update: I suspect the problem is somewhere within GrapheneOS.

Starting with a base of AOSP, I made the following changes one at a time, rebuilt, and tested to see if audio streaming via Bluetooth still worked without disabling A2DP offloading:

@thestinger are my last two points sufficient for replacing the kernel? The only "verification" I did was checking that the Kernel version changed in Settings (it did). This is hacky, and I'm not confident it's enough to say I've fully tested with the right kernel.

A large part of my uncertainty is that I lack understanding in how the kernel is built and why the file ends up where it does. For instance, I'm working with a Pixel 3a (sargo) yet the kernel build process starts in the kernel/google/crosshatch (3 XL) directory, and the build.sh script requires the argument "bonito" (3a XL). Then, once Image.lz4 is created, it ends up in the device/google/bonito directory. Why do we build from the crosshatch directory? Why does build.sh not need sargo specified? While I can guess why the build artifact ends up in a "bonito" directory, how/why does the sargo pick it up from there and is able to mould it into the resulting images?

I don't even know if I'm asking the right questions. Once I'm confident the kernel testing is sufficient, I'll proceed with replacing subsequent repositories one at a time.

thestinger commented 4 years ago

Why do we build from the crosshatch directory? Why does build.sh not need sargo specified?

Read the information provided in https://grapheneos.org/build#kernel instead of only the instructions. The same kernel source tree is used for that generation of devices. The bonito and sargo devices use the exact same kernel drivers so there's no reason to do separate builds.

thestinger commented 4 years ago

I'll proceed with replacing subsequent repositories one at a time.

Lots of changes are spread across multiple repositories and you can't always do it one at a time. It will lead to various forms of breakage and will sometimes break the build.

svcraig commented 4 years ago

Read the information provided in https://grapheneos.org/build#kernel instead of only the instructions. The same kernel source tree is used for that generation of devices. The bonito and sargo devices use the exact same kernel drivers so there's no reason to do separate builds.

That clears it up nicely. Thanks. My middle name could be RTFM and I probably still wouldn't, so I'll work on that before asking additional questions.

Lots of changes are spread across multiple repositories and you can't always do it one at a time. It will lead to various forms of breakage and will sometimes break the build.

Fortunately I realized that right away by starting with frameworks/base since it has a dependency on changes in libcore. "One" at a time is admittedly an oversimplification but I've been methodical about it to minimize issues. Cherry picking and reverting commits, and always doing a clean build has kept me on track.

Everything in my workspace right now consists of clean checkouts with no modifications by me, and about 75% of the GrapheneOS repos are in use. Still no Bluetooth issues.

thestinger commented 4 years ago

@svcraig Have you identified which repository or change causes the issue yet?

svcraig commented 4 years ago

@thestinger I haven't isolated the change yet, but I've narrowed down the search.

With AOSP + the Graphene kernel + Graphene bionic + Graphene hardened_malloc + Graphene system/core, there's no issue. Add in the Graphene kernel configs, and the issue appears.

But, with AOSP + the Graphene kernel + Graphene kernel configs, there's no issue. I'm going to double check this before digging any further because one of my notes doesn't make sense.

thestinger commented 4 years ago

Be very careful about testing it. Make sure you're actually testing the kernel. It sounds likely that you've made a mistake and reached an incorrect conclusion from that.

svcraig commented 4 years ago

I'm running a GrapheneOS device in addition to the testing, and the Kernel "details" in Settings match on the two devices, whereas the AOSP kernel produces different information in Settings. I'll double check the outputs of uname but is there any other way I can ensure I'm using the right kernel?

thestinger commented 4 years ago

But, with AOSP + the Graphene kernel + Graphene kernel configs, there's no issue. I'm going to double check this before digging any further because one of my notes doesn't make sense.

You said you were testing with AOSP with the GrapheneOS kernel and it wasn't causing an issue, but you also said that introducing the kernel caused the issue when you were using more of the repositories. It sounds like you're making mistakes and reaching incorrect conclusions from that. I seriously doubt that the issue only happens with the kernel changes added in but then doesn't happen with only the kernel changes.

thestinger commented 4 years ago

It just doesn't sound right.

svcraig commented 4 years ago

Last night I made a new directory, checked out the AOSP source, then replaced the kernel and kernel configs:

Build, flash, test. No issue.

I checked the output of uname -a. The kernel info from my full GrapheneOS device matches that of my latest test build: Linux localhost 4.9.200-g43f35d8cca26 #2 SMP PREEMPT Tue Feb 4 09:22:08 EST 2020 aarch64 . I don't have a copy of the string from the AOSP kernel but it is different.

Yesterday, my procedure was:

Build, flash, test. No issues.

Next:

Build, flash, test. No issues.

Build, flash, test. Issue found.

Running repo forall -c git remote -v shows clearly that all other repos than the ones noted above are from AOSP. A git status of the Graphene repos shows the correct branch. A git status of the kernel repos shows a changed Image.lz4 file, which matches the size from that in my GrapheneOS build (which I've called a fork, but I haven't modified the kernel)

I'll re-do it all today. If you see any problems with my approach please let me know.

thestinger commented 4 years ago

As I said before, that really doesn't sound correct. The kernel_configs repository is only used for testing including a compatibility self-test run at boot and doesn't actually do anything at runtime.

thestinger commented 4 years ago

There must be something wrong with the process that you're using. The results don't make sense. That kernel configuration repository does not control the configuration of the kernel but rather provides a set of base configurations that the kernel is tested against for compatibility. It could not be breaking anything at runtime. The only thing that could be broken is the boot process. If you leave out that repository, you should be getting an error message from using the GrapheneOS kernels, since the compatibility test based on those base configurations will fail. If you aren't receiving an error message on boot with that repository left out and with the GrapheneOS kernel included, something is wrong with your process. If anything else is changed by including that repository, something is wrong with your process.

thestinger commented 4 years ago

I would guess that the issue you're testing for doesn't actually occur on every boot so you're naturally getting failures on some boots but not others. That means any results you try to obtain are going to be flawed without a more accurate testing process.

thestinger commented 4 years ago

This is the only way that kernel/configs is used:

https://github.com/GrapheneOS/kernel_configs#are-the-config-requirements-tested

It is purely for documentation and testing, which is how I know that the conclusion that you've reached is wrong. I've worked with the configurations and code involved so I know exactly what's being done and how it is and isn't used. The documentation they have on this is accurate.

The results of the testing are meaningless without being able to reproduce them repeatedly and reliably. This is probably some kind of issue like a memory corruption bug that only sometimes occurs and may become more likely with more changes applied. There also may be some kind of state involved, so you'll need to perform factory resets between testing to make sure that the test results are independent from each other.

add1989 commented 4 years ago

@svcraig Thank you for all the work you've done so far in testing this. Hopefully a fix will be found soon!

(I am also having this problem on a Pixel 3A running QQ2A. Disabling A2DP offload works as a workaround at the moment.)

inthewaves commented 4 years ago

This is probably some kind of issue like a memory corruption bug that only sometimes occurs and may become more likely with more changes applied.

From initial testing, it seems like this is the case at least on a userdebug build on Pixel 3a.

If I revert https://github.com/GrapheneOS/platform_bionic/commit/010949662f6579419dd310606bf1418dbd53a971, then Bluetooth streaming works reliably (i.e. sounds plays through Bluetooth earbuds) even across multiple (10) reboots.

If I leave that commit alone and fastboot flashall -w a clean build, then Bluetooth streaming becomes unreliable. That is, for each reboot, one of two states seem to occur

State only seems to change on reboot. Even if I pair and repair the devices or turn off and on Bluetooth, there is no change in state.

thestinger commented 4 years ago

@inthewaves That makes sense. You can see we apply https://github.com/GrapheneOS/hardened_malloc/commit/2ecb0eb1d6d4a9da2ed0d435fc764d194247e8c3 in the branch for GrapheneOS to disable some of the security features for the camera service to reduce the problems it uncovers there. In handle_hal_bugs, try deleting the if statement using strcmp to make disabling those features unconditional and see if that changes anything with Bluetooth. There are other features that could be disabled to try narrowing down what's happening. It's probably a use-after-free bug like the ones being worked around in the camera service.

inthewaves commented 4 years ago

@thestinger Deleting the if statement wasn't successful

COLD static void handle_hal_bugs(void) {
    char path[256];
    if (readlink("/proc/self/exe", path, sizeof(path)) == -1) {
        return;
    }
    // const char camera_provider[] = "/vendor/bin/hw/android.hardware.camera.provider@2.4-service_64";
    //if (strcmp(camera_provider, path) == 0) {
    ro.zero_on_free = false;
    ro.purge_slabs = false;
    ro.region_quarantine_protect = false;
    //}
}

Still ran into the bug after the 3rd and 4th reboot on a non-incremental build

I'll try to check some of the other features (by disabling them in the Makefile) when I have time

inthewaves commented 4 years ago

@thestinger A promising development: I think I have managed to narrow the cause down, at least for my device and tag (Pixel 3a, on QQ2A.200405.005.2020.04.14.23).

Inside of external/hardened_malloc, I edited the Makefile:

--- a/Makefile
+++ b/Makefile
@@ -9,8 +9,8 @@ CONFIG_SLOT_RANDOMIZE := true
 CONFIG_SLAB_CANARY := true
 CONFIG_SLAB_QUARANTINE_RANDOM_LENGTH := 1
 CONFIG_SLAB_QUARANTINE_QUEUE_LENGTH := 1
-CONFIG_EXTENDED_SIZE_CLASSES := true
-CONFIG_LARGE_SIZE_CLASSES := true
+CONFIG_EXTENDED_SIZE_CLASSES := false
+CONFIG_LARGE_SIZE_CLASSES := false
 CONFIG_GUARD_SLABS_INTERVAL := 1
 CONFIG_GUARD_SIZE_DIVISOR := 2
 CONFIG_REGION_QUARANTINE_RANDOM_LENGTH := 128

With this change, I was able to do 10+ different reboots where audio was able to play through my Bluetooth earbuds.( Before, it was probably at most 2-3 reboots before Bluetooth streaming stopped working on the next reboot.)

I'll see if leaving one of those config options alone will help. This was also an incremental build, so I'll see if this holds for a clean build too.

thestinger commented 4 years ago

Enabling CONFIG_LARGE_SIZE_CLASSES shouldn't matter. It's probably CONFIG_EXTENDED_SIZE_CLASSES that makes the difference if it's really this. Do you still have the other changes from earlier or just this?

inthewaves commented 4 years ago

Ah, sorry, I didn't notice I left a revert commit last night in bionic when I was testing those two config options. I restored bionic, and now those config options aren't making Bluetooth working consistently. I'll keep testing other options carefully now.

inthewaves commented 4 years ago

@thestinger This time, I've narrowed down the issue to a single config option in hardened_malloc: SLOT_RANDOMIZE=false. (I didn't notice the Android.bp was the right thing to edit and I was focused on the Makefile the whole time.)

To be more careful, I did the following to make sure that this one-line config change is sufficient to solve the Bluetooth streaming issue (of course, I don't think this config change itself should be the fix that should go into production):

If someone else has a device where they can reproduce the bug and they're comfortable building and flashing with that device, it would be great if they could see if this change also does it for them.

inthewaves commented 4 years ago

I've followed https://github.com/GrapheneOS/hardened_malloc/commit/2ecb0eb1d6d4a9da2ed0d435fc764d194247e8c3 to attempt to make a workaround, but I still ran into the bug with this patch applied (the OR statement in handle_hal_bugsis just a quick way to see if either /vendor/bin/hw/android.hardware.bluetooth@1.0-service-qti or /vendor/bin/hw/android.hardware.audio@2.0-service would work). I don't think this approach would have worked anyways if the names are different for each device

From 29f89d756fb34618a8f380ae2e6c369b3dd7948e Mon Sep 17 00:00:00 2001
From: inthewaves <26474149+inthewaves@users.noreply.github.com>
Date: Thu, 23 Apr 2020 18:50:00 -0700
Subject: [PATCH] attempt workaround for bluetooth streaming issue

---
 h_malloc.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/h_malloc.c b/h_malloc.c
index dabce35..ffd9b7d 100644
--- a/h_malloc.c
+++ b/h_malloc.c
@@ -85,6 +85,7 @@ static union {
         bool zero_on_free;
         bool purge_slabs;
         bool region_quarantine_protect;
+        bool slot_randomize;
     };
     char padding[PAGE_SIZE];
 } ro __attribute__((aligned(PAGE_SIZE)));
@@ -355,7 +356,7 @@ static u64 get_mask(size_t slots) {
 }

 static size_t get_free_slot(struct random_state *rng, size_t slots, struct slab_metadata *metadata) {
-    if (SLOT_RANDOMIZE) {
+    if (ro.slot_randomize) {
         // randomize start location for linear search (uniform random choice is too slow)
         unsigned random_index = get_random_u16_uniform(rng, slots);
         unsigned first_bitmap = random_index / 64;
@@ -1067,10 +1068,16 @@ COLD static void handle_hal_bugs(void) {
         return;
     }
     const char camera_provider[] = "/vendor/bin/hw/android.hardware.camera.provider@2.4-service_64";
+    const char bluetooth_provider[] = "/vendor/bin/hw/android.hardware.bluetooth@1.0-service-qti";
+    const char audio[] = "/vendor/bin/hw/android.hardware.audio@2.0-service";
     if (strcmp(camera_provider, path) == 0) {
         ro.zero_on_free = false;
         ro.purge_slabs = false;
         ro.region_quarantine_protect = false;
+    } 
+    
+    if (strcmp(bluetooth_provider, path) == 0 || strcmp(audio, path) == 0) {
+        ro.slot_randomize = false;
     }
 }

@@ -1100,6 +1107,7 @@ COLD static void init_slow_path(void) {
     ro.purge_slabs = true;
     ro.zero_on_free = ZERO_ON_FREE;
     ro.region_quarantine_protect = true;
+    ro.slot_randomize = SLOT_RANDOMIZE;
     handle_hal_bugs();

     if (sysconf(_SC_PAGESIZE) != PAGE_SIZE) {
-- 
2.21.1