cb22 / macbook12-spi-driver

WIP input driver for the SPI touchpad / keyboard found in the 12" MacBook (MacBook8,1 + MacBook9,1)
GNU General Public License v2.0
298 stars 103 forks source link

Touchpad init fixes 2 #45

Closed roadrunner2 closed 7 years ago

roadrunner2 commented 7 years ago

This fixes the issues on the MBP14,1 reported by @peterychuang on #6.

l1k commented 7 years ago

Regarding commit 3cb8a1b, this sounds like an ordering issue that needs to be fixed in the master's driver. (Doesn't preclude such a workaround of course until it's fixed.)

Regarding commit 33e2e87, this seems to be a missing feature in the SPI core, it's indeed admitted in <linux/spi/spi.h>. A lot of chips need a delay between lowering of CS and the first clock cycle, e.g. to come out of a low-power state. Usually this is specified in the datasheet and is so low that it doesn't matter, but in some cases clearly it does.

l1k commented 7 years ago

For illustrative purposes I've cobbled together (untested) commit l1k/linux@ca6e39fd2616aece005d11551362337918c45955 which introduces a per-slave CS-to-CLK delay. However I gather from applespi.c that a per-message delay may be needed since you seem to delay only on reads, not on writes.

l1k commented 7 years ago

My code is atrocious as usual, 0-day has alerted me that I've missed to replace a few "us" with "usecs", updated branch is here.

peterychuang commented 7 years ago

After using this fix for close to a week now, I've just encountered one touchpad init failure while I was rebooting the machine multiple times to investigate the NVMe-related boot problem. The dmesg output has the same Error writing to device: b3 b3 b3 b3 line before modswitch.

Not sure if this is significant though, as I was testing things just now.

roadrunner2 commented 7 years ago

@l1k

Regarding commit 3cb8a1b, this sounds like an ordering issue that needs to be fixed in the master's driver. (Doesn't preclude such a workaround of course until it's fixed.)

I agree that it's a bug in the spi driver core. However, it can't normally occur. The spi_master_class and spi_slave_class structs are static, i.e. not normally accessible outside spi.c - I had to use a hack to get a pointer to the struct to be able to register a class callback. And nothing in spi.c uses class callbacks (the class is just used to be able to enumerate the devices). So I'm not sure if they'd be willing to accept a patch.

Regarding the clock delay, that looks good - I'll give it whirl shortly. While you're right that currently we don't have that delay on writes, it probably doesn't hurt, and we may just have been lucky (the whole delay stuff is far more fragile than I was anticipating - wish there were some clear documentation on what delays where needed where).

roadrunner2 commented 7 years ago

@peterychuang Thanks for the report. Bummer, looks like I'll need to add that extra delay between read and write. Unless this was due to currently not having a delay between cs and clk for writes. Hmm...

roadrunner2 commented 7 years ago

@l1k I just realized that your clk-delay patch is working quite differently from the applespi driver. In particular:

Of course, these assumptions could be all wrong, and I'm just getting lucky with things...

Also, I don't think spi_transfer_one_message is being used here, but instead pxa2xx_spi_transfer_one_message (but I need to double check that).

l1k commented 7 years ago

@roadrunner2: Oh, so that won't occur with your spi-properties branch, right? Indeed then it's not necessary to submit a patch.

As for the proper timings, I've just dug around in AppleIntelLpssSpiController.kext and AppleIntelLpssGspi.kext. Basically it should be sufficient to look where IODelay() is called and the only places of interest appear to be in AppleIntelLpssGspi::csCtl(), which seems to delay after asserting CS, and in AppleIntelLpssGspi::transferMmioDuplexSubr() which delays after write and before read. The duration of the delay is stored in the AppleIntelLpssGspi class and I wasn't immediately able to find out where it's set. The delay after asserting CS is most likely coming from the spiCsDelay property, however this would be in usec then, not 10*usec.

The resetA2RUsec and resetRecUsec properties seem uninteresting, they're used by AppleIntelLpssGspi::reset(), and since we never reset the device, I think we can ignore them.

l1k commented 7 years ago

@roadrunner2: My comment above is a reply to yours of 5 hours ago, I hadn't read your new comment before sending it away, sorry. You're certainly right about pxa2xx_spi_transfer_one_message() being used, I mixed it up with the bcm2835 (RasPi) SPI master driver which doesn't define that callback but uses the one in the core.

roadrunner2 commented 7 years ago

@l1k Right: with your apple properties changes that code becomes obsolete, and hence the race is not encountered.

Regarding timings, that's interesting: we definitely need that 100µs between transfers, in addition to delays after asserting CS. But I've been reworking things a bit, and now the only multi-transfer message is the write-command + read-status, so maybe the delay in there is the write-read delay you saw in the apple code.

The comments about the reset* properties is interesting and make sense - but that then leaves the question of where write-read delay comes from - maybe that's hardcoded? I had picked the spiCSDelay because it was the only one without known units, but the longer I look at things (and the more things get clarified) the worse that choice looks.

l1k commented 7 years ago

@roadrunner2: The macOS driver lets the SPI master do DMA and the purpose of the delay appears to be that it waits for the DMA to complete. So it's not really comparable to how the Linux driver works, which just lets the SPI core invoke a callback upon completion. The macOS driver calculates the delay in AppleIntelLpssGspi::spiTransferUsecCalc() using a somewhat odd formula:

usecWord = ((spiClockPeriod * wordSize + 999) * 0x10624dd3) >> 0x26
usecDpxDelay = (((usecWord * 1000 - 2656) + 999 >> 0x3) * 0x20c49ba5e353f7cf) >> 4

In cases like this we usually just insert a hardcoded delay, see e.g. the udelay(100) in drivers/platform/x86/apple-gmux.c when accessing the GMUX registers.

roadrunner2 commented 7 years ago

@l1k Thanks for that info. The first line is clear:

usecWord = ((clock-period-in-ns * wordsize + 999) / 1000 = ceil(clock-period-in-ns * wordsize / 1000) = 1

(clock-period-in-ns is 125)

But I can't make sense of that second line; the usecWord * 1000 - 2656 appears to be in 32-bit signed arithmetic, the rest in unsigned 64-bit; and result truncated to 32 bits. But that results in a value of 1958505086 for usecDpxDelay. Still trying to figure out what I'm missing.

roadrunner2 commented 7 years ago

Here is a substantially updated set of patches. I have been able to reproduce a number of issues by generating lots of commands to send while at the same time generating a lot of events (mostly by running my finger around the touchpad). As a result I think I've been able to clean up things quite a bit and make them very stable now - I'm unable to produce any failures anymore. As part of this the touchpad-init is now handled/processed like the other commands and should therefore also be robust now.

@peterychuang Could you try this out? You seem to have been able to consistently get errors here.

Many thanks @l1k for the assistance here in figuring some of the delays. I still haven't been able to make sense of the usecDpxDelay calculation above, so just have a hardcoded 100µs delay now (any time we switch between reads and writes).

peterychuang commented 7 years ago

@roadrunner2 sure. Is it the touchpad-init-fixes-2 branch?

roadrunner2 commented 7 years ago

@peterychuang Yes; either that or the touchbar-driver-hid-driver branch (the latter also contains the touchbar driver, which you don't need, but is otherwise identical). TIA.

stefan-langenmaier commented 7 years ago

The latest changes roadrunner2@1b978045c92be0e91ee34c65fd86b8384f76a354 in the touchbar-driver-hid-driver branch made the touchpad on a MacbookPro14,2 usable.

cb22 commented 7 years ago

@roadrunner2 sorry I've been pretty busy of late so I haven't had the chance to give this project the attention it deserves.

I'll try out this branch on my Macbook9,1 - I occasionally had init problems too so it will be interesting to see if it fixes those.

Other than that, if you can fix the conflict (caused by me merging #42 which I see this includes) I'd be happy to merge it in.

peterychuang commented 7 years ago

@roadrunner2 I've been using touchbar-driver-hid-driver branch for a little over a week. I've encountered init problem once pretty soon after I switched. But I'm not sure if that one time counts, as the NVMe problem was hitting my machine pretty bad at that one time, so I had to reboot the machine many times before I booted into the DE successfully, and once the boot was successful, I had wifi init problem too in addition to the touchpad problem.

Other than that one time, I haven't encountered another touchpad init problem.

roadrunner2 commented 7 years ago

@cb22 No problem - I was waiting/hoping for some folks to test this first anyway. I rebased and re-pushed the branch now.

@stefan-langenmaier and @peterychuang: thanks to both of you for testing and reporting back. Good to hear it mostly works; sad to hear it only mostly works - I was really hoping to have nailed this once and for all. But the fact that it fixed this on 14,2, and that only one failure occurred under apparently weird conditions, and my much more extensive tests work, does give me hope we're on the right track now and getting close.

@peterychuang: do you happen to still have the logs from when the init failed? If so, could you show me what the exact error was?

peterychuang commented 7 years ago

@roadrunner2 unfortunately I haven't kep the log. If I remember it right, it was the same applespi: Error writing to device: b3 b3 b3 b3 thing.

I'll let you know if I encounter the problem again. But as I said, that one time could be a fluke.

roadrunner2 commented 7 years ago

@peterychuang FYI: IIRC you're using Arch, and that uses systemd and hence journald - you can show the messages from previous boots with journalctl -b -<n> (for n'th previous boot), and journalctl --list-boots makes it easy to narrow down which boots to look at.

Anyway, the b3 b3 b3 b3 has, in my testing, indicated insufficient delay when switching between reads and writes. So maybe under some circumstances the current 100µs delay is not sufficient. Well, let's see how many others run into this. Though if you feel like experimenting, you could always try a larger value for SPI_RW_CHG_DLY.

peterychuang commented 7 years ago

@roadrunner2 unfortunately that was too long ago that the log had been rotated out.

The good news is, now that a few more days have passed, I haven't encountered any init problem. So it's quite likely that the one init failure I mentioned was a one-off unrelated thing.

cb22 commented 7 years ago

@roadrunner2 merged 👍. On my Macbook9,1 so far it has been looking good.

I also experienced #6 a few times. After doing quite a few reboots, it seems like that no longer happens!

roadrunner2 commented 7 years ago

@cb22 Thanks for testing and merging. @peterychuang Ok - I'm sure you'll let us know if you run into again then 😉