eldruin / pwm-pca9685-rs

Platform-agnostic Rust driver for the PCA9685 I2C 16-channel, 12-bit PWM/Servo/LED controller
Apache License 2.0
29 stars 13 forks source link

Improving latency of `set_channel_` #10

Open epigramengineer opened 1 year ago

epigramengineer commented 1 year ago

Hi! First all I love this library and I'm new to embedded programming, so hopefully the questions I'm asking are relevant.

I'm trying to control a stepper motor with this driver over i2c from my rasbperry pi. I'm setting the driver to 1526hz. Very roughly, this is my setup:

    let dev = hal::I2cdev::new("/dev/i2c-1").unwrap();
    let address = pwm_pca9685::Address::from(0x60);
    let mut pwm = pwm_pca9685::Pca9685::new(dev, address).unwrap();
    pwm.enable().unwrap();
    pwm.set_prescale(3).unwrap(); // 1526 Hz

    // use the pwm controller here

My stepper motor code does calculations for the step and microstepping (not important now), and then calls:

        self.pwm.set_channel_on_off(self.pwma, 0, CURVE[aidx] * 16).unwrap();
        self.pwm.set_channel_on_off(self.pwmb, 0, CURVE[bidx] * 16).unwrap();
        self.pwm.set_channel_on_off(self.ain2, 0, power & 0b0001 * 0x0FFF).unwrap();
        self.pwm.set_channel_on_off(self.bin1, 0, power & 0b0010 * 0x0FFF).unwrap();
        self.pwm.set_channel_on_off(self.ain1, 0, power & 0b0100 * 0x0FFF).unwrap();
        self.pwm.set_channel_on_off(self.bin2, 0, power & 0b1000 * 0x0FFF).unwrap();

The important part here is that I am setting set_channel_on_off 6 times per step. When I time my application it takes approximately 600 micoseconds per call to set_channel_on_off. If I just use set_channel_off it takes 400 microseconds (improvement). I looked at set_all_on_off hoping batching would work, but that takes longer than each call sequentially.

I'm trying to run my motor at faster speeds, and I'm currently blocked because each of these calls is (relatively) slow. I'm trying to figure out what to do. Do you have any idea what the latency would be here? I suspect this is either in how quicly I2C reads the value or perhaps because I'm using dev fs implementation of I2C, but researching these topics is coming up pretty empty. I'm not sure if setting the i2c baud rate would improve this or not, so wanted to ask here for ideas

Additional information: I'm running on raspberry pi with isolcpus and taskset, built in release mode so I shouldn't be having any problems with OS scheduling or code performance. I'm calculating the timing as the average of several thousand invocations of my step code, so the latency isn't from the timing.

eldruin commented 1 year ago

Hi, This is interesting. I have not measured the latency of the methods but the code is quite straight-forward. There might be a problem in your code if you run this multiple times without resetting the device as the device may be still enabled but the driver does not know it and thus the prescale is not being applied since it can only be changed while in sleep mode. To be sure, I would do this:

// ensure the device is sleeping as it may have been left enabled if you run this multiple times without a reset
pwm.disable().unwrap();
pwm.set_prescale(3).unwrap();
pwm.enable().unwrap();

A few questions:

  1. Which RaspberryPi are you using and what is the clock frequency?
  2. Are you running your program in release or debug mode? i.e. are you building it with --release and actually running that binary?
  3. Have you tried running this on a bare-metal MCU like an stm32f4 or similar?
  4. How are you measuring the latency? Do you have any logic analyzer / oscilloscope graphs?

You can also tweak the release profile further for increased performance with stuff like lto=true, codegen-units=1. See here You can also speed up the I2C baudrate to 400kHz like this

epigramengineer commented 1 year ago

Thanks for the quick response! I did more research into this, and here's what I came up with. The interesting part to me is the interface for i2c.write() which states:

    /// Sends bytes to slave with address `addr`
    ///
    /// # I2C Events (contract)
    ///
    /// ``` text
    /// Master: ST SAD+W     B0     B1     ... BN     SP
    /// Slave:           SAK    SAK    SAK ...    SAK
    /// ```
    ///
    /// Where
    ///
    /// - `ST` = start condition
    /// - `SAD+W` = slave address followed by bit 0 to indicate writing
    /// - `SAK` = slave acknowledge
    /// - `Bi` = ith byte of data
    /// - `SP` = stop condition
    fn write(&mut self, addr: u8, bytes: &[u8]) -> Result<(), Self::Error>;

This means it is sending 1 byte at a time, and then waiting for the device to send back an acknowledge. I'm assuming here that the device can only send an ack once per cycle, meaning this method would limit how quickly I can send new state information to the device. This would also explain why set_all_on_off is linearly slower than set_channel_on_off and why that takes approximately 2x as long as set_channel_on.

I've looked at the transfer command for i2c instead, which is the second example here: https://docs.rs/i2cdev/latest/i2cdev/index.html - This appears to send all the bytes without waiting for an ack, but constructing the messages seems to be specific to each implementation (hence LinuxI2CMessage). I'm not sure how that would work, but it seems like that might be an appropriate thing to surface on the pwm interface similar to set_all_on_off.

In case that's not the lead answering your questions:

  1. Which RaspberryPi are you using and what is the clock frequency?

Raspberry Pi model 4 (1.5 Ghz). I doubt this is the limiting factor

  1. Are you running your program in release or debug mode? i.e. are you building it with --release and actually running that binary?

Included in the original post last paragraph; "with isolcpus and taskset, built in release mode"

  1. Have you tried running this on a bare-metal MCU like an stm32f4 or similar?

No

  1. How are you measuring the latency? Do you have any logic analyzer / oscilloscope graphs?

No, pretty naive stuff here just wrapping it in rust timing logic with Instant::now() and diffing over a large sample of steps. I experimented with how many calls to set_channel_on_off per step to extrapolate the linear cost of each call.

Question for you

Would it be feasible to add the transfer mechanism to the interface here? I'm not sure quite how it would work abstracting over the I2cMessage type

eldruin commented 1 year ago

I see. I agree the running speed should not be the limiting factor.

The transfer method you refer to does the same thing as the write under the hood. It would not bring any transmission speed improvement. The documentation of the write method simply describes how the I2C protocol works. See here.

The difference between the methods is how much they transfer:

So if you are calling set_channel_on_off 6 times, that amounts to (address byte + 5 bytes) * 6 = 36 bytes transferred vs. address byte + 65 bytes = 66 bytes transferred.

It seems from your code that the on is always at 0 so it would be more efficient to set all on to 0 outside of the loop and inside call set_channel_off, which transfers: address byte + 3 bytes = 4 bytes per call so 24 bytes in total.

Have you tried increasing the baud rate like I said above?