Richard-Gemmell / teensy4_i2c

An I2C library for the Teensy 4. Provides slave and master mode.
MIT License
92 stars 19 forks source link

analogWrite interferes with slave/listener receiving data #15

Closed niek-dewit closed 3 years ago

niek-dewit commented 3 years ago

Hi @Richard-Gemmell I've been trying to use your library to exchange data between 2 teensy 4.0 micro-controllers. The master controller provides the listening controller with sound information, which the listening controller than turns into audio signals with the help of the build in analogWrite functionality.

I ran into an issue that the listener seemed to stop listening/receiving as soon as analogWrite was called.

I made a really simple setup to reproduce this issue. On my own setup, the listener stopped listening instantaneously after calling analogWrite, but in this simplified example it takes a couple seconds before the issue appears.

My controllers are connected to each other on pins 18 & 19.

I am not sure if this is an actual issue in the library, or if this is an issue in my example code?

Master code:


#include <Arduino.h>
#include <i2c_device.h>

I2CMaster& master = Master;

const uint8_t listener_address = 127;
uint8_t buff[513] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
unsigned int speaking_length = 10;

void setup() {
    master.begin(400 * 1000U);
    Serial.begin(9600);
    Serial.println("Started speaking");
}

void loop() {
    master.write_async(listener_address, buff, speaking_length, true);
    Serial.println("I spoke: ");
    for(unsigned int i = 0; i < speaking_length; i ++) {
        Serial.print(buff[i]); Serial.print(",");
    }
    Serial.println();
    delay(100);
}

Slave code:

#include <Arduino.h>
#include <i2c_driver.h>
#include "imx_rt1060/imx_rt1060_i2c_driver.h"

const uint8_t listener_address = 127;

I2CSlave& listener = Slave;
const size_t listening_buffer_size = 513;
uint8_t listening_buffer[listening_buffer_size] = {};
volatile size_t listener_bytes_received = 0;

void setup() {
  Serial.begin(9600);
  Serial.println("Started listening");
  listener.after_receive(after_receive);
  listener.set_receive_buffer(listening_buffer, listening_buffer_size);
  listener.listen(listener_address);
  //analogWrite(0, 1); //uncomment this line, and the slave will eventually stop listening
}

void loop() {
  Serial.println("Still alive!");
  delay(100);
}

void after_receive(size_t length, uint16_t address) {
  Serial.println("I received: ");
  for(unsigned int i = 0; i < length; i ++) {
    Serial.print(listening_buffer[i]); Serial.print(",");
  }
  Serial.println();
}

Master output:

I spoke: 
0,1,2,3,4,5,6,7,8,9,
I spoke: 
0,1,2,3,4,5,6,7,8,9,
I spoke: 
0,1,2,3,4,5,6,7,8,9,
I spoke: 
0,1,2,3,4,5,6,7,8,9,
...

Slave output (analogWrite line commented out):

Still alive!
I received: 
0,1,2,3,4,5,6,7,8,9,
Still alive!
I received: 
0,1,2,3,4,5,6,7,8,9,
Still alive!
I received: 
0,1,2,3,4,5,6,7,8,9,
Still alive!
I received: 
0,1,2,3,4,5,6,7,8,9,
...

Slave output (analogWrite line enabled):

0,1,2,3,4,5,6,7,8,9,
Still alive!
I received: 
0,1,2,3,4,5,6,7,8,9,
Still alive!
I received: 
0,1,2,3,4,5,6,7,8,9,
... after 10 or so seconds..:
Still alive!
I received: 
0,1,2,3,4,5,6,7,8,9,
Still alive!
Still alive!
Still alive!
Still alive!
Still alive!
Still alive!
Still alive!
Richard-Gemmell commented 3 years ago

Hi Niek,

I'll try your code and see if I can reproduce the problem.

Your example code has a couple of minor problems. Given the timings, I wouldn't expect them to cause this problem but it's worth checking.

It's possible that there is some sort of electrical coupling between the PWM and I2C. Please can you try the following experiments and let me know if they make any difference.

It's also possible (but unlikely) that the PWM and I2C configurations conflict in some way. Please try:

Thanks for taking the time to create a simple version of the code. It makes it much easier for me to try to reproduce the problem.

cheers, Richard

niek-dewit commented 3 years ago

alright, I put in some more effort into this issue myself. Actually when I put everything on some breadboards, everything seems to work fine even if I enable the analogWrite line

image

So it's probably something in my full setup that is interfering with i2c.. the full setup is a bit more complex, so it will be hard to find the issue there, but at least it isnot an issue with this library then. This is my full setup schematically:

image

But since this is not an issue with this library, ill close the github issue.

Richard-Gemmell commented 3 years ago

Thanks for letting me know. Good luck.

niek-dewit commented 3 years ago

alright, so I dont think this is necessarily an issue with analogWrite at all. But maybe calling analogWrite just made some unrelated issue occur more quickly.

I changed the pullup resistors to 330ohm because i checked on my scope and the rising edges were pretty rounded off with 4.3k. I noticed on the scope that whenever the slave stopped listening, the clock would stay high, while the other line stayed low. Which I think means that the master doesnt even attempt to send data to the slave.

I don't know exactly how the i2c protocol works, but now I suspect that the master needs to have a slave actively listening on an address before it will attempt to send data to this address?

For some reason the slave keeps "forgetting" that it should listen to a specific address, if I change my slave code to the following, then it does work again:

#include <Arduino.h>
#include <i2c_driver.h>
#include "imx_rt1060/imx_rt1060_i2c_driver.h"

const uint8_t listener_address = 127;

I2CSlave& listener = Slave;
const size_t listening_buffer_size = 513;
uint8_t listening_buffer[listening_buffer_size] = {};
volatile size_t listener_bytes_received = 0;

void setup() {
  Serial.begin(9600);
  Serial.println("Started listening");
  listener.after_receive(after_receive);
  listener.set_receive_buffer(listening_buffer, listening_buffer_size);
  listener.listen(listener_address);
  analogWrite(0, 1);
}

void loop() {
  Serial.println("Still alive!");
  delay(100);
  listener.listen(listener_address); // this line is new
}

void after_receive(size_t length, uint16_t address) {
  Serial.println("I received: ");
  for(unsigned int i = 0; i < length; i ++) {
    Serial.print(listening_buffer[i]); Serial.print(",");
  }
  Serial.println();
}

Now, it does keep disconnecting randomly every now and then, but it will automatically reconnect. For me this is a good enough hack for now, since it is not a big deal if the listener misses some data here and there. But still strange behavior that I cannot explain, probably because I am not familiar enough with the i2c protocol :P

Richard-Gemmell commented 3 years ago

Hi Niek,

Calling listener.listen(listener_address) completely re-initialises the slave's I2C driver. This would clear any problem with the slave's I2C config. It would also clear problems where the master and slave get stuck waiting for each other. IIRC the driver already clears a stuck bus automatically.

I'll do a bit of digging and think about the symptoms you're getting.

I don't use pullups in my own setups when the master and slave are within a few centimeters of each other. I've never used a resistance lower than 1.1k.

cheers, Richard

Richard-Gemmell commented 3 years ago

FWIW I've done quite a lot of testing with master/slave setups (usually with a Raspberry Pi master) and I've never seen this problem. I have seen issues where data gets corrupted occasionally. I typically see millions of successful messages per failure. I've never tested a system that used analogWrite() though. :)

niek-dewit commented 3 years ago

i think in my case the pullups are necessary, since another device is also connected to the same i2c lines besides the communication between the 2 teensy's. A usb-b port with a 50cm-ish cable is also attached to the 2 lines. This is then connected to some external pads on my project so that the main controller can read data from an external eeprom.

So these wires of the external cable probably introduce some capacitance on the i2c lines.

I wrote earlier that the clock would stay high, while the other line stayed low. , but I am also seeing cases where both lines stay low... not sure exactly when which case happens.

Richard-Gemmell commented 3 years ago

Do you see the problem with the breadboard setup or just the full layout?

niek-dewit commented 3 years ago

https://user-images.githubusercontent.com/17322217/115967080-de359980-a530-11eb-85ba-69125d339342.mp4

little video of the behaviour i have now. So this is based on a more complex script. so i wont share it in its entirety but this is basically it: master


void loop() {
  delay(10);
  sendSounds();
}

void sendSounds() {
  ........
  master.write_async(listener_address, buff, buffIndex, true);
  finish();
}

void finish() {
    elapsedMillis timeout;
    while (timeout < 200) {
        if (master.finished()){
            return;
        }
    }
    Serial.println("Master: ERROR timed out waiting for transfer to finish.");
}

Slave

#include "TeensyTimerTool.h"
using namespace TeensyTimerTool;

double waveStepMicroseconds = 10;
PeriodicTimer waveTimer = PeriodicTimer(TCK);

void setup() {
  Serial.begin(9600);
  delay(2000);
  Serial.println("Started listening");
  slave.after_receive(after_receive);
  slave.set_receive_buffer(slave_rx_buffer, slave_rx_buffer_size);
  slave.listen(listener_address);

  analogWriteResolution(analogResolution);
  for(auto& speaker: speakerOutputPins) {
     analogWriteFrequency(speaker, analogFrequency);
  }
  waveTimer.begin(_wave, waveStepMicroseconds);
}

void _wave() {
  for(auto& sound: sounds) {
    sound.waveStepsSinceStart++;
    if(sound.waveStepsStop > sound.waveStepsSinceStart) {
      analogWrite(sound.speaker, sound.sine[sound.waveStepsSinceStart%sound.waveStepLength]);
    } else {
      analogWrite(sound.speaker, restAnalog);
      sounds.erase(std::remove(sounds.begin(), sounds.end(), sound), sounds.end());
    }
  }
}

void loop() {
  delay(10);
  if(slave_bytes_received > 0) {
     ...
      slave_bytes_received = 0;
  }
  slave.listen(listener_address);

}

void after_receive(size_t length, uint16_t address) {
  if(length % 16 == 1) {
    slave_bytes_received = length;
  }
} 

everytime i reset the slave teensy, you can see on the scope that the master suddenly starts "looking" for the slave again by sending some pulses to define the address i presume. Then when the slave comes online, the master will successfully send some data a couple times and then the 2 lines go flat on the scope again and nothing happens anymore.

as you can see, I also use teensytimertool for the wavetimer (which is using the softwaretimer TCK) (to make sine waves for the speakers). But i dont think this library necessary has an effect on this, since i also had the issue in the basic example without the teensytimertool library. Maybe the 10microsecond interval screws stuff up with i2c? If I increase the interval to 20 microseconds it already seems to go on for much longer.

But even with a 20 microsecond interval, if i dont keep calling slave.listen(listener_address), it will fail very quickly

Richard-Gemmell commented 3 years ago

It's possible that there are multiple problems here. I re-created your breadboard setup but without the pullups or orange wire. breadboard

I then ran modified versions of your simple example. The changes blink the LED so I can tell if a Teensy hung. I also added code to handle the asynchronous behaviour. There's a bit of code in the master to report when it detects any I2C errors.

This is working correctly for me. I've run several trials of 5 minutes each without any errors. I've enabled analogWrite() and it still works Ok.

Please can you try the same setup and see if it's Ok for you as well.

Master:

#include <Arduino.h>
#include <i2c_device.h>

// Blink the LED to make sure the Teensy hasn't hung
volatile bool led_high = false;
void blink_led();

I2CMaster& master = Master;

const uint8_t listener_address = 127;
uint8_t buff[513] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
unsigned int speaking_length = 10;
bool first_message = true;

void setup() {
    // Turn the LED on
    pinMode(LED_BUILTIN, OUTPUT);
    digitalWrite(LED_BUILTIN, true);

    master.begin(400 * 1000U);
    Serial.begin(9600);
    Serial.println("Started speaking");
}

void loop() {
    bool finished = master.finished();// || first_message;
    if (finished)
    {
        first_message = false;

        // What happened to the last message?
        if (master.has_error()) {
            Serial.print("Failed with error code ");
            Serial.println((int)master.error());
        } else {
            Serial.print("I sent ");
            Serial.print(master.get_bytes_transferred());
            Serial.println(" bytes.");
        }
        Serial.println();

        // Send the next message
        master.write_async(listener_address, buff, speaking_length, true);
        Serial.print("I spoke: ");
        for(unsigned int i = 0; i < speaking_length; i ++) {
            Serial.print(buff[i]); Serial.print(",");
        }
        Serial.println();
    } else {
        Serial.print(master.get_bytes_transferred());
        Serial.println(" bytes sent...");
    }

    blink_led();
    delay(100);
}

void blink_led() {
    led_high = !led_high;
    digitalWrite(LED_BUILTIN, led_high);
}

Slave:

#include <Arduino.h>
#include <i2c_driver.h>
#include "imx_rt1060/imx_rt1060_i2c_driver.h"

// Blink the LED to make sure the Teensy hasn't hung
volatile bool led_high = false;
void blink_led();

const uint8_t listener_address = 127;

I2CSlave& listener = Slave;
const size_t listening_buffer_size = 513;
uint8_t listening_buffer[listening_buffer_size] = {};
uint8_t listening_buffer_2[listening_buffer_size] = {};
volatile size_t listener_bytes_received = 0;
void after_receive(size_t, uint16_t);

void setup() {
    // Turn the LED on
    pinMode(LED_BUILTIN, OUTPUT);
    digitalWrite(LED_BUILTIN, true);

    Serial.begin(9600);
    Serial.println("Started listening");
    listener.after_receive(after_receive);
    listener.set_receive_buffer(listening_buffer, listening_buffer_size);
    listener.listen(listener_address);
    analogWrite(0, 1); //uncomment this line, and the slave will eventually stop listening
}

void loop() {
    Serial.println("Still alive!");

    if (listener_bytes_received) {
        Serial.println("I received: ");
        for(unsigned int i = 0; i < listener_bytes_received; i ++) {
            Serial.print(listening_buffer[i]); Serial.print(",");
        }
        Serial.println();

        // Clear listener_bytes_received to signal that we're ready for another message
        listener_bytes_received = 0;
    }

    blink_led();
    delay(100);
}

void after_receive(size_t length, uint16_t address) {
    if (!listener_bytes_received) {
        memcpy(listening_buffer_2, listening_buffer, length);
        listener_bytes_received = length;
    }
}

void blink_led() {
    led_high = !led_high;
    digitalWrite(LED_BUILTIN, led_high);
}
Richard-Gemmell commented 3 years ago

Is the USB device on the I2C bus a master or slave device? There may be a driver problem if it's an I2C master as I've never tested a multi-master setup. I suggest that you disconnect it until you can get the rest working reliably.

Given that the simple example works for me, I think that the driver is probably working Ok as long as the USB device is an I2C slave.

You may well have electrical problems of some sort. The most common issue is electrical noise. Rounded square waves might be problem as well but noise would be my top suspect.

Try dropping the I2C bus speed to 100 kHz. It might be a bit more robust and you probably don't need the extra speed. (master.begin(100 * 1000U))

The I2C protocol is surprisingly simple. The I2C Specification is well written. Section 3.1 will tell you what the SDA and SCL lines should be doing. (You can ignore most of the rest of the spec.)

i2c-bus.org has a lot of helpful information. It explains the I2C protocol and has a lot of help about electrical issues.

Richard-Gemmell commented 3 years ago

I'm currently running a test with the code from your first message. The hardware is the layout I used above but with the USB power cable running into the slave. So far it's been running for 30 minutes without any problems.

Update: I shut it down after it ran for 2 hours without problems.

niek-dewit commented 3 years ago

alright. So i am using your code, and it does seem a lot more stable now. Maybe something with the second buffer in the slave that makes it more stable.. But when plugging them into my project pcb (without the external usb on the i2c lines plugged in), it DOES keep up a lot longer now. But once after some 15-ish minutes, the master still got stuck on x bytes sent.... (most of the time 8 bytes) Which would be fine if I was able to reset the i2c master/slave in the code, but I haven't found a way to do this yet. Now when I power cycle the system, it will start working correctly again for a while.

I also noticed that when I attach my scopes to the lines, it will fail much more quickly, and also when I connect any other cables to the i2c lines(such as the external usb) it will fail almost instantly.

Before it fails, the master will report I2CError 10 (data_nak) a couple times, then send some data successfully a bunch of times, and then suddenly get stuck on 8 bytes sent....

Here is a video of the behavior: https://user-images.githubusercontent.com/17322217/115992078-07573800-a5cc-11eb-80bb-3aa301b1abcb.mp4

I feel like the signals on the scope look pretty clean?

WhatsApp Image 2021-04-25 at 13 40 24

It does look like the purple data signal has some 200khz wave noise, which could possibly be introduced by the power supply or step-up boost converters and sound amplifiers indirectly connected to the controllers.

Is this low amplitude 200khz wave enough to throw off the i2c communication? In any case, if I could just "reset" the slave and master whenever the master gets stuck on 8 bytes sent..., then I think at least I could get this working as a software hack on the current hardware setup.

niek-dewit commented 3 years ago
Is the USB device on the I2C bus a master or slave device? 

The other device is also a slave, however the usb port is just connected to some metal pads on the outside of the project, which you can press a memory card against to load in some data (as in a game card to load the game). But even without pressing the game card against the pads, the cable and pads by themselves already affect the system.

Richard-Gemmell commented 3 years ago

The double buffering doesn't affect the reliability of the link. It's there to avoid a problem where the acts on a message when the buffer contains half of the previous message and half of the new message. On the other hand, removing calls to Serial.print() from the event handler may have improved it by making the ISR faster. Likewise, making sure the master doesn't send a new message before finishing the previous message should help.

The master reports error_code 10 (NACK) for various reasons. See section 3.1.6 of the I2C spec for details. One reason is that the slave didn't respond at all. That's the code I get if I disconnect one of the I2C wires. If I reconnect the wire the I2C link comes back up automatically.

Richard-Gemmell commented 3 years ago

I don't have much experience with the electrical side so please treat my comments about data lines etc with caution.

FWIW the clock line looks good to me. I assume the purple trace is SDA. The signal on that line looks suspect to me. In particular, some of the signals look more like spikes than square waves. Do you see the same narrow spikes on the breadboad setup?

I2C doesn't have any kind of error correction which makes it vulnerable to noise. The Teensy has a built in Schmitt trigger which ignores a lot of minor noise. Other than that the spec defines a HIGH signal as 70% of Vdd or higher and LOW as 30% of Vdd or lower. Noise which crosses these boundaries can cause problems.

Having said all that, I've never seen the I2C link just stop before. It's almost like the SDA line is being disconnected. Is it possible that there's a loose connection somewhere?

Richard-Gemmell commented 3 years ago

I wouldn't expect the low amplitude 200 kHz signal to be an issue. The amplitude isn't high enough to look like an I2C signal. I'm more worried about how narrow the spikes are. The might be too narrow to count.

Richard-Gemmell commented 3 years ago

You could try to catch errors on the slave and then reset it. e.g. you could put something like this in loop().

    if (listener.has_error()) {
        Serial.println("Slave I2C error. Resetting.");
        listener.listen(listener_address);
    }
niek-dewit commented 3 years ago

This is what the signals look like on a breadboard without any pull-ups.

WhatsApp Image 2021-04-25 at 14 46 44 WhatsApp Image 2021-04-25 at 14 48 35

Honestly I think we might be onto something here. Because of the stronger pullups that I am using now on the project pcb, it will pull up those tiny spikes a lot more. Right now on the breadboard, the short spikes might stay under the 30% mark, so they are not picked up. On the project board, they are pulled up completely, but very short, so sometimes they are not picked up, and sometimes they are?

Also what i am seeing is that the short spikes happen on a constant interval. They are spaced apart from each other 33.33khz.

When changing the clock to 100 * 1000U, the short spikes are higher, and spaced apart 10.3khz. WhatsApp Image 2021-04-25 at 14 55 38

with 1000 * 1000U they are spaced apart around 60khz and with 800 * 1000U spaced apart 33khz

So spikes table: 1000 1000U : 60khz 800 1000U : 33khz 400 1000U : 33.33khz 100 1000U : 10.3khz

I thought the spikes were part of the data signal, but if that is the case, then i would expect a more uniform change in frequency based on the i2c clock rate?

niek-dewit commented 3 years ago

i also measured the duration of the spike from when it starts rising until it drops: 100 1000U : 2200ns 400 1000U : 500ns 800 1000U : 500ns 1000 1000U: 220ns

and the duration of the entire data transfer: 100 1000U : 1.090ms 400 1000U : 0.338ms 800 1000U : 0.338ms 1000 1000U: 0.190ms

Seems like there is no difference between 400 and 800?

Running your code at 100 1000U already stopped running twice now, although i did not catch the exact moment when it exactly stopped running. Also running it at 1000 1000U, it is not able to sustain it for a long time, but that is expected because it looks like the signals just barely have enough time to rise without pullups.

niek-dewit commented 3 years ago

You could try to catch errors on the slave and then reset it. e.g. you could put something like this in loop().

    if (listener.has_error()) {
        Serial.println("Slave I2C error. Resetting.");
        listener.listen(listener_address);
    }

I will try this later, thank you!

niek-dewit commented 3 years ago

Having said all that, I've never seen the I2C link just stop before. It's almost like the SDA line is being disconnected. Is it possible that there's a loose connection somewhere? also, i dont think it is a connection issue. In my breadboard setup. when i unplug any of the i2c wires, master will throw error 10. When i plug it back in, it will send data successfully again. I can do this with either or both of the wires at the same time and it will continue like normal. however when touching the wires and wiggling them a bit, that is when the master sometimes randomly starts hanging at ...bytes sent... . And at that point, i can replug the wires all i want, but nothing changes until i power cycle

niek-dewit commented 3 years ago

running your example code on a breadboard with 100 * 1000U, it failed again like in the video i sent earlier, without touching the wires. Can you reproduce?

Richard-Gemmell commented 3 years ago

I haven't seen it fail on my breadboard.

I'll try wiggling the wires in the way you mentioned and see if I can get it to stick in the way you described. If I can reproduce that then there may be something I can to change the driver to make it recover.

Are you using the same Teensy boards in your two setups? I'm wondering if one of your boards is faulty.

Richard-Gemmell commented 3 years ago

I think I've reproduced the problem with it getting stuck. I'll investigate more later today.

niek-dewit commented 3 years ago

I am using the same 2 teensy boards with the 2 setups. Although it could definitely be that 1 of them is faulty in some way. The first one was also the first microcontroller I used when getting into electronics, so it has experienced some abuse.

But awesome to hear that you might have reproduced the issue. Let me know if I can be of any help :)

Richard-Gemmell commented 3 years ago

I can definitely reproduce the issue. :)

I've been investigating what's happening from the master's point of view. I believe that spikes and/or dropouts on the SDA line cause the master to think that another master device has taken over the bus. The I2C allows for more than master. They're supposed to use an "arbitration" process to work out who's in control. In this case, the master sits there waiting for the other mater to release the bus but that never happens.

I'll go and see what it looks like from the slave's point of view.

The fundamental problem is that the electrical connection isn't reliable enough. (I suspect it's just the SDA line.) If you can fix that then this will stop happening. I'll carry on looking for a way to reset it though.

I suggest that you switch to using the other I2C port. (Pins 17 and 16). If that resolves the problem then the cause was probably a dodgy pin or track on one of your Teensys. If you're still getting similar traces on SDA then it's a wiring or noise issue.

niek-dewit commented 3 years ago

I just tried switching to pins 16 & 17, with changes in the code to use Master1 and Slave1. Signals on the scope look exactly the same. Also tried switching which of the teensy's is the slave and master. But no difference.

I honestly do not thing this is a wiring or noise issue. Since the scope shows the spikes when I use the breadboard setup with those prototype wires + usb power supply, and it also shows the spikes when using the teensy's in a pcb + meanwell power supply.

Did you have to implement the i2c logic yourself in this libary, or is this library "simply" talking to some hardware api to make the i2c magic happen? If this regular spike is really not just part of the expected i2c signal, then I think this is something going wrong in the code, since the duration of the spike also changes based on the i2c clock speed https://github.com/Richard-Gemmell/teensy4_i2c/issues/15#issuecomment-826322299. (which honestly also makes me suspect that it is supposed to be there?)

Richard-Gemmell commented 3 years ago

I've figured out what's going on from the slave's point of view. The fake signals on SDA make the slave think that a master has asked it to transmit data to the master. It doesn't have anything to send so it reports error code 3 - buffer_underflow and waits. I think it's waiting for the master to acknowledge a byte. The phantom master can't acknowledge so the slave waits for ever.

I have a workaround which is to call listener.listen(listener_address); whenever the slave detects a buffer underflow. Put the following in your slave's loop().

if (listener.has_error()) {
    error_count++;
    I2CError error = listener.error();
    if (error == I2CError::buffer_underflow) {
        Serial.println("ERROR: Buffer underflow detected. Resetting.");
        listener.listen(listener_address);
    } else {
        Serial.print("Listener failed with error code ");
        Serial.println((int)error);
    }
}

(Note that this might be a bad idea if you really do want the master to request data from the slave.)

When I added this to my breadboard setup it made it very difficult to get the bus to lock up. I did manage it once though.

niek-dewit commented 3 years ago

Awesome man! I will try this out next Tuesday when I continue on this project again. Hopefully it solves my issues, thanks a lot for your time and effort so far! :)

Richard-Gemmell commented 3 years ago

Thanks for trying out the other bus etc. I agree with your point about the spikes scaling with the clock speed. That does suggest they're part of the protocol and intentional. I need to get a scope out to see what they look like on my setup. My suspicion is that they should be a lot wider and cleaner.

The thing that makes me think there's an electrical problem is that:

If your Teensys and your breadboard are Ok then I'd expect you to be able to run these tests all day without seeing any errors.

The Teensy has hardware support for I2C. The library tells the hardware that it wants to send or receive bytes and registers an interrupt routine to handle the ensuing events. Weirdly this makes it considerably harder to implement a driver than simply bit-banging! As you suspect, the hardware determines when pulses are sent. It's responsible for detecting edges as well. The pulse widths are mostly determined by the hardware but there is some configuration.

Unfortunately, the documentation for the hardware is very poor; especially the information about configuring timings. It's quite possible that I've got something wrong. It's more likely that the default configuration is sub-optimal for your setup in some way.

niek-dewit commented 3 years ago

Where would I be able to find the hardware documentation about the timings, and where the code of this library is the default configuration applied? Maybe if I have some spare time somewhere I could read into this as well.

Richard-Gemmell commented 3 years ago

The data sheets are listed at on this site. The I2C section is in Chapter 47 - "Low Power Inter-Integrated Circuit (LPI2C)".

Everything that matters about the configuration is in imx_rt1060_i2c_driver.cpp. Look for PAD_CONTROL_CONFIG on line on line 69. See also the various IMX_RT1060_I2CBase::Config entries on lines 598 onwards.

There's one issue I know about that might be relevant. The spec says somewhere that the slave should send NACK when a buffer underflow occurs. I couldn't find any way to send a NACK as ACK and NACK are handled automatically by the hardware layer. See the TODO on line 549.

Richard-Gemmell commented 3 years ago

I probably won't have time to look into this any more until next weekend. Please do let me know how it's going though.

Richard-Gemmell commented 3 years ago

I didn't have an oscilloscope before I wrote the driver so I didn't look at any signals. It would be really useful to be able to create some sample I2C transactions, work out what the SDA waveforms should look like and then compare them to what we're seeing. We can then spot problems.

It would be incredibly useful if you could help with this.

I found an I2C simulator that shows expected pulse trains. See https://labjack.com/content/i2c-simulator. There may be better simulators out there too.

The I2C spec defines how the pulses should look. This site talks about pulse shapes a lot.

I'm thinking of putting some of these on GitHub to help other I2C users work out whether they're looking at electrical or logic problems.

niek-dewit commented 3 years ago

Tomorrow I will look at all the information you shared with me and the i2c simulator to work out what a certain data transfer signal should theoretically look like. Then I will run the same data through the actual teensy's and measure the signal on my scope, I'll share the results here.

niek-dewit commented 3 years ago

Ok so I googled a bit more and read a bit of the documentation. The documentation is very hard for me to understand, I don't understand much of the lingo yet :P

But I simulated the 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, test message were trying to send, in that simulator you linked earlier: image

(I didn't do new measurements on my scope, just reusing earlier pictures here)

Except from the regular spikes, the signal looks exactly the same. So I think the signal is good, but how do we explain these spikes? image image

As you can see in the simulated run, 8 bits are always followed by an ACK. So 9 clock pulses for every byte transmitted. If you look at the measured signal on my scope. You can see that the "random" spike always happens right after the 9th clock pulse. So right after the ACK. Looking at the other pictures, this seems to be consistent. So a bit of googling later, I found some people that said this is allowed according to the spec. https://e2e.ti.com/support/processors/f/processors-forum/186350/the-glitch-occurs-on-i2c0-sda-of-dm8168

This is not a glitch per se. It is simple the bus being released at the end of the cycle and is allowed as per the I2C spec as long as it does not occur whilst CLK is high.

It is difficult to see from the capture, but this might simply be the ACK period, where it would be the responsibility of the slave device to hold data low for the ACK cycle. If there is no slave device responding to the message then it would also cause SDA to go high.

So I think the spikes are okay. And we probably don't have to investigate this any further?

I think next steps would be to take a look at articles such as: https://www.i2c-bus.org/i2c-primer/analysing-obscure-problems/blocked-bus/ https://www.pebblebay.com/i2c-lock-up-prevention-and-recovery/ https://www.nxp.com/docs/en/application-note/AN4803.pdf https://electronics.stackexchange.com/questions/350173/what-are-the-various-ways-in-which-an-i2c-bus-may-hang https://electronics.stackexchange.com/questions/469547/i2c-bus-getting-locked-up-on-single-master-single-slave-system

It seems like "i2c blocked bus" is a pretty common thing. And (I didn't read much of the articles, but) I think there are some hacky ways to detect and fix a blocked bus.

Richard-Gemmell commented 3 years ago

It's good to know that the protocol looks correct. I didn't know you could get short spikes related to the ACK. Thanks for that.

I was aware that the bus could get stuck but I've never looked into the reasons. The issue I reproduced is a stuck bus. The cause is that the slave thinks the master started a read transfer but it didn't. The fix works because it resets the slave which gives the master a chance to try again. It'll be interesting to work out which other states can deadlock the bus.

If you're also seeing the slave reporting buffer_underflow errors then the slave must think it's been asked to transmit data to the master. We know your master never requests data so this shouldn't happen. Perhaps the master tried to send data to the slave but the read/write bit got flipped. i.e. the master set it to write (0) but the slave read read (1).

If bits are getting flipped on the bus then it would show up in the data. Perhaps you could run a test to see if this happens and how often. A 4 byte message consisting of [AA, FF, 55, 00] would give a nice mix of bits. I'd do at least 100,000 messages to get good stats. You can just send the messages as fast as possible.

Richard-Gemmell commented 3 years ago

If you aren't seeing data errors but the bus is still getting stuck then lets see if we can detect the error state and clear it by restarting the drivers. If all else fails I can probably write something to enable the master to force the slave off the bus. I'm not very keen to do that as I don't think I can do it with the I2C driver so it'll involve bit-banging.

niek-dewit commented 3 years ago

Alright I did some testing and measurements, and I will share my observations here.

First, I made a little overview for myself to understand the signal better: (this is when sending [AA, FF, 55, 00]) image

I changed the code a bit so that a lot of small messages are sent to the slave in succession. 1 transfer per millisecond. This looks like this on the scope. image

In the code, I added some very crudely programmed error logging. Slave code:

#include <Arduino.h>
#include <i2c_driver.h>
#include "imx_rt1060/imx_rt1060_i2c_driver.h"

// Blink the LED to make sure the Teensy hasn't hung
volatile bool led_high = false;
void blink_led();

const uint8_t listener_address = 127;

I2CSlave& listener = Slave;
const size_t listening_buffer_size = 513;
uint8_t listening_buffer[listening_buffer_size] = {};
uint8_t listening_buffer_2[listening_buffer_size] = {};
volatile size_t listener_bytes_received = 0;
void after_receive(size_t, uint16_t);

elapsedMillis timeMillis = 0;

unsigned long correct = 0;
unsigned long incorrect = 0;

unsigned long noError = 0;
unsigned long error0 = 0;
unsigned long error1 = 0;
unsigned long error2 = 0;
unsigned long error3 = 0;
unsigned long error4 = 0;
unsigned long error5 = 0;
unsigned long error6 = 0;
unsigned long error7 = 0;
unsigned long error8 = 0;
unsigned long error9 = 0;
unsigned long error10 = 0;
unsigned long error11 = 0;

void setup() {
    // Turn the LED on
    pinMode(LED_BUILTIN, OUTPUT);
    digitalWrite(LED_BUILTIN, true);

    Serial.begin(9600);
    Serial.println("Started listening");
    listener.after_receive(after_receive);
    listener.set_receive_buffer(listening_buffer, listening_buffer_size);
    listener.listen(listener_address);
    analogWrite(0, 1);
    delay(2000);

}

void loop() {
    if (listener_bytes_received) {
        if(listening_buffer[0] == 0xAA && listening_buffer[1] == 0xFF && listening_buffer[2] == 0x55 && listening_buffer[3] == 0x00 ) {
          correct++;
        } else {
          incorrect++;
        }
        listener_bytes_received = 0;

        if (listener.has_error()) {

            switch((int)listener.error()) {
                case 0: {
                    error0++;
                    break;
                }
                case 1: {
                    error1++;
                    break;
                }
                case 2: {
                    error2++;
                    break;
                }
                case 3: {
                    error3++;
                    listener.listen(listener_address);
                    break;
                }
                case 4: {
                    error4++;
                    break;
                }
                case 5: {
                    error5++;
                    break;
                }
                case 6: {
                    error6++;
                    break;
                }
                case 7: {
                    error7++;
                    break;
                }
                case 8: {
                    error8++;
                    break;
                }
                case 9: {
                    error9++;
                    break;
                }
                case 10: {
                    error10++;
                    break;
                }
                case 11: {
                    error11++;
                    break;
                }
            }
        } else {
            noError++;
        }
        Serial.print("noError: "); Serial.println(noError);
        Serial.print("error0: "); Serial.println(error0);
        Serial.print("error1: "); Serial.println(error1);
        Serial.print("error2: "); Serial.println(error2);
        Serial.print("error3: "); Serial.println(error3);
        Serial.print("error4: "); Serial.println(error4);
        Serial.print("error5: "); Serial.println(error5);
        Serial.print("error6: "); Serial.println(error6);
        Serial.print("error7: "); Serial.println(error7);
        Serial.print("error8: "); Serial.println(error8);
        Serial.print("error9: "); Serial.println(error9);
        Serial.print("error10: "); Serial.println(error10);
        Serial.print("error11: "); Serial.println(error11);
        Serial.print("Correct: "); Serial.println(correct);
        Serial.print("Incorrect: "); Serial.println(incorrect);
    }

    if(timeMillis > 500) {
        blink_led();
        timeMillis = 0;
    }
    delay(1);

}

void after_receive(size_t length, uint16_t address) {
    if (!listener_bytes_received) {
        memcpy(listening_buffer_2, listening_buffer, length);
        listener_bytes_received = length;
    }
}

void blink_led() {
    led_high = !led_high;
    digitalWrite(LED_BUILTIN, led_high);
}

Master code:

#include <Arduino.h>
#include <i2c_device.h>

// Blink the LED to make sure the Teensy hasn't hung
volatile bool led_high = false;
void blink_led();

I2CMaster& master = Master;

const uint8_t listener_address = 127;
uint8_t buff[513] = {0xAA, 0xFF, 0x55, 0x00};
unsigned int speaking_length = 4;

unsigned long noError = 0;
unsigned long error0 = 0;
unsigned long error1 = 0;
unsigned long error2 = 0;
unsigned long error3 = 0;
unsigned long error4 = 0;
unsigned long error5 = 0;
unsigned long error6 = 0;
unsigned long error7 = 0;
unsigned long error8 = 0;
unsigned long error9 = 0;
unsigned long error10 = 0;
unsigned long error11 = 0;

elapsedMillis timeMillis = 0;
void setup() {
    // Turn the LED on
    pinMode(LED_BUILTIN, OUTPUT);
    digitalWrite(LED_BUILTIN, true);

    master.begin(100 * 1000U);
    Serial.begin(9600);
    delay(2000);
}

void loop() {
    bool finished = master.finished();
    if (finished)
    {
        if (master.has_error()) {
            switch((int)master.error()) {
              case 0: {
                error0++;
                break;
              }
              case 1: {
                error1++;
                break;
              }
              case 2: {
                error2++;
                break;
              }
              case 3: {
                error3++;
                break;
              }
              case 4: {
                error4++;
                break;
              }
              case 5: {
                error5++;
                break;
              }
              case 6: {
                error6++;
                break;
              }
              case 7: {
                error7++;
                break;
              }
              case 8: {
                error8++;
                break;
              }
              case 9: {
                error9++;
                break;
              }
              case 10: {
                error10++;
                break;
              }
              case 11: {
                error11++;
                break;
              }
            }
        } else {
            noError++;
        }
        Serial.print("noError: "); Serial.println(noError);
        Serial.print("error0: "); Serial.println(error0);
        Serial.print("error1: "); Serial.println(error1);
        Serial.print("error2: "); Serial.println(error2);
        Serial.print("error3: "); Serial.println(error3);
        Serial.print("error4: "); Serial.println(error4);
        Serial.print("error5: "); Serial.println(error5);
        Serial.print("error6: "); Serial.println(error6);
        Serial.print("error7: "); Serial.println(error7);
        Serial.print("error8: "); Serial.println(error8);
        Serial.print("error9: "); Serial.println(error9);
        Serial.print("error10: "); Serial.println(error10);
        Serial.print("error11: "); Serial.println(error11);

        // Send the next message
        master.write_async(listener_address, buff, speaking_length, true);

    } else {
        //
    }

    if(timeMillis > 500) {
        blink_led();
        timeMillis = 0;
    }
    delay(1);
}

void blink_led() {
    led_high = !led_high;
    digitalWrite(LED_BUILTIN, led_high);
}

On the breadboard, everything works perfectly, no lockups when I don't touch the wires: image

Now I am moving to my custom pcb again. It locks up way more often. With scope + external cable connected to i2c lines, some screenshots of the resulting logs after the bus locked up: (left is master, right is slave) image So these are the logs with scope + external cable connected to i2c lines.

Even though the amount of successful transfers is increased a lot when only the 2 teensy's are plugged into my pcb and without any external devices,.. the error/lock behavior is the same.

When I see the data coming in in the logs, I often see the master log error10, and then the slave often immediately logs an incorrect result.

Sometimes no errors or incorrect data are reported in any of the logs, but the bus still locks up. Sometimes master logs error10, while the slave doesn't log anything going wrong, but the bus still locks up.

Sometimes the slave logs error3, but not as often as error10 is logged by the master. But it doesnt seem like this error locks up the bus. The fix to call listener.listen(listener_address); seems to work for this scenario. I saw the slave log error11 once. At the same time, the master logged error10 6 times.

When the bus is stuck, I can see on the scope that SCL is always LOW, and SDA remains always HIGH.

It doesnt matter if there are on average 10 successful transfers, or on average 100000 successful transfers, only shortly after a single error10, the bus gets locked up. So I suspect that sometimes error10 happens, but everything gets locked up so quickly that there is not even enough time to detect/log the error?

If you aren't seeing data errors but the bus is still getting stuck then lets see if we can detect the error state and clear it by restarting the drivers.

I think this is what I am seeing. What could I try to more accurately detect the error state?

Richard-Gemmell commented 3 years ago

I love the annotated trace at the top of that post. That makes it really clear what the data is doing.

I'm trying to work out why the bus doesn't clear. Please can you confirm that when the bus is stuck then SCL (the clock line) is LOW and SDA is HIGH? (I'm kind of expecting the opposite.)

niek-dewit commented 3 years ago

Ah yes sorry for causing some confusion! When the bus is locked up, SCL stays HIGH and SDA stays LOW. You were correct with expecting the opposite, I was mistaken in my previous post.

Richard-Gemmell commented 3 years ago

That's good to hear. I've been wracking my brain trying to work out how SCL could be stuck low. :)

So I think this is a classic case of a stuck I2C bus as you suggested.

I've been trying to figure out how the master can detect the issue. I think there's a bug in the driver which means the master simultaneously thinks it's both in and not in a transaction. It fails to report an error and refuses to send any more messages. I'll try and get out a fix soon.

Richard-Gemmell commented 3 years ago

(This change won't allow you to reset the bus but it's a step in the right direction.)

niek-dewit commented 3 years ago

Okay! So if I understand correctly, with this fix, we would be able to listen for some sort of "stuck bus" error. And then based on that I could figure out how to get it running again by restarting the master or something?

Richard-Gemmell commented 3 years ago

I've made an experimental change to the master. It's on the stuck_bus branch. (Let me know if you need any help with Git and branches.)

This change introduces error 12 - bus_busy. The master should refuse to start a new transaction and set this error if the bus is stuck when you call master.write_async(). This means that the master should now recover if the slave resets itself. (Previously the master was getting itself into an unrecoverable state.) You don't to need reset the master.

Please add error 12 to your error logging. I'd expect you to see this repeatedly when the slave gets stuck after the master reports an error 10. If you don't see it then my idea hasn't worked!

Now you need to reset the slave if it gets stuck. Unfortunately the slave doesn't have a mechanism to tell you it's stuck. I suggest that you just measure the time since the last successful message using elapsedMillis. If too much time has passed then call listener.listen(listener_address); to reset the slave.

I don't get the stuck slave problem so I can't be sure that this'll work. Please give it a go and let me know what happens. I'm particularly interested to know if you ever see error 12 and if you ever see successful messages after seeing error 12. (It's possible that this fixes some but not all of the problems.)

Richard-Gemmell commented 3 years ago

Can you could record a trace which shows what happened just before the bus got stuck? I'm curious to know what triggered it.

I'm a bit hampered by not getting the same errors. It would be really helpful if you could create a breadboard layout which displays the same behaviour. I realise that this might not even be possible. If it works it'll make it a lot easier for me to investigate.

I've got lots of ideas about this problem but it's going to take a lot of time to try them out. Some of these might reduce the frequency of stuck buses. Others might make it easier or neater to recover the stuck bus. It's probably going to take a few weeks or a couple of months for me to dig through all the options though. I'm hoping you'll get a solution which is "good enough" much sooner than that though.

niek-dewit commented 3 years ago

I fetched the changes and updated the test code in slave & master. Also inside of the source code, I uncommented #define DEBUG_I2C on line 7 of imx_rt1060_i2c_driver.cpp in the hopes of getting some more info.

Slave code:

#include <Arduino.h>
#include <i2c_driver.h>
#include "imx_rt1060/imx_rt1060_i2c_driver.h"

// Blink the LED to make sure the Teensy hasn't hung
volatile bool led_high = false;
void blink_led();

const uint8_t listener_address = 127;

I2CSlave& listener = Slave;
const size_t listening_buffer_size = 513;
uint8_t listening_buffer[listening_buffer_size] = {};
uint8_t listening_buffer_2[listening_buffer_size] = {};
volatile size_t listener_bytes_received = 0;
void after_receive(size_t, uint16_t);

elapsedMillis timeMillis = 0;

unsigned long correct = 0;
unsigned long incorrect = 0;

unsigned long noError = 0;
unsigned long error0 = 0;
unsigned long error1 = 0;
unsigned long error2 = 0;
unsigned long error3 = 0;
unsigned long error4 = 0;
unsigned long error5 = 0;
unsigned long error6 = 0;
unsigned long error7 = 0;
unsigned long error8 = 0;
unsigned long error9 = 0;
unsigned long error10 = 0;
unsigned long error11 = 0;
unsigned long error12 = 0;

void setup() {
    // Turn the LED on
    pinMode(LED_BUILTIN, OUTPUT);
    digitalWrite(LED_BUILTIN, true);

    Serial.begin(9600);
    Serial.println("Started listening");
    listener.after_receive(after_receive);
    listener.set_receive_buffer(listening_buffer, listening_buffer_size);
    listener.listen(listener_address);
    analogWrite(0, 1);
    delay(2000);

}

void loop() {
    if (listener_bytes_received) {
        if(listening_buffer[0] == 0xAA && listening_buffer[1] == 0xFF && listening_buffer[2] == 0x55 && listening_buffer[3] == 0x00 ) {
          correct++;
        } else {
          incorrect++;
        }
        listener_bytes_received = 0;

        if (listener.has_error()) {

            switch((int)listener.error()) {
                case 0: {
                    error0++;
                    break;
                }
                case 1: {
                    error1++;
                    break;
                }
                case 2: {
                    error2++;
                    break;
                }
                case 3: {
                    error3++;
                    listener.listen(listener_address);
                    break;
                }
                case 4: {
                    error4++;
                    break;
                }
                case 5: {
                    error5++;
                    break;
                }
                case 6: {
                    error6++;
                    break;
                }
                case 7: {
                    error7++;
                    break;
                }
                case 8: {
                    error8++;
                    break;
                }
                case 9: {
                    error9++;
                    break;
                }
                case 10: {
                    error10++;
                    break;
                }
                case 11: {
                    error11++;
                    break;
                }
                case 12: {
                    error12++;
                    break;
                }
            }
        } else {
            noError++;
        }
        Serial.print("noError: "); Serial.println(noError);
        Serial.print("error0: "); Serial.println(error0);
        Serial.print("error1: "); Serial.println(error1);
        Serial.print("error2: "); Serial.println(error2);
        Serial.print("error3: "); Serial.println(error3);
        Serial.print("error4: "); Serial.println(error4);
        Serial.print("error5: "); Serial.println(error5);
        Serial.print("error6: "); Serial.println(error6);
        Serial.print("error7: "); Serial.println(error7);
        Serial.print("error8: "); Serial.println(error8);
        Serial.print("error9: "); Serial.println(error9);
        Serial.print("error10: "); Serial.println(error10);
        Serial.print("error11: "); Serial.println(error11);
        Serial.print("error12: "); Serial.println(error12);
        Serial.print("Correct: "); Serial.println(correct);
        Serial.print("Incorrect: "); Serial.println(incorrect);
    }

    if(timeMillis > 500) {
        blink_led();
        timeMillis = 0;
    }
    delay(1);

}

void after_receive(size_t length, uint16_t address) {
    if (!listener_bytes_received) {
        memcpy(listening_buffer_2, listening_buffer, length);
        listener_bytes_received = length;
    }
}

void blink_led() {
    led_high = !led_high;
    digitalWrite(LED_BUILTIN, led_high);
}

Master code:

#include <Arduino.h>
#include <i2c_device.h>

// Blink the LED to make sure the Teensy hasn't hung
volatile bool led_high = false;
void blink_led();

I2CMaster& master = Master;

const uint8_t listener_address = 127;
uint8_t buff[513] = {0xAA, 0xFF, 0x55, 0x00};
unsigned int speaking_length = 4;

unsigned long noError = 0;
unsigned long error0 = 0;
unsigned long error1 = 0;
unsigned long error2 = 0;
unsigned long error3 = 0;
unsigned long error4 = 0;
unsigned long error5 = 0;
unsigned long error6 = 0;
unsigned long error7 = 0;
unsigned long error8 = 0;
unsigned long error9 = 0;
unsigned long error10 = 0;
unsigned long error11 = 0;
unsigned long error12 = 0;

elapsedMillis timeMillis = 0;
void setup() {
    // Turn the LED on
    pinMode(LED_BUILTIN, OUTPUT);
    digitalWrite(LED_BUILTIN, true);

    master.begin(100 * 1000U);
    Serial.begin(9600);
    delay(2000);
}

void loop() {
    bool finished = master.finished();
    if (finished)
    {
        if (master.has_error()) {
            switch((int)master.error()) {
              case 0: {
                error0++;
                break;
              }
              case 1: {
                error1++;
                break;
              }
              case 2: {
                error2++;
                break;
              }
              case 3: {
                error3++;
                break;
              }
              case 4: {
                error4++;
                break;
              }
              case 5: {
                error5++;
                break;
              }
              case 6: {
                error6++;
                break;
              }
              case 7: {
                error7++;
                break;
              }
              case 8: {
                error8++;
                break;
              }
              case 9: {
                error9++;
                break;
              }
              case 10: {
                error10++;
                break;
              }
              case 11: {
                error11++;
                break;
              }
              case 12: {
                error12++;
                break;
              }
            }
        } else {
            noError++;
        }
        Serial.print("noError: "); Serial.println(noError);
        Serial.print("error0: "); Serial.println(error0);
        Serial.print("error1: "); Serial.println(error1);
        Serial.print("error2: "); Serial.println(error2);
        Serial.print("error3: "); Serial.println(error3);
        Serial.print("error4: "); Serial.println(error4);
        Serial.print("error5: "); Serial.println(error5);
        Serial.print("error6: "); Serial.println(error6);
        Serial.print("error7: "); Serial.println(error7);
        Serial.print("error8: "); Serial.println(error8);
        Serial.print("error9: "); Serial.println(error9);
        Serial.print("error10: "); Serial.println(error10);
        Serial.print("error11: "); Serial.println(error11);
        Serial.print("error12: "); Serial.println(error12);

        // Send the next message
        master.write_async(listener_address, buff, speaking_length, true);

    } else {
        //
    }

    if(timeMillis > 500) {
        blink_led();
        timeMillis = 0;
    }
    delay(1);
}

void blink_led() {
    led_high = !led_high;
    digitalWrite(LED_BUILTIN, led_high);
}

I ran the tests again, and to my surprise, I actually didn't see any error 12 getting logged at any point during the tests. I did see new errors that were logged by master because of the uncommented debug flag though. Maybe they give you some new clue?

Master: Arbitration lost
Master: abort_transaction
MSR Flags: BBF 

I have not read the entire instruction manual yet of my scope, so I am not really good with the controls yet. But I did manage to record some traces where I caught the last transfer before it got stuck. To catch this on my scope, I just set the "trigger" to trigger on a falling slope, and then kept re-running the test until the bus got locked fast enough that everything happened in 1 frame on my scope so that I could zoom in on it. I am sure there are better ways to do this. But I cant be bothered to figure it out right now ;)

In any case, I got some traces. So because of my weird method of gathering the traces, I will only be able to show you traces of when the bus got stuck VERY QUICKLY (with less than 15 successful transfers), however the test normally reaches around 100-200 successful transfers before it locks up.

Here are some of my observations: image

And also here are some logs of when the bus still got locked up, but there were too many transfers for my scope to pick up the last one: image

niek-dewit commented 3 years ago

Here I quickly put one of the "last transfer close-ups" traces on top of the diagram I made earlier so that it is easy to compare what is going on. (sorry the quality is a bit shit because of the way I exported it from the scope this time :P ) image

Richard-Gemmell commented 3 years ago

Hi Niek,

Thanks for the lovely traces. I'll check them out later.

Enabling DEBUG_I2C was a good idea. I suggest you keep enabling it while we're debugging.

The master reports "arbitration lost" when it thinks another master has taken over the bus. According to the iMX data sheet it either detected an unexpected START or STOP or it "was transmitting data but received different data that the data that was transmitted". I think this means that it set SDA to one value but saw that it had the opposite value.

I found another bug in the driver where the master didn't return to the idle state after aborting a transaction due to an "arbitration lost" error. The bus was Ok but the master wouldn't send another message. :(

I've added some debug to finished() to try to detect these errors automatically in the future. See the "WARNING: Inconsistent state in finished()." warning on line 146. If you see that then there's probably another bug in the driver.

Please give the new version a try. You might start seeing bus_busy errors.