adafruit / circuitpython

CircuitPython - a Python implementation for teaching coding with microcontrollers
https://circuitpython.org
Other
4.04k stars 1.2k forks source link

I2C transmit buffer issues #778

Closed ghost closed 6 years ago

ghost commented 6 years ago

It appears I2C can only send out about 64 bytes at a time, and this is not documented as far as I can tell. I finally scoped my I2C bus with a Saleae when transmitting to the CY8C9560A (which requires a packet about 146-bytes or more in length), and discovered that only the first 64 data bytes are sent by the Trinket M0.

Can some circuitpython developer confirm this, and maybe update the documentation to explain the limitation? If this is indeed a limitation, then it would be nice if .write() threw an exception explaining the buffer length limitation.

I'm using the Adafruit slim wrapper of Circuitpython's I2C (from adafruit_bus_device import i2c_device), but I looked at the source code for that module and there doesn't appear to be any length limitation due to that wrapper.

tannewt commented 6 years ago

What version of CircuitPython are you running?

ghost commented 6 years ago

sys.version = '3.4.0' sys.implementation = (name='circuitpython', version=(2, 2, 4))

tannewt commented 6 years ago

Thanks! Mind trying Alpha 6? There is new code for I2C in it.

ghost commented 6 years ago

Hmm...I usually prefer to stick with stable builds so I'm not dealing with platform issues. But, maybe I'll try upgrading and see if other things break. Is it not apparent from the Circuitpython C firmware what's going on? I also noticed that reads of > 64 bytes take a little pause at 64-byte boundaries, presumably to resize a buffer. Probably that wasn't implemented for transmit.

tannewt commented 6 years ago

We're hard at work finishing 3.x so I'd rather see if its an issue there. Its pretty stable already. We're just missing touchio and microcontroller.cpu.temperature.

ghost commented 6 years ago

Ok, so I decided to try it out. Actually upgrading / downgrading is pretty easy thanks to the drag-and-drop method you all created. It took a few minutes longer because I had to also rebuild mpy-cross...I'm compiling everything to .mpy files to reduce memory usage; I even had to rewrite the ssd1306 driver to reduce the memory usage because the Trinket M0 was running out of memory with Adafruit's graphics-oriented version.

Long story short, the I2C write limitation is still there in 3.0.0 alpha 6. I'm attaching two Saleae analyzer captures. The data is 149 bytes long (command, command value, 146 byte payload, 1 CRC byte). The read works fine, and I think the pause in the middle is actually due to the Cypress CY8C9560A crossing a 64-byte boundary and needing to fetch more data (the Trinket M0 ACKs immediately). The write, however, just zeros out after the first 64 bytes. I'm fairly sure this is not the slave device, because it ACKs on every byte, and actually it looks as if the SCL is not clocking after the first 64 bytes. If it was a master/slave contention issue, I would expect SDA to be pulled low, like a slave ACKing when it should not.

read write

As a further investigation, I tried splitting up the write in two different ways: 1) using start/end = 0/64, 64/128, 128/149 2) allocating 3 separate bytearrays with 64-byte chunks and sending them in sequence with stop=False/False/True

Neither of these work, and I still get the same result (zeroed data after 64 bytes and OSError: 5).

tannewt commented 6 years ago

Thank you so much for the confirmation! Will add it to the 3.0 milestone so we fix it before the stable release.

ghost commented 6 years ago

Ok thanks for looking into this. I'm also attaching the Saleae-decoded I2C traffic. I think it's possible this isn't even Circuitpython bug, but not really sure. When I count up the bytes more carefully, I see it is sending the device address, the register address, the register value, then 64 payload bytes before things get weird and it re-sends a connect-to-device packet. As far as the Circuitpython software knows, I think the register address, register value, and payload are all equivalent, so it seems to die after device address + 66 bytes of data. Does the buffer in Circuitpython happen to be 66 bytes long? That would be an important hint, I think.

decoded.txt

ghost commented 6 years ago

The Cypress datasheet actually comments "If current address crosses 64-byte block boundary, then device performs real writing to EEPROM", so another possibility is that it's a timeout issue in Circuitpython, and the I2C library gets screwed up waiting a long time for an ACK. The EEPROM programming on that part takes < 100ms at above-freezing temperatures, per the datasheet, and I see that the 64th data byte is ACK'd after waiting about 52.7ms, presumably it was writing the EEPROM during that time.

ghost commented 6 years ago

Sorry to spam you with updates. That Cypress part says "The I2C slave in this device requires that the I2C master supports clock stretching", which is basically that the slave holds SCL low while it performs slow operations like programming flash memory. I think this is what's happening in the 53ms when SCL is low and SDA is high, and it clearly screws up the Trinket M0.

In this case the ATSAMD21E18 "Slave SCL Low Extend Time-Out" should be disabled. That ATSAMD21E18 timeout occurs after 25-35ms, so clearly it should not be enabled if we need to talk to EEPROMs like that in the CY8C9560A that take 100-200ms to save data. I guess you (or I) need to set SEXTTOEN to 0 to disable the timeout.

Perhaps relatedly, the ATSAMD21E18 datasheet shows an errata entry which says

" 1 – The I2C Slave SCL Low Extend Time-out (CTRLA.SEXTTOEN) and Master SCL Low Extend Time-out (CTRLA.MEXTTOEN) cannot be used if SCL Low Time-out (CTRLA.LOWTOUT) is disabled. When SCTRLA.LOWTOUT=0, the GCLK_SERCOM_SLOW is not requested. Errata reference: 12003 Fix/Workaround: To use the Master or Slave SCL low extend time-outs, enable the SCL Low Time-out (CTRLA.LOWTOUT=1). "

tannewt commented 6 years ago

Interesting! Does CircuitPython throw an Exception once it times out? We should if we can't write the whole buffer.

ghost commented 6 years ago

Circuitpython throws an OSError: 5. The sequence of events is:

Can you disable that timeout bit? Or, perhaps point me in that direction and I can build my own copy of Circuitpython with that modification? I did a little searching and think it would be in the 'busio' area of the SAMD21 area of this repo. I'm not sure what a good generalized software interface would be...obviously every chip has many different configuration bits to customize the peripheral features. Since Circuitpython already sends bad data when it gets into this state, it probably warrants some kind of fix (and I, for selfish reasons, vote for supporting clock stretching!).

dhalbert commented 6 years ago

It would be in ports/atmel-samd/common-hal/busio/I2C.c. But then you'll be getting into the ASF4 library. If you want to try building it yourself, see this draft guide on building: https://learn.adafruit.com/building-circuitpython/introduction?preview_token=bWdsAiUq9L8RNRzWikZZ2g. We can also make a test build for you with that bit set. I'm out for a couple of hours but will check back in later.

ghost commented 6 years ago

Thanks Dan,

EDIT: what I originally wrote after here was wrong. It looks like the structs in sercom.h are just specifying the width of the config register fields, and then the registers are actually set in

//ports/atmel-samd/asf4{_conf}/samd{21|51}/hpl_sercom_config.h

I'm not sure what the difference is between "hpl" and "hri", but the helper functions and definitions are set in these files, it appears: //ports/atmel-samd/asf4/samd21/include/component/sercom.h //ports/atmel-samd/asf4/samd21/hri/hri_sercom_d21.h

So far, I haven't found any of the timeouts to be enabled (they default to off), so maybe this is a red herring.

dhalbert commented 6 years ago

hpl_sercom_config.h defines the default values for the I2C peripherals, so you are on the right track. As you say the current values for those bits are all 0.

hal, hpl, and hri are the successively lower-level API's provided by ASF4. We use the hal drivers when we can, but they are missing functionality and we often resort to hri. In the long run we may drop a lot of our use of ASF4, because it assumes fixed pin assignments and functionality, which is kind of the opposite of what we need for CircuitPython. It requires the use of Atmel START, which has its own issues for us. ASF3 (used in 2.x) is more flexible, but at the expense of larger code size (also it was a lot more mature).

ghost commented 6 years ago

Well, I tried setting the SCLSM bit high with some modifications to the asf4 submodule, but that broke everything and now the SCL clock just continues indefinitely after sending data. I'll have to spend some more time combing through the SAMD21 datasheet, since the I2C operating modes are fairly complex.

ghost commented 6 years ago

Ok, I solved it! The problem is actually not deep inside the bit-level functions; the problem is a software timeout in //ports/atmel-samd/asf4/samd{21|51}/hpl/sercom/hpl_sercom.c

In the function _sercom_i2c_sync_wait_bus, they are using:

uint32_t timeout = 65535;

However, this is not an accurate way to do a timeout since it is just software-decremented in a loop of unknown execution time. I tried out progressively larger values (8x larger was not enough, so I went to 64x larger), and the problem is now resolved.

The only change necessary is to change that line to:

uint32_t timeout = 4194303;
tannewt commented 6 years ago

Great find @paulswirhun ! Mind submitting a PR to our copy here: https://github.com/adafruit/asf4/tree/circuitpython

CircuitPython pulls that repo in as a submodule.

ladyada commented 6 years ago

@paulswirhun amazing! nice work debugging and finding :)

dhalbert commented 6 years ago

Very nice!

If we turn on the cache for SAMD51 (#783), the loop will take a shorter time, so it should count actual time instead of just counting up to a number. Do you have an idea of how many msecs it should be waiting?

ghost commented 6 years ago

@dhalbert: the Cypress part I'm using says it takes <200ms, so maybe it should be set to 250ms if you can time it accurately. However, if you're just going to include the hacky fix of increasing the software timeout value, it should be conservatively larger because the loop duration is not well controlled (it could change with library changes, compiler optimization flags, oscillator frequency, etc). If you decide to go with the software version, it probably warrants some kind of tick/tock measurement to see how that count value correlates to real time.

@tannewt: sounds like Dan might want a more thorough fix, so I'll leave it to you all to update; after all my change is only a few characters. (Also, although I use Git/Gerrit a lot at work, I haven't used Github so I have no idea how to submit a pull request).

@ladyada: Thanks! (what an honor :)

kattni commented 6 years ago

@paulswirhun I'd be happy to walk you through the steps to do a pull request. It's only a few more steps beyond git, especially since you're already familiar with git. While we may eventually want a more thorough fix, the change you've proposed would still be an excellent addition, and it would be a great opportunity for an easy change to learn how to do pull requests.

Are you on the Adafruit Discord? We have a CircuitPython channel where we chat all the time. Visit adafru.it/discord to join, if you'd like to. Then we'd also have the option of real-time chat to help you through the steps. :)

ghost commented 6 years ago

Sure, I've joined but haven't figured out how to push to asf4 for review (so far it seems needlessly complicated, like all things git).

dhalbert commented 6 years ago

Push the change to your own forked repo of asf4, in its own branch (a bugfix/enhancement branch), which you can delete later, after the PR is accepted. Then go to adafruit/asf4 in github and click "New pull request". If you have recently pushed, you will actually see a hint to do this. Set the branch to merge into to be adafruit:circuitpython. Then fill out the rest of the form and create the pull request.

ghost commented 6 years ago

Thanks, figured it out. I was trying to do it directly in the asf4 repo, but didn't have permissions I guess. I submitted it in my own fork now.

dhalbert commented 6 years ago

Great! We always create PR's in our forked repos even if we have the permissions. It's safer that way.

dhalbert commented 6 years ago

@paulswirhun Since you tried this change of increasing the count, we've turned on the instruction/data cache on SAMD51, which may make the timeout loop now run too fast again. Could you try it again with 3.0.0-beta.0 or the tip of master?

tannewt commented 6 years ago

Neither 3.0.0-beta.0 or master have the asf4 change so you'll need to build a custom version to test the increased number.

On Thu, May 24, 2018 at 3:00 PM Dan Halbert notifications@github.com wrote:

@paulswirhun https://github.com/paulswirhun Since you tried this change of increasing the count, we've turned on the instruction/data cache on SAMD51, which may make the timeout loop now run too fast again. Could you try it again with 3.0.0-beta.0 or the tip of master?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/adafruit/circuitpython/issues/778#issuecomment-391879675, or mute the thread https://github.com/notifications/unsubscribe-auth/AADNqYZxnrDzlNPrccQdHHLOg_V2LJbpks5t1y2JgaJpZM4TkPS5 .

ghost commented 6 years ago

I only have SAMD21 (Trinket M0).

On Thu, May 24, 2018, 3:03 PM Scott Shawcroft notifications@github.com wrote:

Neither 3.0.0-beta.0 or master have the asf4 change so you'll need to build a custom version to test the increased number.

On Thu, May 24, 2018 at 3:00 PM Dan Halbert notifications@github.com wrote:

@paulswirhun https://github.com/paulswirhun Since you tried this change of increasing the count, we've turned on the instruction/data cache on SAMD51, which may make the timeout loop now run too fast again. Could you try it again with 3.0.0-beta.0 or the tip of master?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub < https://github.com/adafruit/circuitpython/issues/778#issuecomment-391879675 , or mute the thread < https://github.com/notifications/unsubscribe-auth/AADNqYZxnrDzlNPrccQdHHLOg_V2LJbpks5t1y2JgaJpZM4TkPS5

.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/adafruit/circuitpython/issues/778#issuecomment-391880249, or mute the thread https://github.com/notifications/unsubscribe-auth/AIRag_1RI-HalkfBsqbNLt5tKjYml1qFks5t1y4ogaJpZM4TkPS5 .

tannewt commented 6 years ago

3.0.0 is available on the Trinket M0 too.