espressif / esp-hosted

Hosted Solution (Linux/MCU) with ESP32 (Wi-Fi + BT + BLE)
Other
706 stars 169 forks source link

FG handshake pin behavior broken or the doc is wrong? #308

Closed JAndrassy closed 10 months ago

JAndrassy commented 10 months ago

the fg spi protocol doc has

Once this SPI transaction is submitted to SPI driver on ESP peripheral, Handshake pin is pulled high to indicate host that ESP peripheral is ready for transaction.

and

Host receives an interrupt through Handshake pin.

and

On completion of transaction, ESP peripheral pulls Handshake pin low.

but in the firmware register_hs_disable_pin binds the HS pin to slave select pin and nothing else happens with the HS pin. it always copies the slave select pin so it is always HIGH when checked.

mantriyogesh commented 10 months ago

Hello @JAndrassy ,

Explanation of pins: I will attach a doc which explains further these pins in short while. Basically, Handshake denotes the readiness of slave for next transaction. If handshake is high, then only the Linux (being master of spi bus), will start spi transaction (if either party has buffer to transfer). Idle state for handshake is high. Will be hold low, when slave is still populating buffer. Linux is programmed to identify this level change by positive edge trigger interrupt. Also additionally positive edge trigger for data ready.

As you had pointed: Handshake is additionally changed from high to low when CS is asserted. This change is a fix to rare race condition on bus.

Race condition: slave is controls the Handshake pin. But as Linux is so fast, by the time complete high to low Transition happens, Linux identify the state transition and immediately triggers next transaction. Due to this rare unexpected transaction, spi transaction starts at host, but slave starts in middle of the transaction, making some garbage buffer sent to host and garbage packet received from host.

To avoid this race condition , we get handshake low as soon as slave identify the CS is tradition from high to low (for active low CS). Basically motive to start the transaction when handshake is high(slave activeness for next transaction) is served. So no issues to get handshake down till transaction is complete and slave buffer is made ready for next transaction.

mantriyogesh commented 10 months ago

Low level for handshake is transient state , which is small time when slave is yet getting buffer from queue and pushing to spi device driver.

mantriyogesh commented 10 months ago

Similarly low is idle state for data ready. Is only high when slave has any new buffer to be sent to host.

Handshake is for synchronisation. Data ready is async, as host will also start spi transaction on data ready posedge trigger. The reason for data ready is indirect way to trigger the spi transaction (through host)

mantriyogesh commented 10 months ago

SPI_design_note.pdf

mantriyogesh commented 10 months ago

Nevertheless, you stand right, we should update the documentation for this new understanding.

mantriyogesh commented 10 months ago

I hope, it is clear now. Do let us know should you have any doubts.

JAndrassy commented 10 months ago

I use with an MCU. On logic analyzer I only see the handshake pin exactly copy the state of the CS pin. doesn't the binding to CS pin suppress any other change state of the pin?

JAndrassy commented 10 months ago

ok. I commented out the register_hs_disable_pin in spi_slave_api.c and now I see short LOW pulses on handshake, but I see that they are on a position where they would merge with the CS driven LOW

hosted-fg-hs-cs

hosted-fg-hs-not-cs

JAndrassy commented 10 months ago

the driver I use was written by Arduino. it evaluates only the ready pin. and the esp-hosted fw always replays with a dummy message (FF and zeros) first. then the ready pin is still HIGH and the next transaction returns the message. but it doesn't look like there is a better option to do it when the HS pin goes low with the CS pin.

on the picture the short pulse in the 4th row is the FF. and it is because HS was not yet LOW. the picture is from the version without register_hs_disable_pin and the driver doesn't use HS.

hosted-fg-hs-ignored

mantriyogesh commented 10 months ago

the driver I use was written by Arduino. it evaluates only the ready pin

I'm not aware of any such implementation. The first slave to host transaction may be dummy data.

Why? Slave always have to make sure that it is ready for any upcoming transaction. So when there is no tx data, slave has to populate at least dummy data. If the frequency of data is less, you will always see the first transaction is dummy and next immediately next transaction carries data and then data ready becomes low ( if slave still doesn't have any pending data).

But, if you run throughout etc, you will see the slave always have populated buffer and no dummy buffer. So for short while it is not problem, considering the positive side that bus becomes idle when there is no data either side.

JAndrassy commented 10 months ago

sorry, I don't complain about the Arduino driver here. I just explain what I have.

it is working, but I wasn't sure if it is as intended. thank you for the explanation.

mantriyogesh commented 10 months ago

If you can point me the implementation, I can take a look and compare.

Actually, while evolving we had similar very old design what you described . Then at the same time you had mentioned the hs disable pin, which is latest optimisation. But that makes me confused and curious at the same time, if anything better, we can definitely have a look

But anyway, the current master design is extremely stable and evolved from many issues. Our design considerations include lower power use by idling spi, whenever not required. And fastest possible full duplex transfers with stable network performance.

JAndrassy commented 10 months ago

it has a form of Arduino library, but it uses Renesas SDK so it runs only on specific Arduinos: https://github.com/arduino/ArduinoCore-renesas/tree/main/libraries/ESPhost/src

my version which is a clean Arduino version should run on any ARM Arduino. I tested with PI Pico. https://github.com/JAndrassy/ESPHost

there is a test sketch in https://github.com/JAndrassy/ESPHost/blob/main/examples/ESP32_TEST/ESP32_TEST.ino

mantriyogesh commented 10 months ago

Sorry forgot to mention, you have these waveforms captured. Can you send me raw format? And possibly software to open, it will be easier to look and understand what you are saying.

JAndrassy commented 10 months ago

the software is Saleae Logic https://www.saleae.com/downloads/ I use it with a cheap no-name analyzer hardware.

The captured signals are attached in zip hosted-fg-hs-cs.sal is for unmodified fw hosted-fg-hs-ignored.sal is for fw with register_hs_disable_pin commented out in esp_spi_init in spi_slave_api.c

hosted-fg-sal.zip

20231231_232648

mantriyogesh commented 10 months ago

https://github.com/espressif/esp-hosted/issues/308#issuecomment-1874059336 I didn't go through the full code yet, may take some time to understand fully. Nevertheless, I had checked your waveforms.

hosted-fg-hs-cs.sal

The SPI Clock is not stable from the waveforms you had shared.

Possibility 1) I am not sure if you had used sufficiently high sampling frequency or not? Possibility 2) If the sampling is correct, You might need to work on getting the timings correct.

Are the SPI modes used at both place, ESP and Host the same? Need to check if MOSI and MISO transitions are abiding the clock boundaries, if not, which will end up in incorrect parsing of messages. Also, incorrect message type and/or len understanding may lead to corruption or assert due to low memory. In ESP-Hosted, we do filter such rubbish/garbage packets and print them, to let user know something is not right at either side. I did not come across in repo you pointed, if you wish you can add additionally for safeguard.

hosted-fg-hs-ignored.sal

Very similar comments. In general your lags are high in handshake high and clock low, I did not see the race condition issue in your both waveforms. But to repeat myself, this is rare scenario and not generally occurring. because of race condition, packets get occasionally drop/ignored in background leading to unstable system or occasional crashes.

Some old waveform I had collected that time, where we introduced extra GPIO 'DISABLE_HS' to handle the this race. disable_hs_ext_gpio.zip Later we optimised further to re-use CS as interrupt in slave, instead of additional 'DISABLE_HS' GPIO.

Some reference waveform where this isse was captured: [Uploading race_condition_pkt_capture_pkt_739e_94s83ms900us9us.sal.zip…]()

mantriyogesh commented 10 months ago

In my opinion, you have yet not reached yet to the point, to take decision of (HS disable on CS assert) or (not). First you have to check why your host clock is not stable (is it capturing settings issue or real clock issue)

mantriyogesh commented 10 months ago

irrespective of ESP-Hosted solution you use.

JAndrassy commented 10 months ago

the SPI frequency is set to 10 MHz for the classic esp32. the logic analyzer sampling frequency was set to 8 MHz so it can't sample the SCK properly.

mantriyogesh commented 10 months ago

yes. Ideally, you should have sampling freq >= actual clock * 4

mantriyogesh commented 10 months ago

If your analyser doesn't support, you can also lower the freq to say 2MHz so that 8MHz sampling could be fine

mantriyogesh commented 10 months ago

any updates?

JAndrassy commented 10 months ago

sorry no. It works good enough for me right now. I used the login analyzer only to see the big picture. I have the content of the SPI messages in debug print output so I don't need the analyzer to decode SPI. And in the debug print of the messages only the dummy message (FF 00 ...) was something I didn't understand.

mantriyogesh commented 10 months ago

Great... It was indeed good interaction. Thanks.