harry-boe commented 8 years ago

I have spent endless nightly hours with the ESP and Logic analyser to figure out how the HW SPI bus is working and why a had so many issues with the current (and other implementations).

Fist of all, all my finding do not apply for the SPISoft implementation. That implementation is basically derived from the TatFS/SD card implementation from ChaN. That's a solid and very efficient peace of code.

The current HW SPI implementation is ok when submitting a few bytes. Basically one 32bit register. But it fails when sending or receiving bigger data packets. As most of the sample projects in Sming (i.e Displays, MUX) just send/rec byte by byte the issues have not yet posed up.

For readability i will split up the my various findings in separate posts. to cover them one by one. And of course i still might be wrong and have not yet tested all my thoughts and theories. However i'm close to a final implementation and like to discuss here how we should define the API and how we will test and go on with the current implementations.

Overview on the findings a made:

It's all about timing: handover from HW to Software control is tricky and a source for random errors
Byte orders/shifting logic and how to let the HW do the Job
Registers & buffer sizes and how to send long data blocks
More control registers and features that might come in handy (and should be considered in the API)
Very own problems when using HW Client Selector on PORT 15

And last but not least the question how to build up a reasonable API. Use the Arduino API, go with Base classes SW HW, enable the full feature set etc.

harry-boe commented 8 years ago

Timing and handover from HW to SW.

The way HW SPI on the ESP works is that you are going the prepare the data, set the Registers and let the HW do the job. This is ok and welcomed. But if you control the client selector in your code you will have some timing issues when sending out more than a byte or two. What happens is that the HW will give the control back to your software and you code might go and disable the Client Selector before the code was actually sent out.

-> here is a sample what might happen (GPIO 2 in this case is the SW CS while HW SS is the signal the ESP SS produces)

To avoid this you have to insert some delays into you code, waiting for the transfer to be complete.

What you need to consider is the amount of delay you need. This delay depends on the the SPI Clock speed and the length of the signal (minus some HW prep time). That raises a number of problems:

If you delay is to short you won't see the correct signal.
if you delay is to long you slow down the system
if you speed up the clock you won't get any speed gain as the delay stays the same
if you slow down the clock you delay might become to short and you end up with 1

The good new is the the SPI implementation has all the information to calculate the right amount of delay. But that's nowhere covered by any API i have seen so far. But i have seen many delay statements in various SPI Slave libs ;(

Here is a sample from my new HW SPI lib build in delay calculation with various signal lengths. As you see you can get very close to the HW SS signal.

Question and decisions to take: Should we build software Delays into a HW SPI lib. ? And to think about: are there better ways to do it with RTOS instead of heaving delays: maybe interrupts ?

PS: here is another example of a missed signal (that one took me hours to find)

harry-boe commented 8 years ago

Byte orders/shifting logic and how to let the HW do the Job

As we all know the ESP has a 32 bit architecture. This includes the SPI register and bus. When submitting bytes, integers longs and byte array you need to take care on correct byte orders.

This is basically what the actual SPI implementations tries to do within the loop below:

void SPIClass::writeData(uint8_t * data, uint8_t numberByte)
{
    uint8_t i = 0;
    uint8_t shift = 0;
    uint32_t *buffer = (uint32_t *)SPI_W0(id);
    for (i = 0; i < numberByte; ++i)
    {
        if (shift >= 32)
        {
            shift = 0;
            buffer += 1;
        }
        *buffer &= ~( 0xFFUL << shift );
        *buffer |= ((uint32_t)data[i]) << shift;
        shift += 8;
    }
}

I have not yet fully understood all the &= ~() != magic in this code but leave that for now.

If you mess the byte orders up you get the wrong data sent out as you can see in the picture below.

Than plan was to send out Hello World

But that's no big deal. We just need to tell the ESP hardware to shift out the registers the way we need them. This will boil the shifting magic down to a single line of code which is copy the data into the registers


 // set byte order CLEAR_PERI_REG_MASK  (Value is 0 so we need to clear the REGISTER
     CLEAR_PERI_REG_MASK(SPI_USER(HSPI), SPI_WR_BYTE_ORDER);

// copy bytes into register
    memcpy ((uint32_t *)SPI_W0(HSPI), &data[bufIndx], bufLenght );

That's much easier and gives me more confidence to be right ;)

And here is the result:

harry-boe commented 8 years ago

Registers & buffer sizes and how to send long data blocks

The HW SPI bus is using a defined memory block to shift out and receive data to/from the bus. The memory are is defined in spi_register.h likely taken from the espresso IOT demo cases. Note the spi_register.h file in Sming is outdated i'm referring here to a newer version from SDK 1.5.

So here is the register definition

//Previous SDKs referred to these following registers as SPI_C0 etc.

#define SPI_W0(i)                           (REG_SPI_BASE(i) +0x40)
#define SPI_W1(i)                           (REG_SPI_BASE(i) +0x44)
#define SPI_W2(i)                           (REG_SPI_BASE(i) +0x48)
#define SPI_W3(i)                           (REG_SPI_BASE(i) +0x4C)
#define SPI_W4(i)                           (REG_SPI_BASE(i) +0x50)
#define SPI_W5(i)                           (REG_SPI_BASE(i) +0x54)
#define SPI_W6(i)                           (REG_SPI_BASE(i) +0x58)
#define SPI_W7(i)                           (REG_SPI_BASE(i) +0x5C)
#define SPI_W8(i)                           (REG_SPI_BASE(i) +0x60)
#define SPI_W9(i)                           (REG_SPI_BASE(i) +0x64)
#define SPI_W10(i)                          (REG_SPI_BASE(i) +0x68)
#define SPI_W11(i)                          (REG_SPI_BASE(i) +0x6C)
#define SPI_W12(i)                          (REG_SPI_BASE(i) +0x70)
#define SPI_W13(i)                          (REG_SPI_BASE(i) +0x74)
#define SPI_W14(i)                          (REG_SPI_BASE(i) +0x78)
#define SPI_W15(i)                          (REG_SPI_BASE(i) +0x7C)

 // +0x80 to +0xBC could be SPI_W16 through SPI_W31?

So we have 16 ? or do we have 32 registers ?

Well lets check how we set up the block length for the submission:

Here is the code snippet:

// calculate bit to send
        dout_bits = bufLenght * 8 -1;
// Setup Bit-lengths for sending
WRITE_PERI_REG(SPI_USER1(HSPI), ((dout_bits)&SPI_USR_MOSI_BITLEN)<<SPI_USR_MOSI_BITLEN_S );

First step is to to calculate the number of bits to send starting from 0. That's the bufLength * 8 bits. As we start from index 0 and not 1 we need to take one bit off.

Next is to tell the ESP the number of bits to send. This info goes into the Register SPI_USER1(HSPI). That register holds information about bytes to sen and bytes to receive. Therefore you have only a certain range within that 32 bit register to set your length. To do this you take the Bit-mask SPI_USR_MOSI_BITLEN do a logical and and shift it to the right location.

Here is the structure of the register SPI_USER1

#define SPI_USER1(i)                          (REG_SPI_BASE(i) + 0x20)
#define SPI_USR_ADDR_BITLEN 0x0000003F
#define SPI_USR_ADDR_BITLEN_S 26
#define SPI_USR_MOSI_BITLEN 0x000001FF
#define SPI_USR_MOSI_BITLEN_S 17
#define SPI_USR_MISO_BITLEN 0x000001FF
#define SPI_USR_MISO_BITLEN_S 8
#define SPI_USR_DUMMY_CYCLELEN 0x000000FF
#define SPI_USR_DUMMY_CYCLELEN_S 0

So out of that definition we know that the max amount of data the ESP can send in one HW transaction is MISO/MISO max bit length -> BITLEN 0x000001FF means 9 bits. That gives us a range from 0-512 bits. This sums up to 64 Bytes or the 16 Registers we know from the SPI_W0 definition.

The SPI_WO register is used for sending and receiving data. If that's done in the one HW transaction than it should be guaranteed that the buffer is sent out before it's overwritten with received data (have not tested that yet)

I found somewhere a hint that it's possible to split up the Register into a TX and RX address. Maybe that when does magic SPI_W16 through SPI_W31 come into place.

So and finally looking into the SPI cod we have in Sming:

void SPIClass::transfer(uint8_t * data, uint8_t count)

So we have no option to set individual buffer length for RX nor TX. The method does no checks for the buffer length. Assuming the Shifting/Byte order magic works correct than it will fail with with counts > 64 bytes !

I correct implementation should loop in blocks of 64 bytes or send a error message or let the user build a external loop.

NOTE: For whatever reason the ESP stops clocking out after 32 Bytes for the MISO line. Might be a Bug or even an error on network Layer 8 (me).

harry-boe commented 8 years ago

More control registers and features that might come in handy (and should be considered in the API)

Looking at the spi_register.h file there are tons of registers to control the data flow. The excellent blog from metalphreak provides some good information on some of the registers.

Many of those registers come in very handy with some devices. Most notably the command, address and dummy registers. This allows you to keep you code clean. So you can send in command sequences and data without the need to copy or even memcpy command and data together before you setup a transaction.

Here is a sample of a sequence from the ArduCAM spec.

Another import setting are the dummy bits. They are used to deal with latency of the slave device. Typical case: the master send out a data request command sequence and starts sending clock signals to receive data. The client MUST immediate send response information (without having time to read and process the command). To deal with that the client will send some dummy bytes on the start of DATA buffer while processing the command. See the sequence below.

The ESP can deal with that. So you can tell the bus to ignore the dummy bits and start moving data into the register from bit xx on.

As for the other control registers. I did not found any reasonable sour of documentation what they do. Maybe there is even e register that will reliable tell us when a transmission is complete. This could solve the delay problem. So whoever has time and a logic analyser can help out here !

Very own problems when using HW Client Selector on PORT 15

Ok the last thing to think about is the HW controlled SS or CS signal. I have not found any way to get more control on this signal. For devices that require the CS line to be low until all data is sent or received that is a problem. Once you sent/recv the first 64 bytes the signal will go high and might set the client into another stage where you cannot continue to read. Again check the timing diagram above. This is the sequence you will use when you want to get a image or video stream. Obviously that will exceed 64 bytes. As a result you cannot use HW CS/SS for this type of device. Instead you need to use CS signal on a different PORT than 15 that you can control within your code.

harry-boe commented 8 years ago

API Questions

I now have a functional complete (or more complete) HW SPI implementation ready that covers most of the topics discussed above.

Before i put more effort into it to prepare a PR i like to get your opinions an how to define the API for that lib.

There are several option.

Keep the API close to the Arduino SPI api's. That will ease porting libs from Arduino. Additional methods can be added where needed
Try to stay close to the current SPI API to avoid code breaks with current samples and libs. To be honest there are not that may samples that really make use of the HW SPI implementation.
Define a base class for SPISoft and HW SPI. That will allow to keep the code clean and allow for switching between SW and HW SPI without a lot of changes.
Come up with something completely different adopted to the real control needs we have.

THX for your votes

alon24 commented 8 years ago

I say a mix of 1 and 3,arduino compatible is very nice to have, and a base class would be great

On Fri, Feb 26, 2016, 19:24 Harry Böttcher notifications@github.com wrote:

API Questions

I now have a functional complete (or more complete) HW SPI implementation ready that covers most of the topics discussed above.

Before i put more effort into it to prepare a PR i like to get your opinions an how to define the API for that lib.

There are several option.

Keep the API close to the Arduino SPI api's. That will ease porting libs from Arduino. Additional methods can be added where needed

Try to stay close to the current SPI API to avoid code breaks with current samples and libs. To be honest there are not that may samples that really make use of the HW SPI implementation.

Define a base class for SPISoft and HW SPI. That will allow to keep the code clean and allow for switching between SW and HW SPI without a lot of changes.

Come up with something completely different adopted to the real control needs we have.

THX for your votes

— Reply to this email directly or view it on GitHub https://github.com/SmingHub/Sming/issues/618#issuecomment-189380536.

avr39-ripe commented 8 years ago

I stay for 1 and indeed 3! You work analyzing spi is awesom!

ADiea commented 8 years ago

@harry-boe thanks for such a detailed SPI analysis. As for the questions i also vote 1 & 3

hreintke commented 8 years ago

@harry-boe : I agree with the comments above. The base class can then also be used when difference occur when adding support for esp32.

harry-boe commented 8 years ago

OK, I will go that way ->

Base class that more or less defines the methods we see in the Arduino API's And obviously adoption the existing code of SPISoft SPI and depending Lib's/examples.

robotiko commented 8 years ago

I'm late to the party, @harry-boe impressive and great work. I also support previous comments: 1) for arduino compatibility (to simplify lib migrations ) and 3) for "Future proof" as @hreintke mentions , esp32 is right there. I do fully agree that 2) is quite irrelevant compared to 1 or 3.

harry-boe commented 8 years ago

I have now all sorted and the ArduCAM working as a proof of concept. The last error i was hunting was a uint8 for the transfer buffer size -> imagine what memcpy does with buffers longer than 256 bytes !!

I will go for the Arduino SPI api resp. the extended Arduino DUO SPI API

It has a really nice features using SPISettings. This allows you to have different setting for different devices including speed, byte order etc.

SPISettings mySettting(speedMaximum, dataOrder, dataMode) The sample code here explains best what it does:

#include <SPI.h>

// using two incompatible SPI devices, A and B. Incompatible means that they need different SPI_MODE
const int slaveAPin = 20;
const int slaveBPin = 21;

// set up the speed, data order and data mode
SPISettings settingsA(2000000, MSBFIRST, SPI_MODE1); 
SPISettings settingsB(16000000, LSBFIRST, SPI_MODE3); 

void setup() {
  // set the Slave Select Pins as outputs:
  pinMode (slaveAPin, OUTPUT);
  pinMode (slaveBPin, OUTPUT);
  // initialise SPI:
  SPI.begin(); 
}

uint8_t stat, val1, val2, result;

void loop() {
  // read three bytes from device A
  SPI.beginTransaction(settingsA);
  digitalWrite (slaveAPin, LOW);
  // reading only, so data sent does not matter
  stat = SPI.transfer(0);
  val1 = SPI.transfer(0);
  val2 = SPI.transfer(0);
  digitalWrite (slaveAPin, HIGH);
  SPI.endTransaction();
  // if stat is 1 or 2, send val1 or val2 else zero
  if (stat == 1) { 
   result = val1;
  } else if (stat == 2) { 
   result = val2;
  } else {
   result = 0;
  }
  // send result to device B
  SPI.beginTransaction(settingsB);
  digitalWrite (slaveBPin, LOW);
  SPI.transfer(result);
  digitalWrite (slaveBPin, HIGH);
  SPI.endTransaction();
}

I also like the concept how they deal with transactions: That's really convenient. On top of that you can of course still use SPI.beginTransaction()

void loop(){
//transfer 0x0F to the device on pin 10, keep the chip selected
SPI.transfer(10, 0xF0, SPI_CONTINUE);
//transfer 0x00 to the device on pin 10, keep the chip selected
SPI.transfer(10, 0×00, SPI_CONTINUE);
//transfer 0x00 to the device on pin 10, store byte received in response1, keep the chip selected
byte response1 = SPI.transfer(10, 0×00, SPI_CONTINUE);
//transfer 0x00 to the device on pin 10, store byte received in response2, deselect the chip
byte response2 = SPI.transfer(10, 0×00);
}

alon24 commented 8 years ago

so basically aruinocam works, can you also share that, and which cam to buy?

On Wed, Mar 2, 2016 at 12:51 AM, Harry Böttcher notifications@github.com wrote:

I have now all sorted and the ArduCAM working as a proof of concept. The last error i was hunting was a uint8 for the transfer buffer size -> imagine what memcpy does with buffers longer than 256 bytes !!

I will go for the Arduino SPI api resp. the extended Arduino DUO SPI API https://www.arduino.cc/en/Reference/DueExtendedSPI

It has a really nice features using SPISettings. This allows you to have different setting for different devices including speed, byte order etc.

SPISettings mySettting(speedMaximum, dataOrder, dataMode)

The sample code here explains best what it does:

include

// using two incompatible SPI devices, A and B. Incompatible means that they need different SPI_MODE const int slaveAPin = 20; const int slaveBPin = 21;

// set up the speed, data order and data mode SPISettings settingsA(2000000, MSBFIRST, SPI_MODE1); SPISettings settingsB(16000000, LSBFIRST, SPI_MODE3);

void setup() { // set the Slave Select Pins as outputs: pinMode (slaveAPin, OUTPUT); pinMode (slaveBPin, OUTPUT); // initialise SPI: SPI.begin(); }

uint8_t stat, val1, val2, result;

void loop() { // read three bytes from device A SPI.beginTransaction(settingsA); digitalWrite (slaveAPin, LOW); // reading only, so data sent does not matter stat = SPI.transfer(0); val1 = SPI.transfer(0); val2 = SPI.transfer(0); digitalWrite (slaveAPin, HIGH); SPI.endTransaction(); // if stat is 1 or 2, send val1 or val2 else zero if (stat == 1) { result = val1; } else if (stat == 2) { result = val2; } else { result = 0; } // send result to device B SPI.beginTransaction(settingsB); digitalWrite (slaveBPin, LOW); SPI.transfer(result); digitalWrite (slaveBPin, HIGH); SPI.endTransaction(); }

I also like the concept how they deal with transactions: That's really convenient. On top of that you can of course still use SPI.beginTransaction()

void loop(){ //transfer 0x0F to the device on pin 10, keep the chip selected SPI.transfer(10, 0xF0, SPI_CONTINUE); //transfer 0x00 to the device on pin 10, keep the chip selected SPI.transfer(10, 0×00, SPI_CONTINUE); //transfer 0x00 to the device on pin 10, store byte received in response1, keep the chip selected byte response1 = SPI.transfer(10, 0×00, SPI_CONTINUE); //transfer 0x00 to the device on pin 10, store byte received in response2, deselect the chip byte response2 = SPI.transfer(10, 0×00); }

— Reply to this email directly or view it on GitHub https://github.com/SmingHub/Sming/issues/618#issuecomment-190947275.

harry-boe commented 8 years ago

i create a PR for the ArduCAM once the code is polished and fully tested. The ArduCAM we work with is the 2MP model OV2640

arducam-mini-module-camera-shield-with-2-mp-ov2640-sensor-for-uno-mega2560-board jpg_220x220

hreintke commented 8 years ago

@harry-boe : Sorry to use this thread for a generic question but have been busy lately and did not follow all conversation on your PR's in detail.

I have the impression that a number of them are inter-related by the SPI discussion.

Can you give me an update on status, so that I can clear up/merge outstanding PR's ?

harry-boe commented 8 years ago

I have the API and BASE classes now ready. I also ported the SPISoft to the new API and tested that successfully with SDCard and ST7735 display. I found a way around the nasty inline void setMOSI(uint8_t val){digitalWrite(mMOSI, val);} method which was required as you can't control the MOSI line directly when using HW SPI.

The changes have quite some impact for several LIBS and Samples. It does compile but it needs testing!

I have not yet done the PR as the implementation for the HW SPI is still incomplete but if someone finds time to test other devices using SW SPI please get the code from my repo

harry-boe/Sming new_SPI_impl

alon24 commented 8 years ago

Will this be an ok camera too (half price) http://www.aliexpress.com/item/V2640-Camera-Module-2-Megapixel-UXGA-1622X1200-Camera-Board-JPEG-output-development-Board-Kit/32328698592.html?spm=2114.01010208.3.1.Foz7Qx&ws_ab_test=searchweb201556_6,searchweb201644_1_505_506_503_504_10032_10020_502_10001_10002_10017_10005_10006_10003_10021_10004_10022_10018_10019,searchweb201560_2,searchweb1451318400_-1,searchweb1451318411_-1&btsid=2d0508df-85ec-49f3-90e3-73dce04963f4

On Wed, Mar 2, 2016 at 11:02 AM, Harry Böttcher notifications@github.com wrote:

i create a PR for the ArduCAM once the code is polished and fully tested. The ArduCAM we work with is the 2MP model OV2640 http://www.aliexpress.com/item/ArduCAM-M-2MP-Camera-Shield-OV2640-2MP-SPI-Camera-module/32307177889.html?spm=2114.01010208.3.9.ZXWipQ&ws_ab_test=searchweb201556_6,searchweb201644_2_505_506_503_504_10020_502_10001_10002_10017_10005_10006_10021_10003_10004_10022_10018_10019,searchweb201560_1,searchweb1451318400_-1,searchweb1451318411_-1&btsid=b6cb97e1-5122-4028-b98d-970fa47e2c58

[image: arducam-mini-module-camera-shield-with-2-mp-ov2640-sensor-for-uno-mega2560-board jpg_220x220] https://cloud.githubusercontent.com/assets/1899323/13455442/c1badd62-e05d-11e5-877d-b3b654cdd2aa.jpg

— Reply to this email directly or view it on GitHub https://github.com/SmingHub/Sming/issues/618#issuecomment-191141344.

harry-boe commented 8 years ago

@alon24 This looks like a bare naked CCD Sensor. No flash whatsoever on the dev board. So i don't think it will support the Arducam Flash FIFO buffers. I can't tell without giving it a try but i don't think it will work! There a no Datasheets ether so i can't verify.

harry-boe commented 8 years ago

PR is out so please test what you can

alonewolfx2 commented 8 years ago

I am wondering one thing. esp8266 modules about $3, arducam $25, and total price about $30. you can buy 2MP network cam with many features and enclosure for this price. why we need this? btw i am saying this just out of curiosity. otherwise hwspi and softspi class PR is very good.

harry-boe commented 8 years ago

Fair point.

Question as the ESP does not sum up. Why would anyone buy a ArduCAM at all ?

For me it's about heaving full control on the cam and being able to have my own network protocol. We like to use mqtt in our case and maybe add some security ;)

alonewolfx2 commented 8 years ago

Fair enough. How about streaming? can esp and arducam handle 10-15 fps jpeg streaming?

harry-boe commented 8 years ago

Well i don't know yet but it should work. I cranked the SPI bus up to 270 kB/sec with 40 MHz clock speed and the SD Card. The cam can go to 80 MHz clock speed. The handling using a IDataSourceStream called from the TCP stack does not add a lot of processing logic. So i would say if i get the ArduCAM FIFO burst mode (SPI Block read) to work properly you should end up with decent fps and image sizes.

alonewolfx2 commented 8 years ago

so can new hwspi run on 80mhz ?

harry-boe commented 8 years ago

Theoretically YES

If you set the SPI Clock speed to 80'000'000 using the SPISettings it will set the clock divider to 0 and this will tell the ESP to use the System clock as SPI clock. Note: Clock definitions between SysClock and half sys Clock do not have a valid clock divider and therefore the speed will be set to half sys clock. All that magic happens in SPIClass::setFrequency(int freq)

However i did not found a test device that can go that fast nor is my Logic analyser capable of capturing that fast. My SD cards refused to work over 40MHz SPI clock .. but maybe another SD card can. (BTW: SD cards have a SPI slave controller built in that defines the max SPI speed they can go - you can actually connect a SD card directly to the GPIO pins if you really want to)

harry-boe commented 8 years ago

all the findings are implemented and merged with pr #642

SmingHub / Sming

Hardware SPI: Issues i discovered and potential solutions (API Discussion) #618

Timing and handover from HW to SW.

Byte orders/shifting logic and how to let the HW do the Job

Registers & buffer sizes and how to send long data blocks

More control registers and features that might come in handy (and should be considered in the API)

Very own problems when using HW Client Selector on PORT 15

API Questions

include