RudolphRiedel / FT800-FT813

Multi-Platform C code Library for EVE graphics controllers from FTDI / Bridgetek (FT810, FT811, FT812, FT813, BT815, BT816, BT817, BT818)
MIT License

IOT5 from Riverdi support? #14

Open robouden opened 3 years ago

robouden commented 3 years ago

Just wondering if the code also supports an IOT5 board with an ESP32. I would like to use it for a client project with VSCodium and PlatformIO.

Regards, Rob Oudendijk

RudolphRiedel commented 3 years ago

I am confident that it can be done. However, I do not have native code for the ESP32 yet; right out of the box it should work with Arduino. It's not like this is utterly complicated, I just have not done it so far.

But, unfortunately, I found the documentation from Riverdi to be rather incomplete. There is no user manual or datasheet available for download, only a sort-of online "datasheet" that has no schematic and does not specify which pins of the ESP32 are used for what function. Well, okay, it sort of does, but only with Zerynth's pin names and not with the GPIOxx designators from Espressif. I found out that CS is on pin D4 and PD is on pin D33, but I could not find anywhere which GPIOs these are assigned to. Oh yes, SCLK, MISO and MOSI are connected to SPI0.

If you can figure out the pins and set these in EVE_target.h for the Arduino target and also select EVE_RiTFT50 in EVE_config.h it should work already.

What framework are you planning to use? Arduino or the "Espressif IoT Development Framework"?

RudolphRiedel commented 3 years ago

Anyways, I just ordered one of these: [image]

Because why the heck not. :-)

RudolphRiedel commented 3 years ago

I got it running on the ESP32 and it really only was a matter of finding a board definition in PlatformIO that matches this board and setting up the correct pins for CS and PD: [image]

There is something wrong however. Either it runs a lot slower than on an Arduino UNO, or micros() is returning bogus values.

The UNO shows Time1: 516µs Time2: 48µs
The ESP32 shows Time1: 1476µs Time2: 87µs

And that is after I sped up the SPI. Without modification, running the SPI at 8MHz like the UNO, it is 1576/92 µs. At 8MHz the overhead for transferring the ~220 bytes necessary for the display list should be less than 250µs.

So the UNO needs 266µs for putting the list together while the ESP32 needs 1326µs? That is a factor of five for exactly the same calculations. But the ESP should be at least ten times faster, not a lot slower.
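As a quick sanity check of the "less than 250µs" estimate, the ideal time on the wire can be computed from the byte count and the SPI clock. This is only a back-of-the-envelope sketch: the helper name `spi_wire_time_us` is made up for illustration, and it assumes 8 clock cycles per byte with no gaps between bytes and no chip-select handling.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the library): ideal time on the wire in
   microseconds, assuming 8 clock cycles per byte and no gaps between bytes. */
static uint32_t spi_wire_time_us(uint32_t bytes, uint32_t clock_hz)
{
    uint64_t bits = (uint64_t)bytes * 8U;
    return (uint32_t)((bits * 1000000ULL) / clock_hz);
}
```

With the roughly 220 bytes of the display list and an 8MHz clock, `spi_wire_time_us(220, 8000000)` gives 220µs, which is consistent with the estimate above. Anything beyond that in the measured times is list building plus driver overhead.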

I updated the "EVE_Test_Arduino_PlatformIO" example with settings for Metro-M4 and ESP32. And I switched the display to an EVE3-50G which is the closest I have to an IOT5.

RudolphRiedel commented 3 years ago

Just to make sure it is not PlatformIO, I installed Arduino-ESP32 in the Arduino IDE, and it runs even a little slower with Time1: 1580 Time2: 95

Unfortunately I left the Logic Analyzer in the office.

RudolphRiedel commented 3 years ago

Hmm, as it turned out, the SPI driver for arduino-esp32 is garbage, at least for a direct switch from an UNO to an ESP32. As the ESP32 is running an RTOS underneath, the SPI driver is thread-safe, and this costs an awful lot of time when transferring single bytes over SPI. There are even Interrupts() involved.

In my opinion a SPI.transfer(data); should not use anything else but direct hardware access with polling and return in very little over 1µs at 8MHz. It should not involve anything like SPI_MUTEX_LOCK() / SPI_MUTEX_UNLOCK() calls, interrupts, or copying the data to anywhere else than the SPI data register. It should be completely thread un-aware and un-safe as a default.

As it is, this is not really Arduino compatible when transferring a couple of single bytes is taking a lot longer than on the UNO. Yes of course, for those needing it there should be a way to make it thread-safe as well.

Anyways, as the Arduino SPI for ESP32 is by default awfully slow, special measures are required to make it usable.

The first solution I came up with is quite simple:

```
static inline void spi_transmit_32(uint32_t data)
{
#if defined (ESP32)
    uint8_t buffer[4];
    buffer[0] = (uint8_t)(data);
    buffer[1] = (uint8_t)(data >> 8);
    buffer[2] = (uint8_t)(data >> 16);
    buffer[3] = (uint8_t)(data >> 24);
    SPI.transfer(buffer, 4);
#else
    spi_transmit((uint8_t)(data));
    spi_transmit((uint8_t)(data >> 8));
    spi_transmit((uint8_t)(data >> 16));
    spi_transmit((uint8_t)(data >> 24));
#endif
}
```

This brings down "time 1" from 1562µs to 662µs - which is better but still way slower than the 516µs an UNO needs. Since that kicked out Arduino compatibility already we can go one step further and replace SPI.transfer() with SPI.writeBytes(). This brings down "time 1" to 613µs - still bad.

Okay, now, my library already has measures in place to support DMA, so let's use that to just build a bigger buffer and transfer this buffer without DMA:

```
static inline void spi_transmit_burst(uint32_t data)
{
#if defined (EVE_DMA)
    EVE_dma_buffer[EVE_dma_buffer_index++] = data;
#else
    spi_transmit_32(data);
#endif
}

#if defined (EVE_DMA)

static inline void EVE_init_dma(void)
{
}

static inline void EVE_start_dma_transfer(void)
{
    uint8_t buffer[3];
    buffer[0] = (uint8_t)(EVE_dma_buffer[0] >> 8);
    buffer[1] = (uint8_t)(EVE_dma_buffer[0] >> 16);
    buffer[2] = (uint8_t)(EVE_dma_buffer[0] >> 24);
    EVE_cs_set();
    SPI.writeBytes(buffer, 3);
    SPI.writeBytes((uint8_t *) &EVE_dma_buffer[1], (EVE_dma_buffer_index-1)*4);
    EVE_cs_clear();
}

#endif
```

This brings down "time 1" to 304µs - better than the UNO but a joke when considering the 15 times higher clock.

To recap, we are transferring 220 bytes over SPI at 8MHz clock. The Arduino UNO needs 516µs for this with the overhead for pure SPI transfer probably at <250µs. The ESP32 needs 1576µs for executing exactly the same code as the Arduino UNO. By just changing the way the data is transferred over SPI without changing anything else I was able to reduce this time to 662µs, then 613µs and finally 304µs. So at the very minimum there is an overhead of 1272µs for transferring 220 bytes over SPI - wow.
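To put the recap into per-byte terms, here is a small sketch that derives two figures from the measurements: the time not spent clocking bits out, and the saving per byte between two variants. The helper names are made up for illustration, the 220 bytes are the approximate display list size mentioned above, and the "non-wire" time still includes building the display list, so it is not pure driver overhead.

```c
#include <stdint.h>

#define PAYLOAD_BYTES 220U  /* approximate display list size from the text */
#define WIRE_TIME_US  220U  /* 220 bytes * 8 bits at 8 MHz, no gaps assumed */

/* Hypothetical helper: measured time minus ideal wire time, in microseconds. */
static uint32_t non_wire_time_us(uint32_t measured_us)
{
    return (measured_us > WIRE_TIME_US) ? (measured_us - WIRE_TIME_US) : 0U;
}

/* Hypothetical helper: time saved between two variants, in nanoseconds per
   payload byte (integer division, rounded down). */
static uint32_t saved_ns_per_byte(uint32_t slow_us, uint32_t fast_us)
{
    return ((slow_us - fast_us) * 1000U) / PAYLOAD_BYTES;
}
```

Under these assumptions the unmodified ESP32 code spends about 1356µs away from the wire, the buffered variant about 84µs, and going from 1576µs to 304µs saves roughly 5.8µs per transferred byte.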

So far I am not impressed by the ESP32 and I really do have to wonder why the Arduino core is so bad at transferring bytes over SPI in an Arduino UNO compatible way.

robouden commented 3 years ago

Rudolph,

Really appreciate your fast responses. Sorry for my late reply. I had to travel and set up at another place.

I updated my code and set it up for the EVE3-50G in EVE_config.h. Set up the environment in VSCodium and PlatformIO for ESP32. Compiled the code. Flashed the IOT5, but nothing on the display.

What suggestion do you have?

regards, rob

robouden commented 3 years ago

Rudolph,

I checked the code from Zerynth for the IOT5. The code with Zerynth runs fine, but startup is very slow; I think it has to load the Zerynth VM. Some info from the code:

bt81x.init(SPI0, D4, D33, D34)
bt81x.touch_loop(((-1, widget_choice_cbk), )) # listen to touch events and make widget_choice_cbk process them

and:

from riverdi.displays.bt81x import ctp50
from bridgetek.bt81x import bt81x

I hope this helps.

regards, rob

robouden commented 3 years ago

Rudolph,

More bt81x.init info at https://docs.zerynth.com/latest/reference/libs/bridgetek/bt81x/docs/bt81x/.

init(spi, cs, pd, int, dc=None, spi_speed=3000000)

Parameters:

  • spi – spi driver (SPI0, SPI1, ...)
  • cs – chip select pin
  • pd – pd pin
  • int – interrupt pin
  • dc – display configuration as a DisplayConf() instance
  • spi_speed – spi speed in Hertz

Initializes the chip.

When dc parameter is not specified displayconf global variable is used.

regards, rob

RudolphRiedel commented 3 years ago

I received the notifications from Github at work but I do not have my Github credentials at work...

As I wrote, first of all you need to identify to which pins CS and PD are wired and how these are used from PlatformIO. The Zerynth software has CS on D4 and PD on D33. There are GPIO4 and GPIO33 pins on the ESP32-WROOM module. And it is possible that these are used as "4" and "33" from Arduino. But this is just an idea with no documentation to back this up, you need to verify on your end that "4" does indeed control the CS line of the BT815 and "33" does control the "PD" line on the BT815.

When you have the correct pin numbers, open EVE_target.h, scroll down to line 856:

#if defined (ESP32)
#define EVE_CS  13
#define EVE_PDN 12

Now change that to the correct numbers, maybe it is like this:

#if defined (ESP32)
#define EVE_CS  4
#define EVE_PDN 33

Then you need to open EVE_config.h and change line 124:

#define EVE_EVE3_50G -> #define EVE_RiTFT50

I am using the EVE3_50G as I do not have a RiTFT50 to test with and the EVE3_50G is quite similar.

RudolphRiedel commented 3 years ago

To put some more facts on the table to show that I am not just ranting, here are a few shots with the logic analyzer: [image]

This is single-byte SPI transfers, there really is that much of a gap between bytes meaning that the driver spends more time on playing with itself than putting out bytes on the SPI.

[image]

This is with SPI.writeBytes(buffer, 4); it is only faster as there are fewer gaps.

[image]

This is with 13MHz instead of 8MHz, it only is faster as the bits between the gaps take less time.

[image]

This is what transferring a larger buffer looks like. There is a short pause every 64 bytes. The longer pause towards the start is there as I read somewhere that the transfers have to be word aligned. And the first transfer is three bytes for the address, the rest is in 32 bits.

[image]

And for the last image, this is what it looks like using an Adafruit Metro M4 as Arduino with single byte writes. The display update takes 302µs. When I go bare-metal on the SAME51 with my own SPI driver I get 284µs without using DMA. All of course with the same 8MHz SPI clock.

robouden commented 3 years ago

Rudolph,

Hereby the part of the schematic that we can use from Riverdi.

[image: image001]

Regards, rob

robouden commented 3 years ago

Rudolph,

I tried what you suggested, but nothing on the display yet. I found more information from Zerynth at https://docs.zerynth.com/latest/reference/libs/bridgetek/bt81x/docs/bt81x/

I hope this can help you find out what is needed to get the display to work without using Zerynth (startup takes 5 seconds before anything is displayed). This is not acceptable for me.

Regards, rob

robouden commented 3 years ago

Rudolph,

Attached is the bt81x library that Zerynth uses.

Regards, rob

bt81x.zip

dstulken commented 3 years ago

It's a difficult architectural decision for sure. The ESP32 platform heavily promotes multi-core multi-threaded usage, so having thread-safe I/O by default is an understandable choice. One could argue that user code should handle locking and unlocking of resources on an as-needed basis, but then we invite priority inversion, blocked tasks, etc... the FreeRTOS scheduler becomes impacted, the watchdog fires (legitimately) causing the device to reset, etc, etc.

Making every single-byte transfer thread-safe clearly isn't elegant, but I can see why the developers would elect to have some guardrails in place for the default settings. Most people porting Arduino apps up to the multi-core ESP32 aren't going to have an extensive background in process/thread management or the associated pitfalls, and the "My ESP32 keeps rebooting! What a piece of junk!" type of forum posts are difficult for a (mostly) volunteer community to support, and it would cause unnecessary churn for hardware vendors exchanging the "faulty" modules for customers...

One thing I did see in a few search results - Have you tried testing with "CONFIG_DISABLE_HAL_LOCKS" enabled?
I also saw some reports that software SPI was faster than hardware SPI on an ESP32... although many of the posts were several years old, and things have changed very rapidly with the Arduino environment on these modules.

Please keep us posted if your ESP32-specific speedups described above make it into the mainline code - I have several ESP32/EVE projects in the works, so your efforts to diagnose things like this are certainly appreciated!

davidjade commented 3 years ago

Here is an example of how to drive an FT813 with an ESP32 using native (not Arduino) SPI DMA. This is a fork of this FT800-FT813 project that I did last spring to add better performing SPI DMA and to also improve and simplify the buffering of SPI data when using DMA. With this code I get these times on an ESP32 using an FT813 running the demo at a 30MHz SPI clock rate.

Time1: 152us Time2: 28us

Example code is here: https://github.com/davidjade/FT81x-ESP32

Note that the above example code does have one bottleneck and that is that it waits for all previous DMA requests to complete before issuing new ones. This bottleneck is included in the above times. There is another way to handle that and that is by using queued DMA mode to maintain a bunch of simultaneous in-flight DMA requests. This requires managing multiple transmit buffers though and this example does not do that. However, unless you are sending very large bitmaps split over many chunks very quickly this is unlikely to be a serious bottleneck as most small DMA requests would likely complete before more were issued anyway.

However, I have another example of using DMA queued mode with the FT813 that also supports both Dual and Quad mode SPI with DMA that is stable at 30MHz on an ESP32. It supports up to 50 (or more) in-flight DMA requests simultaneously. When using the DMA queuing mode I can easily saturate the SPI bus at 30MHz in Single, Dual, or Quad SPI modes as long as I can feed it data fast enough. This queued DMA code is in the driver that I wrote to enable running the LVGL graphics library on the FT813, in the lvgl_esp32_drivers project. With that LVGL code I can do complete full screen bitmap updates at many frames per second on an FT813 (where each full frame is sending roughly 768k).

You may argue that this second example is not the intended purpose of using the FT813 because LVGL treats it like a big dumb bitmap display, but whatever - I wanted to use a more powerful graphics library on the displays I already had. I'm not interested in arguing about that. I am merely showing with two examples that the proper way to use SPI on an ESP32 is by using native DMA and that it can be very fast.

The LVGL project is here: https://github.com/lvgl/lv_port_esp32

RudolphRiedel commented 3 years ago

Getting mails over the day with no access to Github is not nice. :-) Anyways, looks like I need to explain some more what I am up to, and I will. But first to address the issue that the example is not running on the IoT5.

> Hereby the part of the schematic that we can use from Riverdi.

That is exactly what I was looking for and I could not find it. The schematic shows what the issue is: they changed the pins for the SPI itself. I just updated the EVE_Test_Arduino_PlatformIO example. I know this is not the right way to do it, but I went for the quick and dirty solution. :-)

src.ino now has this:

```
#if defined (ESP32)
    SPI.begin(EVE_SCK, EVE_MISO, EVE_MOSI, EVE_CS);
    SPI.setClockDivider(SPI_CLOCK_DIV2); /* speed up SPI */
//  SPI.setClockDivider(0x00081001); /* speed up SPI */
#else
    SPI.begin(); /* sets up the SPI to run in Mode 0 and 1 MHz */
    SPI.setClockDivider(SPI_CLOCK_DIV2); /* speed up SPI */
#endif
```

So the SPI-pins are configured.

Next is EVE_target.h line 856ff:

```
#if defined (ESP32)

#define EVE_CS      13
#define EVE_PDN     12
#define EVE_SCK     18
#define EVE_MISO    19
#define EVE_MOSI    23
```

This is what is working with my UNO style ESP32 board; change it to this for the IoT5:

```
#define EVE_CS      4
#define EVE_PDN     33
#define EVE_SCK     14
#define EVE_MISO    2
#define EVE_MOSI    15
```

And finally in EVE_config.h the EVE_RiTFT50 needs to be selected.

If you build this thru the PlatformIO menu from the sidebar for the ESP32 and upload it, it should show my basic example with "Time1:" at 305µs. The simple workaround to use the DMA buffer to transfer a larger block over the SPI is in place, mostly in EVE_target.h.

RudolphRiedel commented 3 years ago

> It's a difficult architectural decision for sure. The ESP32 platform heavily promotes multi-core multi-threaded usage, so having thread-safe I/O by default is an understandable choice. One could argue that user code should handle locking and unlocking of resources on an as-needed basis, but then we invite priority inversion, blocked tasks, etc... the FreeRTOS scheduler becomes impacted, the watchdog fires (legitimately) causing the device to reset, etc, etc.

I sort of agree overall, but this is where I need to explain some more. My angle here is not what is possible, but that this system claims to be an Arduino, and it indeed compiles the exact same code as what is needed for the UNO, nothing special required. And Arduino is not multi-core. Heck, the typical application for the ESP32 is not multi-core, multi-thread either, not on the surface. Usually there is just this second controller around that deals with WLAN, like a function unit. Anyways, I agree that all of this multi-threading, RTOS stuff is not a bad thing to have per se.

> Making every single-byte transfer thread-safe clearly isn't elegant, but I can see why the developers would elect to have some guardrails in place for the default settings. Most people porting Arduino apps up to the multi-core ESP32 aren't going to have an extensive background in process/thread management or the associated pitfalls, and the "My ESP32 keeps rebooting! What a piece of junk!" type of forum posts are difficult for a (mostly) volunteer community to support, and it would cause unnecessary churn for hardware vendors exchanging the "faulty" modules for customers...

Well now, the issue really is that the system claims to behave like a very fast Arduino. But in reality it is, at least with SPI, a very slow Arduino until you put workarounds in. This really is all I am saying, it does not behave like an Arduino. And part of the issue is that the SPI class merely is a wrapper for Espressif's driver, which by itself is not really efficient. Yes of course, I do understand why, this is easier to implement and to maintain. But this way you inherit functions like this:

```
void IRAM_ATTR spiWriteByteNL(spi_t * spi, uint8_t data)
{
    if(!spi) {
        return;
    }
    spi->dev->mosi_dlen.usr_mosi_dbitlen = 7;
    spi->dev->miso_dlen.usr_miso_dbitlen = 0;
    spi->dev->data_buf[0] = data;
    spi->dev->cmd.usr = 1;
    while(spi->dev->cmd.usr);
}

void spiWriteByte(spi_t * spi, uint8_t data)
{
    if(!spi) {
        return;
    }
    SPI_MUTEX_LOCK();
    spi->dev->mosi_dlen.usr_mosi_dbitlen = 7;
    spi->dev->miso_dlen.usr_miso_dbitlen = 0;
    spi->dev->data_buf[0] = data;
    spi->dev->cmd.usr = 1;
    while(spi->dev->cmd.usr);
    SPI_MUTEX_UNLOCK();
}
```

So with every byte, word, long or buffer access two additional registers are configured. Plus the check for the zero-pointer. This is not much but it also costs time over and over again.

Yes of course, nothing keeps me from throwing this all away and implementing it myself. Except that I do not even intend to use the ESP32 for anything and only have one for a couple of days now. :-)

> One thing I did see in a few search results - Have you tried testing with "CONFIG_DISABLE_HAL_LOCKS" enabled?

I have not, maybe this is an option. This is however nothing one would use on an UNO; it only works for the ESP32 and nowhere else, which kind of undermines the idea of what Arduino is about. Anyways, I already went into the deep end with using SPI functions that are not portable. :-) Ultimately using DMA would be the way to go.

> Here is an example of how to drive an FT813 with an ESP32 using native (not Arduino) SPI DMA. This is a fork of this FT800-FT813 project that I did last spring to add better performing SPI DMA and to also improve and simplify the buffering of SPI data when using DMA. With this code I get these times on an ESP32 using an FT813 running the demo at a 30MHz SPI clock rate.
>
> Time1: 152us Time2: 28us

And as I wrote before, this is really slow for using DMA. My 120MHz ATSAME51 is running the same code right now with Time1: 14µs and Time2: 8µs. The SPI clock does not even matter since it is DMA, but Time2 would go up a little if I changed it from the 15MHz I am using. The ESP32 should be capable of this as well, maybe going a little slower but not worse than 30µs.

> Note that the above example code does have one bottleneck and that is that it waits for all previous DMA requests to complete before issuing new ones.

That is not really an issue since updating the screen every 200µs would mean issuing 5000 updates per second. You need to throttle that anyway, hence the 20ms I am using for 50 frames per second. With calling tft_touch() first and once every 5ms, I have at least 4ms from the last tft_display() to the next tft_touch(). And as tft_touch() is hardly doing anything, I allow it to use blocking transfers.
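The timing scheme described here can be sketched as a small tick-driven scheduler. This is a minimal illustration, not the code from the example project: `scheduler_tick_1ms()` is a made-up name, it is assumed to be called once per millisecond, and the two stub functions only count calls in place of the real tft_touch() and tft_display().

```c
#include <stdint.h>

/* Stand-ins for tft_touch() / tft_display(); here they only count calls. */
static uint32_t touch_calls = 0;
static uint32_t display_calls = 0;

static void tft_touch_stub(void)   { touch_calls++; }
static void tft_display_stub(void) { display_calls++; }

/* Hypothetical scheduler, assumed to be called once per millisecond. */
static void scheduler_tick_1ms(void)
{
    static uint8_t ms_count = 0;   /* milliseconds within the current 5 ms slot */
    static uint8_t slot_count = 0; /* 5 ms slots within the current 20 ms frame */

    ms_count++;
    if (ms_count < 5U)
    {
        return;
    }
    ms_count = 0;

    tft_touch_stub();              /* every 5 ms, always before a display update */

    slot_count++;
    if (slot_count >= 4U)          /* every 4th slot, i.e. every 20 ms */
    {
        slot_count = 0;
        tft_display_stub();
    }
}
```

Over one simulated second this yields 200 touch calls and 50 display updates, so a transfer started by the display update has the rest of the 5ms slot to finish before the next touch call.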

The one reason I can not use your code is because you completely broke everything else. That is not a critique, that is perfectly fine, whatever suits you. However, since you broke every other architecture in order to implement your idea of using DMA with the ESP32, I just can not use it.

And while I am still interested to learn how a buffer can be transferred over SPI using DMA and a callback hook, I have not found any example yet that explains this and I am pretty sure any implementation has to go around the SPI class since it does not seem to support DMA.

davidjade commented 3 years ago

For what it's worth, the code examples that I pointed out were just to illustrate how the native ESP32 SPI DMA process works. The Arduino SPI classes as everyone has pointed out, are not that great. I fully acknowledge that the current code here could not likely use these native methods as is. However, if you look at the ESP32 SPI example code (use the LVGL example as it is more capable - the files that contain all the SPI API calls are disp_spi.c and disp_spi.h) you will see that the basic process is:

  1. Put the data you want to transmit into a buffer (must be an aligned, DMA capable memory allocation)
  2. Create a SPI transaction (which points to the buffer)
  3. Queue the SPI transaction for a DMA operation
  4. At some point in the future, retrieve the transaction results (still needed even if only sending or you potentially leak memory)

You can queue multiple DMA transactions simultaneously if needed (and you should for maximum throughput) but you need to do the bookkeeping to keep track of things and you need a separate data buffer for each. During the whole 4 step process for each transaction that is in-flight, the buffer you used must not be touched until after the DMA transfer has completed. You can use a callback to get notified when that has happened but you need to keep track of which transaction completed if you queued multiple. So a bit more complicated than just queuing data in a fire and forget way.

As for DMA transfer times on the ESP32 in my first example I think the slowness comes down to a few things:

First, don't discount the bottleneck I mentioned in my first example because it means there is a blocking call at the start of every single SPI transfer (again, the LVGL version does not have this limitation since it can queue multiple requests). The code literally spins waiting for the previous transaction to complete since it can only handle one transaction in-flight.

Second, I think it also likely suffers from sending many small requests via DMA even if the bottleneck was removed as there is a bunch of stuff going on to queue DMA transactions. It may just not be that efficient on an ESP32 so the more things can be buffered together, the better to reduce the overall number of transactions when possible. Gets complicated when interleaving reading and writing for sure.

Lastly, the difference between my two examples is striking to say the least. The LVGL version can saturate the SPI bus at any clock speed and in Quad SPI DMA mode can transfer the full resolution of the screen (768k bytes of data - 800x480 at 16bpp) at roughly 15-16 times a second. That's with breaking up the large transfer into 4k chunks, where each is a separate DMA transaction (~190 transactions total). So the ESP32 DMA can be as fast as the SPI clock rate, continuously, but it takes effort to fit to what the ESP32 expects. But that's also not really applicable to using the FT8xx as intended with its many small transfers.
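The numbers in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration with made-up helper names. It assumes "4k" means 4096 bytes and that the bus never idles, and it ignores command and address overhead, so the result is an upper bound rather than a prediction.

```c
#include <stdint.h>

#define FRAME_WIDTH   800U
#define FRAME_HEIGHT  480U
#define BYTES_PER_PX  2U     /* 16 bpp */
#define CHUNK_BYTES   4096U  /* assumption: "4k chunks" means 4096 bytes */

static uint32_t frame_bytes(void)
{
    return FRAME_WIDTH * FRAME_HEIGHT * BYTES_PER_PX;
}

/* Number of DMA transactions per frame, rounded up. */
static uint32_t chunks_per_frame(void)
{
    return (frame_bytes() + CHUNK_BYTES - 1U) / CHUNK_BYTES;
}

/* Upper bound on frames per second, times ten, for a given clock and
   number of data lines (1 = single, 2 = dual, 4 = quad). */
static uint32_t max_fps_x10(uint32_t clock_hz, uint32_t data_lines)
{
    uint64_t bits_per_second = (uint64_t)clock_hz * data_lines;
    uint64_t bits_per_frame  = (uint64_t)frame_bytes() * 8U;
    return (uint32_t)((bits_per_second * 10U) / bits_per_frame);
}
```

This gives 768000 bytes per frame and 188 chunks, matching the "~190 transactions". For Quad SPI at 30MHz the bound is about 19.5 frames per second, so the reported 15-16 would be roughly 80% of the theoretical maximum.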

Btw, you're right that there are not a lot of examples on using native queued SPI DMA on the ESP32. I figured most of this out by reading the ESP32 SPI driver source, which is part of the IDF SDK. The LVGL SPI code was started by someone else but I did a lot to improve DMA performance and capabilities.

RudolphRiedel commented 3 years ago

> For what it's worth, the code examples that I pointed out were just to illustrate how the native ESP32 SPI DMA process works.

I only had the chance to briefly look at it but I do not see it. How is this doing DMA and as this does not seem to use the Espressif driver, what driver are you using?

> The Arduino SPI classes as everyone has pointed out, are not that great.

At the very first glance and at a basic level it is painfully slow, yes. But then it offers quite some features to speed things up; 300µs is at least better than what the UNO does.

>   • Put the data you want to transmit into a buffer (must be an aligned, DMA capable memory allocation)

Not only that but from what I read you can only transfer 32 bits which is a total pain with the need to start off with a three byte command.

>   • Create a SPI transaction (which points to the buffer)
>   • Queue the SPI transaction for a DMA operation

I am not seeing that part, I see that SPI transaction but not how it turns into DMA.

>   • At some point in the future, retrieve the transaction results (still needed even if only sending or you potentially leak memory)

The manual for the driver mentions a callback function, I would need that.

> You can queue multiple DMA transactions simultaneously if needed (and you should for maximum throughput) but you need to do the bookkeeping to keep track of things and you need a separate data buffer for each. During the whole 4 step process for each transaction that is in-flight, the buffer you used must not be touched until after the DMA transfer has completed. You can use a callback to get notified when that has happened but you need to keep track of which transaction completed if you queued multiple. So a bit more complicated than just queuing data in a fire and forget way.

And that is where your approach and mine collide. I only use a single buffer to write out all data in one go, there is nothing that even could collide. The three bytes for the command at the start is an issue for the ESP32, that has to be done with single byte transfers which will cost ~25µs in the worst case but after that, a single big chunk of data and the next one after 20ms.

Check how I did it for the sample code. I only changed EVE_target.h and added EVE_target.cpp as a place to declare the variables:

```
#if defined (ESP32)

#include "EVE_target.h"
#include "EVE_commands.h"
#if defined (EVE_DMA)
    uint32_t EVE_dma_buffer[1025];
    volatile uint16_t EVE_dma_buffer_index;
    volatile uint8_t EVE_dma_busy = 0;
#endif

#endif
```

```
#if defined (EVE_DMA)

static inline void EVE_init_dma(void)
{
}

static inline void EVE_start_dma_transfer(void)
{
    uint8_t buffer[3];
    buffer[0] = (uint8_t)(EVE_dma_buffer[0] >> 8);
    buffer[1] = (uint8_t)(EVE_dma_buffer[0] >> 16);
    buffer[2] = (uint8_t)(EVE_dma_buffer[0] >> 24);

    EVE_cs_set();
    SPI.writeBytes(buffer, 3);
    SPI.writeBytes((uint8_t *) &EVE_dma_buffer[1], (EVE_dma_buffer_index-1)*4);
    EVE_cs_clear();
}

#endif

static inline void spi_transmit_burst(uint32_t data)
{
#if defined (EVE_DMA)
    EVE_dma_buffer[EVE_dma_buffer_index++] = data;
#else
    spi_transmit_32(data);
#endif
}
```

Everything else already is in place and requires no modification. With that the EVE_start_cmd_burst()/EVE_end_cmd_burst() functions will put everything as fast as possible in the buffer and send it away when done.
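To illustrate why the first buffer word is sent as three single bytes, here is a sketch of how such a word could be laid out. It assumes, as the shifts in EVE_start_dma_transfer() suggest, that the first word carries the three-byte memory-write command in its upper 24 bits and that the lowest byte is padding. `make_header_word()` and `header_bytes()` are made-up names, and the write flag (0x80 in the first byte) should be checked against the EVE programming guide.

```c
#include <stdint.h>

/* Hypothetical helper: build the first buffer word for a memory write to a
   22-bit EVE address. The three command bytes occupy bits 8..31, the lowest
   byte is unused padding. */
static uint32_t make_header_word(uint32_t address)
{
    uint32_t b0 = 0x80U | ((address >> 16) & 0x3FU); /* write flag + address bits 21..16 */
    uint32_t b1 = (address >> 8) & 0xFFU;            /* address bits 15..8 */
    uint32_t b2 = address & 0xFFU;                   /* address bits 7..0 */
    return (b0 << 8) | (b1 << 16) | (b2 << 24);
}

/* Same byte extraction as in EVE_start_dma_transfer(). */
static void header_bytes(uint32_t word, uint8_t out[3])
{
    out[0] = (uint8_t)(word >> 8);
    out[1] = (uint8_t)(word >> 16);
    out[2] = (uint8_t)(word >> 24);
}
```

For an address of 0x302578 (assumed here to be the command FIFO write register) the word becomes 0x7825B000 and the bytes on the wire are 0xB0, 0x25, 0x78. On a little-endian controller these are the bytes at offsets 1 to 3 of the word in memory, which is why only the odd three-byte start needs special handling and everything after it is 32-bit aligned.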

It only does not do DMA since I have no idea how. But if that last SPI.writeBytes((uint8_t *) &EVE_dma_buffer[1], (EVE_dma_buffer_index-1)*4); were converted to use DMA and EVE_cs_clear(); were called by a callback function, then the time would go down from 300µs to maybe 80µs, hopefully less. Ah yes, EVE_dma_busy also needs to be set to !0 at the start of the transfer and reset to 0 by the callback function.
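The busy-flag handshake described here can be sketched independently of any particular DMA driver. This is a minimal host-side illustration: `start_transfer()`, `transfer_done_callback()` and the stub are made-up names, `dma_busy` plays the role of EVE_dma_busy, and what a real completion callback is allowed to do depends on the platform.

```c
#include <stdint.h>
#include <stdbool.h>

static volatile uint8_t dma_busy = 0;  /* plays the role of EVE_dma_busy */
static uint32_t transfers_started = 0;

/* Placeholder for whatever actually kicks off the hardware transfer. */
static void start_hw_transfer_stub(void)
{
    transfers_started++;
}

/* Called when the buffer for a frame is complete. Returns false if the
   previous transfer is still running, so the caller can skip this frame. */
static bool start_transfer(void)
{
    if (dma_busy != 0U)
    {
        return false;
    }
    dma_busy = 42U;  /* any non-zero value marks the buffer as in flight */
    start_hw_transfer_stub();
    return true;
}

/* Would be invoked from the transfer-complete callback. */
static void transfer_done_callback(void)
{
    dma_busy = 0U;
}
```

The point of the flag is that the single buffer must not be refilled while it is still being sent, so code that builds the next display list would check it first and simply skip the update if the previous one has not finished.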

davidjade commented 3 years ago

> How is this doing DMA and as this does not seem to use the Espressif driver, what driver are you using?

So this code uses the SPI APIs as provided by the ESP32 IDF toolkit (i.e. the native C code SDK for the ESP32). In that toolkit/framework/SDK there is the SPI Master driver. In the ESP32 IDF toolkit, these are called Components, which are just little bits of libraries that can be included into projects. There are many Component drivers for all of the various hardware aspects of the ESP32 that are supplied by Espressif in the IDF toolkit.

My code uses this SPI Master component to talk SPI. It is the lowest SPI API on an ESP32 other than the low-level hardware, which is not exposed directly in the ESP32 framework. So it is a driver over the SPI hardware that can be used directly. The Arduino SPI classes are another SPI driver but they are built on top of the SPI Master - so two levels on top of the hardware now. It's the Arduino layer that has locks, less flexibility, etc...

> Not only that but from what I read you can only transfer 32 bits which is a total pain with the need to start off with a three byte command.

It depends on the transfer method used. You can fully read and write small or large blocks in one DMA transaction, but you might have to choose different methods, flags, etc... There are some weird cases. That said, I regularly transfer 1, 2, 3 or more bytes using DMA in the examples. The trickiest part is managing the dummy bits. You will see in my transfer structures that I manually account for them (and ignore the data). This was the simplest solution but you end up needing two ways: one way for full duplex and another for half duplex. This is another reason, though, why it is best to bundle up as much of the data into one buffer to transfer as possible and avoid sending separate DMA transactions of 1, 2, or 3 bytes if it can be avoided.

> I am not seeing that part, I see that SPI transaction but not how it turns into DMA.

Basically, the call to spi_device_queue_trans() (part of SPI Master) is what transfers the transaction to the ESP32 for it to initiate the DMA transfer. After that, the callbacks, etc... happen once the DMA request completes. There are also two non-DMA SPI Master functions for sending SPI data, spi_device_polling_transmit(), and spi_device_transmit(). Polling uses DMA behind the scenes but waits for completion before returning. You can mix and match methods but you have to be careful to process all returned DMA transaction results first.

> And that is where your approach and mine collide.

Yep. In the LVGL case, since all it wants is a big buffered bitmap display, this was already part of the LVGL library - it already uses multiple rendering buffers so it was easy to make it work with multiple queued DMA requests. There are also very few FT8xx commands sent, mostly at start up, since none of the FT8xx features are really used - so I just send them synchronously and take the hit when needed.

In my first example however, my DMA method has one 512 byte DMA buffer that is used and reset on each SPI request. This is why there is a blocking call, since it would get overwritten otherwise on the next request while it may still be used in the DMA queue. If I wanted to make it work with multiple queued DMA transactions I have two thoughts:

  1. Use a separate unique DMA buffer per transaction and allocate and free them as needed (or use a re-usable pool). Many small allocations though could create heap fragmentation so it's not ideal. But maybe a pool with a few different sizes of buffers would work.

  2. Use some sort of circular DMA buffer and only block to retrieve results to free up space for more requests when the buffer is full. Wouldn't be truly circular since DMA doesn't understand circular, but you can probably get the picture: one larger buffer with sections used for multiple requests. You just have to keep track of free vs. used space, sort of a mini dedicated memory pool that is set aside for DMA transaction buffering.
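The second idea could be sketched roughly like this. It is only a hosted illustration under a few assumptions: completions arrive in the order the transactions were queued, regions are rounded up to 4 bytes, and all names and sizes (`dma_pool_t`, `DMA_POOL_SIZE`, `DMA_MAX_PENDING`) are made up for the example. It does not touch the ESP32 SPI Master API and does not deal with DMA-capable memory placement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DMA_POOL_SIZE   512u /* hypothetical pool size, multiple of 4 */
#define DMA_MAX_PENDING 8u   /* hypothetical limit on queued transactions */

typedef struct {
    uint8_t mem[DMA_POOL_SIZE];
    size_t  head;                 /* next free byte */
    size_t  used;                 /* bytes held by pending regions, padding included */
    size_t  len[DMA_MAX_PENDING]; /* FIFO of region sizes, padding included */
    size_t  first, count;         /* FIFO bookkeeping */
} dma_pool_t;

void dma_pool_init(dma_pool_t *p)
{
    p->head = 0; p->used = 0; p->first = 0; p->count = 0;
}

/* Reserve one contiguous region. Returns NULL when the caller has to
   wait for a completion (and release a region) before trying again. */
uint8_t *dma_pool_alloc(dma_pool_t *p, size_t len)
{
    size_t pad = 0, start;

    len = (len + 3u) & ~(size_t)3u; /* keep regions word aligned */
    if (len == 0 || len > DMA_POOL_SIZE || p->count == DMA_MAX_PENDING) return NULL;
    if (p->head + len > DMA_POOL_SIZE) pad = DMA_POOL_SIZE - p->head; /* skip the tail, DMA cannot wrap */
    if (p->used + pad + len > DMA_POOL_SIZE) return NULL;             /* would overlap pending data */

    start = (pad != 0) ? 0 : p->head;
    p->head = (start + len) % DMA_POOL_SIZE;
    p->used += pad + len;
    p->len[(p->first + p->count) % DMA_MAX_PENDING] = pad + len;
    p->count++;
    return &p->mem[start];
}

/* Call once the oldest queued transaction has completed. */
void dma_pool_release_oldest(dma_pool_t *p)
{
    assert(p->count > 0);
    p->used -= p->len[p->first];
    p->first = (p->first + 1u) % DMA_MAX_PENDING;
    p->count--;
    if (p->count == 0) p->head = 0; /* pool empty: start over at the beginning */
}
```

The only time the caller blocks is when `dma_pool_alloc()` returns NULL; it would then wait for the oldest transaction result, call `dma_pool_release_oldest()` and retry.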

One other thing I will point out about the ESP32 and DMA: since these transactions happen asynchronously, you don't get to control the CS line directly. The SPI Master and DMA handling need to control that for you. Each transaction will work the CS line when it is processed, in the background. This makes sense, because these happen in the DMA hardware itself while your other code is also running. So you tell SPI Master which pin is CS and how to manage it when setting up the SPI bus. This happens when filling out the spi_device_interface_config_t structure in the call to disp_spi_add_device_config().

You also likely won't be able to call EVE_cs_clear() or any other code that talks to hardware in the callback either, as it is an IRAM_ATTR restricted callback, i.e. it runs in an interrupt handler frame. You can't even call the SPI Master APIs without faulting. You can only do very basic stuff like setting flags and calling small bits of your own code. Most of the RTOS APIs are off limits and will fault if you try calling them. Really, it is an interrupt handler, so only minimal code can execute. Makes things more complicated for sure.
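A minimal sketch of the "only set a flag in the callback" pattern, written as plain C11 so it can be tried on a PC. The names `dma_busy`, `on_transfer_done()` and `try_start_transfer()` are hypothetical; on the ESP32 the first function would stand in for the body of the post-transfer callback, and nothing here calls the real SPI Master API.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* true while a queued transfer still owns the DMA buffer */
static atomic_bool dma_busy = false;

/* Stand-in for the post-transfer callback: it runs in interrupt
   context, so it only clears a flag and returns. */
void on_transfer_done(void)
{
    atomic_store(&dma_busy, false);
}

/* Called from normal task code. Claims the buffer only if no transfer
   is in flight; the caller fills the buffer and queues the transaction
   when this returns true, and does something else when it returns false. */
bool try_start_transfer(void)
{
    bool expected = false;
    return atomic_compare_exchange_strong(&dma_busy, &expected, true);
}
```

Everything that talks to hardware (CS handling, queuing the next transaction) stays in task context and just checks the flag.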

robouden commented 3 years ago

Rudolph,

Thanks for all the work!!

I downloaded the new version and compiled it. I did encounter an error:

"'EVE_TOUCH_RZTHRESH' was not declared in this scope EVE_memWrite16(REG_TOUCH_RZTHRESH, EVE_TOUCH_RZTHRESH); /* eliminate any false touches */"

After commenting it out the code compiled fine.

And bingo!!! Works nice and fast. Love it. Time1: 304µs and Time2: 92µs. Display seems a bit off, but that should be easy to fix.

Very happy with this progress. I will inform Riverdi that this way we can order the custom IOT7 after some more tests!!

Regards, rob

robouden commented 3 years ago

And the picture (worth 1000 words or code lines) S1980002

RudolphRiedel commented 3 years ago

Nice to see it starting to work. :-) The EVE_RiTFT50 profile is missing this line:

#define EVE_TOUCH_RZTHRESH (1200L)

And the offset means that the display parameters are wrong: either Riverdi is using a different panel or the RiTFT50 would show the same result. Well, that is what "untested" means. I try to, but for solely practical reasons I cannot buy them all. :-)

Just guessing, try to change the profile in EVE_config.h to this:

```
/* RVT50xQBxxxxx 800x480 5.0" Riverdi, various options, BT815/BT816 */
#if defined (EVE_RiTFT50)
#define EVE_HSIZE (800L) /* Thd Length of visible part of line (in PCLKs) - display width */
#define EVE_VSIZE (480L) /* Tvd Number of visible lines (in lines) - display height */

#define EVE_VSYNC0 (0L) /* Tvf Vertical Front Porch */
#define EVE_VSYNC1 (10L) /* Tvf + Tvp Vertical Front Porch plus Vsync Pulse width */
#define EVE_VOFFSET (23L) /* Tvf + Tvp + Tvb Number of non-visible lines (in lines) */
#define EVE_VCYCLE (525L) /* Tv Total number of lines (visible and non-visible) (in lines) */
#define EVE_HSYNC0 (0L) /* Thf Horizontal Front Porch */
#define EVE_HSYNC1 (10L) /* Thf + Thp Horizontal Front Porch plus Hsync Pulse width */
#define EVE_HOFFSET (46L) /* Thf + Thp + Thb Length of non-visible part of line (in PCLK cycles) */
#define EVE_HCYCLE (1056L) /* Th Total length of line (visible and non-visible) (in PCLKs) */
#define EVE_PCLK (2L)
#define EVE_PCLKPOL (1L)
#define EVE_SWIZZLE (0L)
#define EVE_CSPREAD (1L)
#define EVE_TOUCH_RZTHRESH (1200L)
#define EVE_HAS_CRYSTAL
#define EVE_GEN 3
#endif
```
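As a quick plausibility check for a parameter set like the one above, the visible area plus its offset has to fit into the total line and frame length, and the resulting frame rate should land in a sensible range. The sketch below assumes a 72MHz system clock for the BT815 (an assumption, not stated in this thread) and uses hypothetical names (`eve_timing_t`, `eve_timing_fits()`, `eve_frame_rate()`).

```c
#include <assert.h>

typedef struct {
    long hsize, vsize;     /* EVE_HSIZE, EVE_VSIZE */
    long hoffset, voffset; /* EVE_HOFFSET, EVE_VOFFSET */
    long hcycle, vcycle;   /* EVE_HCYCLE, EVE_VCYCLE */
    long pclk_div;         /* EVE_PCLK */
} eve_timing_t;

/* 1 when the visible area plus its offset fits into the total
   line length and the total number of lines, 0 otherwise. */
int eve_timing_fits(const eve_timing_t *t)
{
    return (t->hoffset + t->hsize <= t->hcycle)
        && (t->voffset + t->vsize <= t->vcycle);
}

/* Frames per second: pixel clock divided by the pixels per frame. */
double eve_frame_rate(const eve_timing_t *t, double sysclk_hz)
{
    assert(t->pclk_div > 0);
    return (sysclk_hz / (double)t->pclk_div)
         / ((double)t->hcycle * (double)t->vcycle);
}
```

With the values from the profile (800, 480, 46, 23, 1056, 525, PCLK 2) and the assumed 72MHz this gives 36MHz / (1056 * 525), so roughly 65 frames per second.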

RudolphRiedel commented 3 years ago

So this code uses the SPI APIs as provided by the ESP32 IDF toolkit (i.e. the native C code SDK for the ESP32).

I need to have a closer look at this after work. The Arduino core arduinoespressif32 is using files from Espressif in cores/esp32 and the one I was looking at is "esp32-hal-spi.c". This one does not have a function spi_device_queue_trans() or even "queu". But given the "Copyright 2015-2016 Espressif Systems" it might be that they are just holding onto outdated files.

robouden commented 3 years ago

Seems to be an offset issue. I tried with your recommended settings (see picture below), but that did not change anything.

The code from https://github.com/riverdi/riverdi-eve-arduino gets the screen size right.

Regards, rob

S1980003

RudolphRrr commented 3 years ago

I can't spend any time now but had to comment so I created a second account. :-) This looks like you did not actually use the new set of defines but still the old ones. The parameters in here: https://github.com/riverdi/riverdi-eve-arduino/blob/master/Riverdi_Modules.h Are exactly the same as above, compare CTP_50.

Edit: from the datasheets both the RVT50UQENWC01 and RVT50UQBNWC01 use the same timing parameters. And the parameters given do match with the original parameters in my library. However, Riverdi is using a different set of parameters in both their library and in their online documentation: grafik

And their recommended values are a match for the second set of parameters I provided above.

But I missed something, there already is a profile that matches this, try to switch over to EVE_RiTFT70.

I may just order a RVT50UQBNWC01 when I am back at home.

robouden commented 3 years ago

Seems exactly the same. Not sure why the offset is there. Yes, that would be great, much easier, if you have a unit to test.

Regards, rob .

RudolphRrr commented 3 years ago

I ordered one yesterday, I just received the tracking information.

RudolphRiedel commented 3 years ago

I received the RVT50UQBNWC01 and just tried it out with the EVE_RiTFT50 parameter set from the EVE_config.h I pushed yesterday and which is nothing but an alias for the EVE_RiTFT70 parameter set that already was in. And it just works, no offset or anything. :-) It still has to go faster than the 304µs I am seeing now.

RudolphRiedel commented 3 years ago

Somehow the photos I am taking with my smartphone are not getting any better. grafik

robouden commented 3 years ago

I will give it a test again..Maybe something wrong with my setup in VSCodium.

rob

RudolphRiedel commented 3 years ago

> Maybe something wrong with my setup in VSCodium.

I just installed VSCodium to give it a spin but it won't let me install PlatformIO, it just is not available.

Anyways, I just pushed an update that includes what I have done for ESP32. And I have to admit, I gave up on using DMA after searching for a solution. There are third party libraries that may or may not allow SPI-DMA for Arduino-ESP32 but I am not comfortable with binding my code to third party libraries. Of course anyone willing to explore this can do it for themselves.

After a cleanup I was able to put the EVE_init_dma() (empty) and EVE_start_dma_transfer() functions in EVE_target.c(pp). No idea what I did wrong when I first tried it; it refused to compile this way and now it does.

And it looks like this now:

```
void EVE_start_dma_transfer(void)
{
    SPI.setClockDivider(0x00002002); /* write only, go faster: use Apb clock of 80MHz and divide by 3 -> 26.667MHz */
    EVE_cs_set();
    SPI.writeBytes(((uint8_t *) &EVE_dma_buffer[0])+1, ((EVE_dma_buffer_index)*4)-1);
    EVE_cs_clear();
    SPI.setClockDivider(0x00004004); /* read/write, go slower: use Apb clock of 80MHz and divide by 5 -> 16MHz */
}
```

So what I did to go faster than 300µs was to increase the SPI clock. Unfortunately the ESP32 has the same limitation that most other controllers have as well: you can only divide the clock by 1, 2, 3 and so on, and not by 1.4 or 3.6. So on the upper end you get 40MHz, 26.667MHz, 20MHz, 16MHz. grafik
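The integer-divider limitation can be written down as a small calculation. This is only a sketch with hypothetical helper names (`spi_divider_for()`, `spi_clock_hz()`); it works on plain numbers and says nothing about how the divider is encoded in the value passed to SPI.setClockDivider().

```c
#include <assert.h>
#include <stdint.h>

#define APB_CLOCK_HZ 80000000u /* 80MHz APB clock, as stated above */

/* Smallest integer divider whose output clock does not exceed max_hz. */
uint32_t spi_divider_for(uint32_t max_hz)
{
    assert(max_hz > 0u);
    if (max_hz >= APB_CLOCK_HZ) return 1u;
    return (APB_CLOCK_HZ + max_hz - 1u) / max_hz; /* round up */
}

/* SPI clock that results from an integer divider. */
uint32_t spi_clock_hz(uint32_t divider)
{
    assert(divider > 0u);
    return APB_CLOCK_HZ / divider;
}
```

Asking for at most 30MHz, for example, yields a divider of 3 and therefore 26.667MHz; asking for at most 25MHz yields a divider of 4 and 20MHz.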

I went with 26.667MHz to be on the safe side for the BT815, but although out of spec, 40MHz is working here as well. Of course using a different board can make a huge difference in what is working and what is not.

With DMA that should be near 80µs and, as opposed to how this works now, a constant 80µs. And with DMA it would be possible to lower the SPI clock.

Lowering the SPI clock after the transfer is necessary as MISO is getting shifted by the delays in the lines along the way. SCK and MOSI arrive at the BT815 at the same time, regardless of what level-shifter for example is added, but MISO then has to go all the way back, so it is affected by both the delay on the SCK line and the delay on its own line. This does have an effect on Time2, but as Time2 is for TFT_touch() it is more of a static time budget: it does not depend on the display list but more on the number of touch-points and whatever calculations are placed in this function. I just put reading the other four touch-points in TFT_touch() and Time2 went up from 85µs to 250µs for the ESP32. This is rather slow in comparison. The MetroM4 for example, running on Arduino as well but with no measures in place to speed things up so it runs at 8MHz, does run TFT_touch() with reading 5 touch-points in 55µs. But the impact from adding things to TFT_touch() is rather low as opposed to adding more things to display in TFT_display().

robouden commented 3 years ago

Rudolph,

Thanks for the test. To get PlatformIO working in VSCodium, change the marketplace entry in the product.json file to:

"extensionsGallery": { "serviceUrl": "https://marketplace.visualstudio.com/_apis/public/gallery", "itemUrl": "https://marketplace.visualstudio.com/items" },

Restart VSCodium and you can add PlatformIO in the extensions. This is an issue with MS and VSCodium group.:)

regards rob

RudolphRiedel commented 3 years ago

Thanks, this does work better but it won't let me display the installed addons. This is way off-topic though. :-)

Anyways, I just built and uploaded it with VSCodium and it just works as well.

robouden commented 3 years ago

Rudolph,

Maybe I need to adjust the calibration of the screen?

BTW, do you have any idea how much power the IOT5 uses in standby/sleep mode? And can it wake up with the previous screen by a touch of the panel?

Regards, rob

RudolphRrr commented 3 years ago

Yes, running the calibration at least once is more or less mandatory. You might get away with using stored values for units that were bought together, but even then the touch could have an offset.

No, I have no idea how much current the IOT5 is using in any mode. For one I do not have one and then my current setup is supplied thru USB.

And while I have not checked out the sleep modes of the FT81x/BT81x, a wakeup by touch should only be possible with a standby mode with for example only the backlight switched off. There maybe is a way to get the power of the system down using the Interrupt line of the BT815.

So far I got away with the requirement of using a stable 12V supply. :-)

RudolphRiedel commented 3 years ago

I have been trying for a couple of days now to get the ESP8266 to work, and after not getting anywhere with it I started to clean up things between the Arduino targets and compare what is happening with the same code on the SPI using the different Arduinos. Something is horribly wrong with the ESP32.

This is what the first four commands to power up EVE look like using an Arduino UNO: grafik

This is the same using an Metro-M4: grafik

And with the ESP32 I get this: grafik

This is so completely broken that right now I really do have no idea why I see something display on the screen regardless.

I wanted to use that as a starting point, a frame of reference to get going with the ESP8266. But now I am back at wondering what the heck is going on with the ESP32.

Edit: I wrote "first three commands" above when there are really four. This is because there are supposed to be three and I failed to count. :-) The first one in the images is a software-reset command I put in hoping to make the ESP8266 work, but removing it does not change anything with the SPI of the ESP32 apparently failing.

RudolphRiedel commented 3 years ago

Okay ESP32 is looking good now as well: grafik

And somehow it got a little faster with the refactoring I was doing in EVE_target.h to improve the readability for the different Arduino targets. The simple demo runs at 139µs/74µs now.

dstulken commented 3 years ago

I might have missed something - So the ESP8266 hadn't been tested, and upon testing, didn't work? But in the process of looking over the code to capture an ESP32 baseline for comparison, you managed to make it (the ESP32 version) ~8% faster? 😄

RudolphRiedel commented 3 years ago

That it is a little faster was not so much intentional. It is either a side-effect of splitting everything up for the various Arduino targets into their own sections to improve readability, or it goes a little faster because I fixed the SPI. The Arduino Uno got a little faster as well, but this was intentional. :-)

DMA for the ESP32 also is not completely off the table.

I had the ESP8266 up and running a couple of years ago but never really used it. I can not even remember when or where I bought the D1 Mini I am trying to use now.

robouden commented 3 years ago

Rudolph,

Thanks for all the teasing to get the issues solved!!

Regards, Rob

RudolphRiedel commented 3 years ago

How am I teasing when I pushed it some 13 hours ago? :-)

Well, the ESP8266 is still not working for me, although it really should, maybe my D1 Mini is broken, maybe there is a hardware issue that I did not figure out so far. I ordered a new one.

And regarding DMA for the ESP32, I only got a very small step closer before I was distracted by the ESP8266. While it has been confirmed that Arduino-ESP32 does not directly support DMA for SPI now, it also comes with the ESP-IDF built in, not as source code though but in the form of a pre-compiled library. https://github.com/espressif/arduino-esp32/blob/master/tools/sdk/lib/libdriver.a https://github.com/espressif/arduino-esp32/blob/master/tools/sdk/include/driver/driver/spi_master.h This is the reason why I did not find it before and it looked like I had to resort to some third-party solution. For now it is up and running without DMA, and at least it does so with less than 30% of the time budget the current Arduino Uno variant is using. I believe the ESP32 can do better, but it's not like this is unusable. And I am certain that my implementation outperforms https://github.com/zerynth/lib-bridgetek-bt81x for example.

Then there is the project I really should be doing some work on right now and from which I was distracted with the ESP32. :-)

robouden commented 3 years ago

Rudolph,

Sorry for my long silence. I have been doing work. Picking up the project again, and I wanted to know how to switch on/off the background lighting to preserve power with still being able to have the IOT5 wake up from the touch panel.

Can I just use "EVE_memWrite8(REG_PWM_DUTY, 0x0);" for switching off the backlighting and "EVE_memWrite8(REG_PWM_DUTY, 0x80);" to switch on the backlighting?

Regards, Rob Oudendijk

RudolphRiedel commented 3 years ago

Yes, that is working. I just modified my program to write 0x00 to REG_PWM_DUTY when the button is pressed, and to write 0x80 when the button is pressed again. This way I currently have 209mA from 7.1V on the external PSU I am supplying the display with. And this drops to 57mA when the backlight is off. There is a switching regulator on my display-adapter and I have the backlight supplied from the same 3.3V that also is used for the logic now. So for 3.3V this means the current drops from 450mA to 123mA. This means the backlight is using a little over 1W and when supplying it with 5V it should draw around 215mA.

I usually do not set the backlight to full brightness though, and if I needed to bring power down this would only be my first measure. The next step would be using PD and INT, but my adapters do not even connect the INT line as I had no use for it so far.
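The conversions between the measured values can be retraced with a few lines of arithmetic. The sketch assumes a lossless regulator, as the rounded figures above implicitly do; the helper names `power_w()` and `current_ma_at()` are made up for the example.

```c
/* Electrical power in watts from a voltage and a current in mA. */
double power_w(double volts, double milliamps)
{
    return volts * milliamps / 1000.0;
}

/* Current in mA that the same power corresponds to at another voltage. */
double current_ma_at(double watts, double volts)
{
    return watts * 1000.0 / volts;
}
```

With 7.1V, 209mA and 57mA this gives about 1.48W with the backlight on and 0.40W with it off, so about 1.08W for the backlight. Converted to 3.3V that is roughly 450mA and 123mA, and the backlight alone would be around 216mA at 5V.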

robouden commented 3 years ago

Rudolph,

Thanks for the reply and the power tests. You have the code you use for testing online?

Regards, rob

RudolphRiedel commented 3 years ago

Yes and I just checked it with Meld, https://github.com/RudolphRiedel/FT800-FT813/tree/5.x/example_projects/EVE_Test_Arduino_PlatformIO has exactly the same code as I have in my development folder and which I used to test with.

I only modified TFT_touch() like this for the test:

    case 10: /* use button on top as on/off radio-switch */
        if(toggle_lock == 0)
        {
            toggle_lock = 42;
            if(toggle_state == 0)
            {
                toggle_state = EVE_OPT_FLAT;
                EVE_memWrite8(REG_PWM_DUTY, 0x00);
            }
            else
            {
                toggle_state = 0;
                EVE_memWrite8(REG_PWM_DUTY, 0x80);
            }
        }

And you can check with a flashlight that the display still is fully operational when the backlight is off.
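For trying the toggle logic without a display attached, something like the following hosted sketch could be used. It is not the code from the example project: `backlight_write()` is a stand-in for EVE_memWrite8(REG_PWM_DUTY, ...), all names are hypothetical, and how the lock gets released is not shown above, so the sketch simply assumes it is released when no tag is reported.

```c
#include <stdint.h>

static uint8_t last_duty = 0x80; /* what the stand-in "register" holds */

/* Stand-in for EVE_memWrite8(REG_PWM_DUTY, duty). */
static void backlight_write(uint8_t duty)
{
    last_duty = duty;
}

typedef struct {
    uint8_t lock; /* non-zero while the current press was already handled */
    uint8_t off;  /* non-zero while the backlight is switched off */
} backlight_toggle_t;

/* Feed the current touch tag in once per touch poll. */
void backlight_on_touch(backlight_toggle_t *s, uint8_t tag, uint8_t button_tag)
{
    if (tag == button_tag) {
        if (s->lock == 0) {
            s->lock = 1; /* ignore further polls until released */
            s->off = (uint8_t)!s->off;
            backlight_write(s->off ? 0x00 : 0x80);
        }
    } else if (tag == 0) {
        s->lock = 0; /* assumed release condition: nothing touched */
    }
}
```

Holding the button only toggles once; a second toggle needs a release in between.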

robouden commented 3 years ago

Rudolph,

Thanks for sharing your findings/setup.

regards, rob

RudolphRiedel commented 3 years ago

Over the last two days I tried to implement ESP-IDF SPI functions into the Arduino side. And the result was that the ESP32 crashes after sending the very first byte.

So I cut out what I implemented and put it in a much simpler form: ESP32_SPI_Test.zip

There are three blocks with different transfers in loop() and I would like to use all three of them. Each block works when used alone. The third block using DMA is doing fine on its own including the callback function. The first and the second block work together.

When I combine the first or the second block with the third block the ESP32 crashes and resets.

Also, checking and rechecking this with the logic-analyzer showed that ESP-IDF is even slower than Arduino-ESP32.

The first block is doing this: EVE_cs_set(); spi_device_transmit(EVE_spi_device_simple, &trans); EVE_cs_clear();

And the time for transferring a single byte on the SPI between CS low and CS high is 38.4µs. At 10MHz the time for actually transferring the byte is 760ns, the rest is overhead. Sure, the EVE_cs_clear() is using a little from the 16.7µs after the transfer, but not so much really. And the 20.7µs before the transfer are all the driver itself.

The second block is doing this: EVE_cs_set(); spi_device_polling_transmit(EVE_spi_device_simple, &trans3); spi_device_polling_transmit(EVE_spi_device_simple, &trans3); EVE_cs_clear();

And it takes 28.9µs to transfer the 8 bytes. With 7.5µs before the first transfer actually starts, 11.2µs in between and 3.6µs after the second transfer.

When I change the second block to this: EVE_cs_set(); spi_device_transmit(EVE_spi_device_simple, &trans3); spi_device_transmit(EVE_spi_device_simple, &trans3); EVE_cs_clear();

It even gets worse. This is using DMA with waiting for completion. And yes, I tried to increase devcfg.queue_size but it makes no difference whatsoever.

Now the total is 64.8µs. 15.6µs before the first transfer starts, 29µs in between and 13.4µs after the second transfer.
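To put the three measurements side by side, the share of time that is not spent clocking bits can be estimated from the byte count and the SPI clock. This is just arithmetic on the numbers above with hypothetical helper names; it uses the ideal 8 bits per byte at 10MHz (800ns per byte), which differs slightly from the 760ns read off the logic-analyzer.

```c
/* Ideal time on the wire in µs for a number of bytes at a given clock. */
double wire_time_us(unsigned bytes, double clock_hz)
{
    return (double)bytes * 8.0 / clock_hz * 1e6;
}

/* Fraction of the CS-low time that is driver overhead, 0.0 to 1.0. */
double overhead_fraction(double total_us, unsigned bytes, double clock_hz)
{
    return (total_us - wire_time_us(bytes, clock_hz)) / total_us;
}
```

For the single byte in 38.4µs this comes out at about 98% overhead, for the 8 bytes in 28.9µs with polling at about 78%, and for the 8 bytes in 64.8µs at about 90%.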

Sorry, but I have to pull the plug on this one. At this point I consider this to be broken by design. The SPI driver is doing way too many things considering that SPI inherently is not a shareable bus. And the way DMA works is not by telling the DMA unit to transfer the data but by asking the RTOS to queue a request.

Considering what I learned now, the Arduino-ESP32 SPI implementation actually is not that bad. Although it still is very slow considering the core clock of 240MHz and that the controller supports DMA.

robouden commented 3 years ago

Rudolph,

Works great. On my modified code (changed the menu buttons and background and logo etc. in tft.cpp) , I got 198mA in "no display" mode instead of the 419mA in "normal" mode.

Regards, rob

RudolphRiedel commented 3 years ago

Well, the next step would be to use STANDBY, SLEEP or POWERDOWN:

> 4.8.8 Touch Detection in none-ACTIVE State
>
> When the BT815/6 is in none-ACTIVE state, a touch event can still be detected and reported to the host through the INT_N pin. In other words, a touch event can wake-up the host if needed.
>
> For capacitive touch, the INT_N pin will follow CTP_INT_N pin when the BT815 is in STANDBY, SLEEP or POWERDOWN state.

Operating Current: 22mA, Standby Current: 3mA, Sleep Current: 0.6mA, PowerDown Current: 0.2mA

Well okay, the touch-chip has to be active, no idea what it draws.

Now find a way to put the ESP32 to sleep and to wake it up with the INT pin. :-)