ROBOTIS-GIT / OpenCR

Software for ROS Embedded board (a.k.a. OpenCR). OpenCR means Open-source Control Module for ROS.
Apache License 2.0
384 stars 238 forks source link

SPI Transaction support plus... #91

Closed KurtE closed 6 years ago

KurtE commented 6 years ago

Not sure if, best to do this as issue? Or discussion some place, but was wondering about the SPI support, so took a quick look through SPI implementation and have a few suggestions.

I might later try another Pull Request, but currently don't have enough information.

I noticed that you do have beginTransaction and endTransaction defined in the library but you have not told the world that you have it.

That is I believe most all SPI libraries that support transactions have:

// SPI_HAS_TRANSACTION means SPI has beginTransaction(), endTransaction(),
// usingInterrupt(), and SPISetting(clock, bitOrder, dataMode)
#define SPI_HAS_TRANSACTION 1

You will find there are a reasonable number of libraries that test for this and then use the begin/end stuff.

Obviously would be easy to do simple PR to do this.

However, I believe that this should probably be enhanced some before doing so. That is currently your beginTransaction, is simply ALWAYS calling off to set everything.

Instead I believe it should at a minimum do something like: if new settings do not equal current settings, than do all of the init: Example Teensy

    void beginTransaction(SPISettings settings) {
... (Stuff for handling options for interrupts )
        if (port().CTAR0 != settings.ctar) {
            port().MCR = SPI_MCR_MDIS | SPI_MCR_HALT | SPI_MCR_PCSIS(0x3F);
            port().CTAR0 = settings.ctar;
            port().CTAR1 = settings.ctar| SPI_CTAR_FMSZ(8);
            port().MCR = SPI_MCR_MSTR | SPI_MCR_PCSIS(0x3F);
        }
    }
...

Again this checks the generated CTAR register value (Teensy hardware specific) and only when necessary update it... Could maybe do somethings similar for here, but I don't know your hardware... Is there hardware manual showing some of this?

There is also stuff about the clock handling: Right now you use the setClockDivider(settings.clockDiv);

So you currently take the desired clock rate, example 4000000 and in the function:
SPISettings::init_AlwaysInline you have code that divides the desired speed by some fixed dividers like 2, 4, 8, 16... and return some value like SPI_CLOCK_DIV4... (as 50000000/2 = 25000000 which > 2000000 Now as hopefully always inline will remove all of #if for constants being passed in... But maybe not. On Teensy the code checks for built in constants before doing the stuff always inline.

''' SPISettings(uint32_t clock, uint8_t bitOrder, uint8_t dataMode) { if (__builtin_constant_p(clock)) { init_AlwaysInline(clock, bitOrder, dataMode); } else { init_MightInline(clock, bitOrder, dataMode); } } '''

Then when you do the begin transaction the call to setClockDivider(SPI_CLOCK_DIV4) will then do case statement on the value and set... _hspi->Init.BaudRatePrescaler = SPI_BAUDRATEPRESCALER_4 Where #define SPI_BAUDRATEPRESCALER_4 ((uint32_t)0x00000008)

At a minimum I would believe that the SpiSettings object could/should hold onto the generated pre-scaler value and set it directly and not need the indirection. But next question might be are these the only valid values for SPI speed (25mhz, 12.5, 6.25...?)

Again I only took a quick look through here. If I actually were to try more of it out, I would probably have additional questions like: Is there anything speed wise between your normal SPI where you are in handshake mode for each byte/word transferred, That is void SPIClass::transfer(void *buf, size_t count) still just does a for loop transferring each item one at a time.

With many of the Teensy boards there is a SPI FIFO queue for RX and TX, so that you can keep the SPI busy. With the lower end (LC), it does not have FIFO but does support double buffering.
Does this processor support something like this?

DMA: I see there is a quick transfer, transferFast which I believe is only a write operation using DMA?
Again with the Teensy we added some more dma support... https://github.com/PaulStoffregen/SPI

Again not sure what level of support you would like here? But thought I would at least document a few things I noticed.

Maybe if I get a chance I might try doing a few changes and see about supporting some other display like the ili9341. It would be curious to see the speed. I would probably start with the Adafruit library, but it would be nice to get faster speed, like we have on the Teensy using the ili9341_t3 library and it's derivatives. I know there are a few DMA versions of this for some platforms.

Kurt

KurtE commented 6 years ago

Secondary Notes:

I thought I would try building the Adafruit_Ili9341 library for the OpenCR.

I first updated my installed version of Adafruits library. It did not compile as it was trying to define a port and mask to do quick pin IO and tries to define volatile RwReg *mosiport, ... Which does not exist. They already handle it to use digitalwrite for their ARDUINO_STM32_FEATHER case. So fixed this part by adding to their #define

#if defined (ARDUINO_STM32_FEATHER) ||  defined(__OPENCR__)   // doesnt work on wiced feather or Robotis __OPENCR__
  #undef USE_FAST_PINIO
#elif defined (__AVR__) || defined(TEENSYDUINO) || defined(ESP8266) || defined (ESP32) || defined(__arm__)
  #define USE_FAST_PINIO
#endif

I added the OPENCR test, which got it to mostly build. However it has an undefined export from GFX. This is because your version of GFX is something like 1.1.5, where current one is: 1.2.3.
It uses your version as you specified the architecture which guess overrides architecture=* in the library.properties file.

Couple obvious ways to fix... I just removed your copy of GFX and it builds. Other options, include update your copy to a more current version.

I have not tried running it yet. But if this is something interesting, probably should see if you can get Adafruit to take a simple Pull request.

KurtE commented 6 years ago

Quick update: I am in the process of trying to debug using an Adafruit ILI9341 display, including defining SPI_HAS_TRANSACTION, so far the code is totally hanging the OpenCR board. It appears to be hanging in a tft.fillScreen call... When it hangs, the board will not accept a new program unless you do the force load (btn 2/reset)...

That is in the standard graphic test program I added a few output messages:

unsigned long testFillScreen() {
  unsigned long start = micros();
  Serial.println("Fill screen black"); Serial.flush();
  tft.fillScreen(ILI9341_BLACK);
  Serial.println("after Fill - before yield"); Serial.flush();
  yield();
  Serial.println("after yield"); Serial.flush();

I get the first fill screen message, but not the after... message.

Looking at the SPI output: screenshot Sorry I know at this scale does not show anything, except, looking at the sequence it looks like it has the proper output for a screen fill. Setting the horizontal/vertical ranges and then memory fill, followed by lots of writes of 0 (black)... Did not count the 320x240x2 bytes output... But it then properly shows that the CS is released, which is probably the last thing left before the functions should have returned from the fillScreen. Maybe memory corruption? maybe timing issue? to long in these outputs Will debug some more. Also maybe will do a little more editing of the Adafruit code to maybe say that we support 16 bit outputs...

Once working still wondering about speeding up... That is can we reduce the amount of SPI unused space. That is if you look at the set one range... screenshot The SPI data is only taking up about 20% of the time...

But first to figure out why it is hanging

OpusK commented 6 years ago

Again this checks the generated CTAR register value (Teensy hardware specific) and only when necessary update it... Could maybe do somethings similar for here, but I don't know your hardware... Is there hardware manual showing some of this?

If CTAR means Clock and Transfer Attributes, similar registers are CR1 and CR2. (please refer to Reference Manual)

And the other questions will be answered by the answers below:

First, we were only working with OpenCR to be at least compatible with Arduino. (Especially for Turtlebot3) (This is why non-flexible APIs, such as fixed prescale values.) What you have recently suggested is a positive part of our team. However, this is a task that has been delayed because of its lower priority in company policy.

Therefore, we welcome PR like you do. We will update the adafruit's libraries as you have requested. And if you do PR, we will include it in the release after testing with existing examples (especially Turtlebot).

So, you can PR freely, and if you have anything you need, ask us. Thanks for your contributions :)

OpusK commented 6 years ago

I am in the process of trying to debug using an Adafruit ILI9341 display, including defining SPI_HAS_TRANSACTION, so far the code is totally hanging the OpenCR board. It appears to be hanging in a tft.fillScreen call... When it hangs, the board will not accept a new program unless you do the force load (btn 2/reset)...

Quick Answer : I found that this was posted after answering the previous content. We will reply to you after reviewing this content :)

OpusK commented 6 years ago

@KurtE ,

Is the ILI9341 library you are currently testing the latest version of? In OpenCR, ILI9341 library is partially modified and used as "OpenCR_ILI9341.h".

Obviously, there are differences between the two libraries, so this should be considered. OpenCR_ILI9341 library is included as a compressed file(.zip).

And we updated the GFX library to the latest version (1.2.3).

KurtE commented 6 years ago

Thanks, I missed your version, will unzip it and try it out.

I also just updated my Fork of OpenCR to be up to date

d:\GitHub\OpenCR>git fetch upstream

d:\GitHub\OpenCR>git checkout
Your branch is up-to-date with 'origin/digital-write-fast'.

d:\GitHub\OpenCR>git checkout develop
Switched to branch 'develop'
Your branch is up-to-date with 'origin/develop'.

d:\GitHub\OpenCR>git reset --hard upstream/develop
HEAD is now at 0c8823c Added digitalWriteFast example.(#90)

d:\GitHub\OpenCR>git push origin develop --force
Counting objects: 50, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (50/50), done.
Writing objects: 100% (50/50), 24.20 KiB | 2.42 MiB/s, done.
Total 50 (delta 23), reused 1 (delta 0)
remote: Resolving deltas: 100% (23/23), completed with 15 local objects.
To https://github.com/KurtE/OpenCR.git
   5b33dc6..0c8823c  develop -> develop

And now updated my Arduino version, that is installed at Arduino15... (Used board manager to install). So again copied everything over from my develop branch. Wonder if there is another way to add current stuff as different install... But that is secondary.

Probably soon, get back to Turtle integration with new sensors... My problem is I see lots of fun things to try at this Arduino/board level. Probably at some point should pick up second one of these boards so I can have my Waffle back in one piece and still experiment.

Things like: I would like to add the new sensors, should be easy now. Maybe add using Ros parameters to control things like frequency, boundary heights... I want to get back to understanding the Wire code to get it to work with hardware instead of bitbang... Maybe add display so we can see some information when running remote without having to look at Host display...

Wonder about use of timers? Example on Teensy might do most of the stuff you have in loop with Interval Timers? Or what about threads/fibers?

Sorry, I know that this is off of the topic of this issue. Wondering where best to have conversations like this? Robotis forum? RobotSource? Ros Discourse?

Now off to try out your driver

KurtE commented 6 years ago

Quick update: Your driver from zip file appears to not use the hardware SPI, it is doing bit banging and uses other pins... It does not hang... Speed wise, it is interesting that the actual bits are going our slower by a reasonable amount. That is a byte output using hardware SPI took about .554us, where with bit bang it is taking about: 2.77us, but the actual timing given the gaps between bytes. Hardware SPI time between two SPI.transfer() calls is about 2.85us where bit bang: 3.29...

Not sure yet why the Adafruit one I was trying is... I have made a small extract of their code and the like to see if it was simple SPI stuff... So far it is running, without hanging...

Note: I still mostly have to reset the board to download new version... Still looking at what is different. Maybe Stack corrupted? Should probably look at how memory is laid out. I assume maybe Stack at one end of memory and heap at other end? ...

KurtE commented 6 years ago

Update: I know where the hang is... Masked before as Serial.flush() not implemented... Their test program hangs in the function:

unsigned long testFillScreen() {
  unsigned long start = micros();
  tft.fillScreen(ILI9341_BLACK);
  yield();
  tft.fillScreen(ILI9341_RED);
  yield();
  tft.fillScreen(ILI9341_GREEN);
  yield();
  tft.fillScreen(ILI9341_BLUE);
  yield();
  tft.fillScreen(ILI9341_BLACK);
  yield();
  return micros() - start;
}

The call to yield() kills the system. The yield calls I believe are defaulting to some either thread version of yield or the like. On several systems yield is implemented to allow for some other low level stuff to be called.

Example on Teensy: yield() includes code to check for SerialX available on all of the Serial ports and if appropriate call the SeralXEvent... sort of like it calls off to your SerialEventRun function.

I was able to get the app to run farther by putting in a dummy

void yield() {
}

Now to figure out why output is not showing up correctly

Update: I do have output now... Rather slow, but at starting point:

ILI9341 Test!
After tft begin
Display Power Mode: 0x94
MADCTL Mode: 0x48
Pixel Format: 0x5
Image Format: 0x80
Self Diagnostic: 0xC0
Benchmark                Time (microseconds)
Benchmark                Time (microseconds)
Screen fill              2738275
Text                     163891
Lines                    1613858
Horiz/Vert Lines         224850
Rectangles (outline)     144061
Rectangles (filled)      5671753
Circles (filled)         863780
Circles (outline)        708970
Triangles (outline)      362452
Triangles (filled)       1882375
Rounded rects (outline)  311526
Rounded rects (filled)   6172810
Done!
OpusK commented 6 years ago

And now updated my Arduino version, that is installed at Arduino15... (Used board manager to install). So again copied everything over from my develop branch. Wonder if there is another way to add current stuff as different install... But that is secondary.

In this regard, we have thought some method. But I'm not sure how to do it yet, so I'm going to try a bit more. Please let me know if you have a good opinion :)

Wonder about use of timers? Example on Teensy might do most of the stuff you have in loop with Interval Timers? Or what about threads/fibers? Sorry, I know that this is off of the topic of this issue. Wondering where best to have conversations like this? Robotis forum? RobotSource? Ros Discourse?

No, your opinions are very helpful. Personally, I think :

  1. For firmware enhancements: GitHub Issue
  2. To share and debate various opinions: RobotSource

Your driver from zip file appears to not use the hardware SPI, it is doing bit banging and uses other pins...

Examples of compressed file use software SPI. However, with the other annotated constructors, it is designed to use the hardware SPI. This library seems to require administration. I think there is a way to modify the latest library for OpenCR and to supplement existing ones.

The call to yield() kills the system. The yield calls I believe are defaulting to some either thread version of yield or the like. On several systems yield is implemented to allow for some other low level stuff to be called.

We have to figure out what code to include in yield(). First, we'll update the Serial.flush() function for USB.

KurtE commented 6 years ago

Thanks - It is still WIP, but pushed up new branch for SPI transaction support https://github.com/KurtE/OpenCR/tree/SPI-Transactions

It defines the HAS_TRANSACTIONS, plus the beginTransaction code now checks to see if values changed before calling off to change them, which saves time (not needing to do all of the HAL stuff, especially it was doing it 3 times each call to beginTransaction.

The timing from my hacked up Adafruit library is showing some improvements:

ILI9341 Test!
After tft begin
Display Power Mode: 0x94
MADCTL Mode: 0x48
Pixel Format: 0x5
Image Format: 0x80
Self Diagnostic: 0xC0
Benchmark                Time (microseconds)
Screen fill              1432591
Text                     105262
Lines                    1059915
Horiz/Vert Lines         119314
Rectangles (outline)     77171
Rectangles (filled)      2974691
Circles (filled)         501352
Circles (outline)        466576
Triangles (outline)      235594
Triangles (filled)       1016278
Rounded rects (outline)  192687
Rounded rects (filled)   3260486
Done!

Still more to go. Question is to try to keep it with integration with the Adafruit library, maybe getting them to take Pull Request or update your OpenCR version?

OpusK commented 6 years ago

Thanks for your sharing!

In my opinion, it would be better to PR to the Adafruits library than to integrate the ILI9341 library into OpenCR.

And, I added USBSerial.flush() function.

OpusK commented 6 years ago

The problem with yield() is that it does not have a proper definition for the board, but only declarations. In ARM gnu g++, the yield() function used in the thread was defined, which caused problems when using this function.

So, I created a yield.cpp file and defined it as dummy code. I think we need to discuss what to put here.

OpusK commented 6 years ago

And now updated my Arduino version, that is installed at Arduino15... (Used board manager to install). So again copied everything over from my develop branch. Wonder if there is another way to add current stuff as different install... But that is secondary.

Regarding the above, we are still considering various methods, but I introduce a simple method. I think you are a MAC or LINUX user. Therefore, I recommend using "Symbolic link".

  1. Delete OpenCR Package Folder (version folder. eg. "1.1.2")
  2. Create symbolic link.

    $ ln -s "Local Git Path" "Arduino OpenCR Package Folder"

For example,

$ ln -s ~/workspace/OpenCR/arduino/opencr_arduino/opencr/ ~/.Arduino15/packages/OpenCR/hardware/OpenCR/1.1.2

In this case, you must link a folder that matches the version information. Anyway, you can use symbolic links in a variety of ways, and you can use them as a temporary method.

KurtE commented 6 years ago

Yield - hard to say what should go in it. With the default AVR, code it is in hooks.c and it does nothing...

/**
 * Empty yield() hook.
 *
 * This function is intended to be used by library writers to build
 * libraries or sketches that supports cooperative threads.
 *
 * Its defined as a weak symbol and it can be redefined to implement a
 * real cooperative scheduler.
 */
static void __empty() {
    // Empty
}
void yield(void) __attribute__ ((weak, alias("__empty")));

I believe on avr it is only called by default in delay()

void delay(unsigned long ms)
{
    uint32_t start = micros();

    while (ms > 0) {
        yield();
        while ( ms > 0 && (micros() - start) >= 1000) {
            ms--;
            start += 1000;
        }
    }
}

On Teensy Paul put calls in other delay loops as well, like in analogRead, also in main, to replace the calls right after loop is called. Waiting for Serial output when queue is full, ... He also has it doing some work, like calls for serialEvent...

Which is right? good arguments both ways. Example having yield call off with things like: if (Serial1.available()) serialEvent1();

Now makes Serial Event code not completely useless. As it checks for data when the program is just sitting around in delay loops. But if your program is not set up to properly handle it, ... This also puts more overhead on system, as all of those calls to SerialX.available() SerialEventX calls add up. Especially since for example T3.5 and T3.6 have 6 Uarts.... Especially if you don't actually have any SerialEvent functions.


ILI9341 and SPI library updates

I am still playing. I need to see what I have changed and how much I can improve the integration/speed.
Example: ESP32 and ESP... has a function for SPI.writePixels(c, l)... for doing output only for pixels... I think I might be able to emulate using the transferFast function you have... Although maybe not yet... As pixels are 16 bit writes, question is can a transferFast16 be done?

Directly run using develop branch

I am mostly using windows 10, except when I am directly doing ROS stuff that needs the host to be running ROS. But will look at trying out the windows symbolic links to see how well they work: https://www.howtogeek.com/howto/16226/complete-guide-to-symbolic-links-symlinks-on-windows-or-linux/

Thanks

KurtE commented 6 years ago

I created a Pull Request: https://github.com/ROBOTIS-GIT/OpenCR/pull/94

A lot more could/should be done to help support things.
Things like:

transfer16: is faster, but still does a transfer of 2 bytes (still 8 bit mode) so it takes value packs into buffer, transfers, unpacks and builds result. Could probably instead just change the SPIx_CR2 register BR value to 16 bits mode (actually can change in interface level I am pretty sure) _hspi->Init.DataSize

transfer(buffer, cnt) just does a loop, Could probably setup buffer to do transfer faster... Assume it would work to set rx and tx buffer the same...

On Teensy, I created transfer(txbuffer, rxbuffer, cnt) version that allowed different buffers. I believe this was also on Edison and ...

Your transferFast should probably be renamed writeFast as it is output only. Also while it appears to use DMA, it still waits until transfer completes until it returns. Would be interesting to compare it's actual speed to just using HAL_SPI_Transmit(_hspi, buffer, count, timeout)

Again lots of things that we can tryout over time, but maybe this first pass is sufficient for now.

KurtE commented 6 years ago

Not sure what your thoughts are on adding additional write functions that are the same as the ones in ESP32 and ESP8266, but I went ahead and added them (write, write16, write32 and a hacked up version of writePixels, which uses internal 64 byte buffer, and reorders up to 32 pixels at a time and does up to 64 byte writes to SPI...

It cleaned up a few things in the Adafruit library, as could simply define to use these. Good news is before the changes, the graphic test took:

ILI9341 Test!
Display Power Mode: 0x94
MADCTL Mode: 0x48
Pixel Format: 0x5
Image Format: 0x80
Self Diagnostic: 0xC0
Benchmark                Time (microseconds)
Screen fill              1388850
Text                     99537
Lines                    1006464
Horiz/Vert Lines         115464
Rectangles (outline)     74206
Rectangles (filled)      2889139
Circles (filled)         477553
Circles (outline)        444554
Triangles (outline)      223946
Triangles (filled)       1008480
Rounded rects (outline)  186427
Rounded rects (filled)   3154042
Done!

After the changes:

ILI9341 Test!
Display Power Mode: 0x94
MADCTL Mode: 0x48
Pixel Format: 0x5
Image Format: 0x80
Self Diagnostic: 0xC0
Benchmark                Time (microseconds)
Screen fill              500035
Text                     60540
Lines                    641492
Horiz/Vert Lines         42652
Rectangles (outline)     28526
Rectangles (filled)      1038286
Circles (filled)         229836
Circles (outline)        282475
Triangles (outline)      139877
Triangles (filled)       382865
Rounded rects (outline)  103269
Rounded rects (filled)   1154871
Done!

Question is are you OK with these changes and if so, should I squash the changes into one commit?

OpusK commented 6 years ago

Which is right? good arguments both ways.

I agree with your opinion. We will keep it empty as it is now, and we will refer to Teensy's code if necessary later.

Your transferFast should probably be renamed writeFast as it is output only. Also while it appears to use DMA, it still waits until transfer completes until it returns. Would be interesting to compare it's actual speed to just using HAL_SPI_Transmit(_hspi, buffer, count, timeout)

Okay, I agree. As far as this is concerned, I will look at what these APIs are already using, and take care of this issue.

Good news is before the changes, the graphic test took:

Cool!! Thank you for sharing your detail test results.

Question is are you OK with these changes and if so, should I squash the changes into one commit?

Sure! With regard to the ILI9341 library, will you Pull Request to Adafruits?

KurtE commented 6 years ago

Yes I will issue a PR to Adafruit. Will see how it goes. Once I put it in place it might help to have you and or someone else from Robotis comment on it, to get them to take it.

OpusK commented 6 years ago

OK, thanks :)

KurtE commented 6 years ago

I put in PR to Adafruit: https://github.com/adafruit/Adafruit_ILI9341/pull/36

It shows as conflicts, but it is simple #if defined changes in two files.

OpusK commented 6 years ago

Thanks, @KurtE.