Read multiple channels using the same SPI transaction.

alx-uta commented 1 year ago

I had a bit of free time, and I decided to see if I could read the data faster. This was mainly because I have 2x MCP23S17 and 2x MCP3208 on my PCB.

It looks like if we'll read all the channels inside the same SPI transaction the data read is faster in most of the cases. If possible, can you please have a look and see if this is something you'd add to the library?

I'll be more than happy to do any adjustments, mainly because my programming skills in C/C++ are a bit limited.

Test results:

1000000
mcp28.analogRead()  8x:     389
mcp28.analogReadMultiple()  8x:     282
mcp28.differentialRead() 8x:    356
mcp28.deltaRead()   8x:     709

2000000
mcp28.analogRead()  8x:     242
mcp28.analogReadMultiple()  8x:     155
mcp28.differentialRead() 8x:    238
mcp28.deltaRead()   8x:     476

3000000
mcp28.analogRead()  8x:     266
mcp28.analogReadMultiple()  8x:     117
mcp28.differentialRead() 8x:    201
mcp28.deltaRead()   8x:     400

4000000
mcp28.analogRead()  8x:     186
mcp28.analogReadMultiple()  8x:     99
mcp28.differentialRead() 8x:    183
mcp28.deltaRead()   8x:     364

5000000
mcp28.analogRead()  8x:     173
mcp28.analogReadMultiple()  8x:     86
mcp28.differentialRead() 8x:    170
mcp28.deltaRead()   8x:     339

6000000
mcp28.analogRead()  8x:     228
mcp28.analogReadMultiple()  8x:     81
mcp28.differentialRead() 8x:    165
mcp28.deltaRead()   8x:     328

7000000
mcp28.analogRead()  8x:     223
mcp28.analogReadMultiple()  8x:     75
mcp28.differentialRead() 8x:    159
mcp28.deltaRead()   8x:     317

8000000
mcp28.analogRead()  8x:     157
mcp28.analogReadMultiple()  8x:     70
mcp28.differentialRead() 8x:    154
mcp28.deltaRead()   8x:     307

9000000
mcp28.analogRead()  8x:     213
mcp28.analogReadMultiple()  8x:     66
mcp28.differentialRead() 8x:    150
mcp28.deltaRead()   8x:     299

10000000
mcp28.analogRead()  8x:     152
mcp28.analogReadMultiple()  8x:     65
mcp28.differentialRead() 8x:    148
mcp28.deltaRead()   8x:     296

11000000
mcp28.analogRead()  8x:     212
mcp28.analogReadMultiple()  8x:     65
mcp28.differentialRead() 8x:    148
mcp28.deltaRead()   8x:     296

12000000
mcp28.analogRead()  8x:     208
mcp28.analogReadMultiple()  8x:     61
mcp28.differentialRead() 8x:    145
mcp28.deltaRead()   8x:     289

13000000
mcp28.analogRead()  8x:     208
mcp28.analogReadMultiple()  8x:     61
mcp28.differentialRead() 8x:    145
mcp28.deltaRead()   8x:     289

14000000
mcp28.analogRead()  8x:     206
mcp28.analogReadMultiple()  8x:     59
mcp28.differentialRead() 8x:    143
mcp28.deltaRead()   8x:     285

15000000
mcp28.analogRead()  8x:     206
mcp28.analogReadMultiple()  8x:     59
mcp28.differentialRead() 8x:    143
mcp28.deltaRead()   8x:     285

16000000
mcp28.analogRead()  8x:     145
mcp28.analogReadMultiple()  8x:     56
mcp28.differentialRead() 8x:    139
mcp28.deltaRead()   8x:     278

RobTillaart commented 1 year ago

is this toggle needed? as you are still in the same SPI transaction, I assume it is not.

    // Toggle CS only if there are more channels to read
    if (i < numChannels - 1) {
      digitalWrite(_select, LOW);
      digitalWrite(_select, HIGH);
    }

(update) if it is needed, it could be unconditional?

RobTillaart commented 1 year ago

What is the performance loss, if readADC() is implemented as a wrapper around readADCMulti() (core code is identical so expect it is not very much)

RobTillaart commented 1 year ago

I'll be more than happy to do any adjustments, mainly because my programming skills in C/C++ are a bit limited.

Thanks for this enhancement. Good opportunity to improve your skills :) Reading all the channels at once is faster (I know), and it might be interesting to add it.

First thing is please add in the demo sketch the ratio of improvement. ==> time (1.0*analogRead) / (1.0*time multiRead()) Looking at the output this varies roughly from a 0.25 - 3 which makes me suspicious as I would expect roughly a constant factor.

Have to review the code. you're using ESP32?

(pending post, too much windows open error :)

RobTillaart commented 1 year ago

Something is pretty interesting in your log.

Some faster SPI speeds are slower in analogRead(), it seems to happen when speed is not a whole divider of FCPU.

alx-uta commented 1 year ago

is this toggle needed? as you are still in the same SPI transaction, I assume it is not.
    // Toggle CS only if there are more channels to read
    if (i < numChannels - 1) {
      digitalWrite(_select, LOW);
      digitalWrite(_select, HIGH);
    }
(update) if it is needed, it could be unconditional?

We need to toggle the CS line, if not it won't read the next channel :(. I only added the condition so we won't do a toggle unless is required.

alx-uta commented 1 year ago

readADCMulti

I'll have a look and I'll run a few tests.

RobTillaart commented 1 year ago

I only added the condition so we won't do a toggle unless is required.

but as it is required for all channels you want to read, so it could be done unconditionally.

alx-uta commented 1 year ago

I'll be more than happy to do any adjustments, mainly because my programming skills in C/C++ are a bit limited.

Thanks for this enhancement. Good opportunity to improve your skills :) Reading all the channels at once is faster (I know), and it might be interesting to add it.

First thing is please add in the demo sketch the ratio of improvement. ==> time (1.0_analogRead) / (1.0_time multiRead()) Looking at the output this varies roughly from a 0.25 - 3 which makes me suspicious as I would expect roughly a constant factor.

Have to review the code. you're using ESP32?

(pending post, too much windows open error :)

Thanks. I'm mainly a Python/PHP dev, and I only used C/C++ with Arduino and ESP32.

I'll try to update it later today, if not it will be during the week.

Yes, I'm using ESP32 WROVER.

RobTillaart commented 1 year ago

variant for the toggle, at begin and at end of loop.

void MCP_ADC::readADCMultiple(uint8_t channels[], uint8_t numChannels, int16_t readings[]) {

  if (_hwSPI) {
    mySPI->beginTransaction(_spi_settings);
  }

  for (uint8_t i = 0; i < numChannels; i++) {

    digitalWrite(_select, LOW);

    uint8_t data[3] = {0, 0, 0};
    uint8_t bytes = buildRequest(channels[i], true, data);

    if (_hwSPI) {
      for (uint8_t b = 0; b < bytes; b++) {
        data[b] = mySPI->transfer(data[b]);
      }
    } else {
      for (uint8_t b = 0; b < bytes; b++) {
        data[b] = swSPI_transfer(data[b]);
      }
    }

    if (bytes == 2) {
      readings[i] = ((256 * data[0] + data[1]) & _maxValue);
    } else {
      readings[i] = ((256 * data[1] + data[2]) & _maxValue);
    }
    digitalWrite(_select, HIGH);
    }
  }

  if (_hwSPI) {
    mySPI->endTransaction();
  }
}

alx-uta commented 1 year ago

I only added the condition so we won't do a toggle unless is required.

but as it is required for all channels you want to read, so it could be done unconditionally.

This is required for all except for the last one, mainly because after the last channel read there's nothing else there to read.

But, if we'll find out that there's no improvement in the read speed even if we have the extra toggle, we could just remove it.

alx-uta commented 1 year ago

variant for the toggle, at begin and at end of loop.

void MCP_ADC::readADCMultiple(uint8_t channels[], uint8_t numChannels, int16_t readings[]) {

  if (_hwSPI) {
    mySPI->beginTransaction(_spi_settings);
  }

  for (uint8_t i = 0; i < numChannels; i++) {

    digitalWrite(_select, LOW);

    uint8_t data[3] = {0, 0, 0};
    uint8_t bytes = buildRequest(channels[i], true, data);

    if (_hwSPI) {
      for (uint8_t b = 0; b < bytes; b++) {
        data[b] = mySPI->transfer(data[b]);
      }
    } else {
      for (uint8_t b = 0; b < bytes; b++) {
        data[b] = swSPI_transfer(data[b]);
      }
    }

    if (bytes == 2) {
      readings[i] = ((256 * data[0] + data[1]) & _maxValue);
    } else {
      readings[i] = ((256 * data[1] + data[2]) & _maxValue);
    }
    digitalWrite(_select, HIGH);
    }
  }

  if (_hwSPI) {
    mySPI->endTransaction();
  }
}

This should work. I don't have that much experience with SPI. Do you think it will be safe to end the spi transaction while the CS line is off?

RobTillaart commented 1 year ago

Do you think it will be safe to end the spi transaction while the CS line is off?

That thought underlies the code so yes. the communication is during the transfer function. But the test with real hardware is the ultimate test.

alx-uta commented 1 year ago

Thanks for looking at this :)

Just did some tests and I added the new results.

Personally, I think that anyone who needs to read more than one channel should use .analogReadMultiple(). It doesn't make sense to read a channel one by one and add extra delay due to the SPI transition start/end.

RobTillaart commented 1 year ago

I think that anyone who needs to read more than one channel should use .analogReadMultiple(). It doesn't make sense to read a channel one by one and add extra delay due to the SPI transition start/end.

Makes sense, e.g when the timestamps should be as simultaneous as possible. The user might have arguments to do individual calls, design/architecture readability erc, so keeping that option is important.

The readDACmultiple()call also allows to read a single channel n times or two channels multiple times alternatingly etc. So it has broader application than at first sight.

RobTillaart commented 1 year ago

The performance numbers are impressive, Might be interesting to add the gain for 2, 3, 4, 5, 6, 7 and 8 channels in the performance sketch. E.g with a fixed SPI speed of 8 MHz.

I expect it will show gain for all cases.

alx-uta commented 1 year ago

The readDACmultiple()call also allows to read a single channel n times or two channels multiple times alternatingly etc. So it has broader application than at first sight.

That's actually a really good point. I never thought of it as doing a double call and get the average reading.

Might be interesting to add the gain for 2, 3.4.5.6, 7 and 8 channels in the performance sketch.

I'll have a look and I'll try to update it

alx-uta commented 1 year ago

This is quite interesting.

mcp28.analogRead()  2:  42
mcp28.analogReadMultiple()  2:  27
analogRead() time / analogReadMultiple() time   1

mcp28.analogRead()  3:  59
mcp28.analogReadMultiple()  3:  34
analogRead() time / analogReadMultiple() time   1

mcp28.analogRead()  4:  78
mcp28.analogReadMultiple()  4:  41
analogRead() time / analogReadMultiple() time   1

mcp28.analogRead()  5:  98
mcp28.analogReadMultiple()  5:  49
analogRead() time / analogReadMultiple() time   2

mcp28.analogRead()  6:  118
mcp28.analogReadMultiple()  6:  55
analogRead() time / analogReadMultiple() time   2

mcp28.analogRead()  7:  137
mcp28.analogReadMultiple()  7:  62
analogRead() time / analogReadMultiple() time   2

mcp28.analogRead()  8:  157
mcp28.analogReadMultiple()  8:  70
analogRead() time / analogReadMultiple() time   2

Based on this we can see that's not actually the reading time that's slow, but the SPI start/end communication.

RobTillaart commented 1 year ago

mcp28.analogRead()  7:  137
mcp28.analogReadMultiple()  7:  62
analogRead() time / analogReadMultiple() time   2

mcp28.analogRead()  8:  157
mcp28.analogReadMultiple()  8:  70
analogRead() time / analogReadMultiple() time   2

(note: the figures above are platform dependent, still they are very informative) The delta for an extra analogRead is about 20 us The delta for an extra channel in analogReadMulti() is about 8 us ==> so every transaction initialization takes about 12 us

RobTillaart commented 1 year ago

Created a table of the comparison
to be saved as readADC_vs_readADCMultiple.md


### Comparing multiple **readADC()** calls with one **readADCMultiple()** call

- ESP32
- source https://github.com/RobTillaart/MCP_ADC/pull/11#issuecomment-1676461195
- figures are indicative and platform dependent.

|  function            |   1   |   2   |   3   |   4   |   5   |   6   |   7   |   8   |  <- channels 
|:--------------------:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|
|  analogRead          |       |   42  |   59  |   78  |   98  |  118  |  137  |  157  |  (us)
|  analogReadMultiple  |       |   27  |   34  |   41  |   49  |   55  |   62  |   70  |  (us)
|  ratio               |       |  1.56 |  1.73 |  1.90 |  2.00 |  2.15 |  2.21 |  2.24 |

TODO AVR UNO

RobTillaart commented 1 year ago

Think the code looks good to merge into master.

The remaining remarks are minor and I'll need to check if there is other "low hanging fruit" I can include in the release.

alx-uta commented 1 year ago

Think the code looks good to merge into master.

The remaining remarks are minor and I'll need to check if there is other "low hanging fruit" I can include in the release.

Thanks.

I'll try to go thru the comments and do any changes.

RobTillaart commented 1 year ago

Raw output performance test on AVR / UNO

MCP3208_performance\MCP3208_performance.ino
MCP_ADC_LIB_VERSION: 0.1.9

ADC CHAN    MAXVALUE
mcp28   8   4095

Timing in micros().

***************************************

1000000
mcp28.analogRead()  8z:     512
mcp28.analogReadMultiple()  8x:     368
analogRead() time / analogReadMultiple() time   1
mcp28.differentialRead() 8x:    468
mcp28.deltaRead()   8x:     928

2000000
mcp28.analogRead()  8z:     392
mcp28.analogReadMultiple()  8x:     272
analogRead() time / analogReadMultiple() time   1
mcp28.differentialRead() 8x:    368
mcp28.deltaRead()   8x:     740

3000000
mcp28.analogRead()  8z:     408
mcp28.analogReadMultiple()  8x:     272
analogRead() time / analogReadMultiple() time   1
mcp28.differentialRead() 8x:    368
mcp28.deltaRead()   8x:     740

4000000
mcp28.analogRead()  8z:     352
mcp28.analogReadMultiple()  8x:     224
analogRead() time / analogReadMultiple() time   1
mcp28.differentialRead() 8x:    320
mcp28.deltaRead()   8x:     644

5000000
mcp28.analogRead()  8z:     356
mcp28.analogReadMultiple()  8x:     224
analogRead() time / analogReadMultiple() time   1
mcp28.differentialRead() 8x:    320
mcp28.deltaRead()   8x:     644

6000000
mcp28.analogRead()  8z:     356
mcp28.analogReadMultiple()  8x:     224
analogRead() time / analogReadMultiple() time   1
mcp28.differentialRead() 8x:    324
mcp28.deltaRead()   8x:     644

7000000
mcp28.analogRead()  8z:     356
mcp28.analogReadMultiple()  8x:     224
analogRead() time / analogReadMultiple() time   1
mcp28.differentialRead() 8x:    324
mcp28.deltaRead()   8x:     644

8000000
mcp28.analogRead()  8z:     336
mcp28.analogReadMultiple()  8x:     200
analogRead() time / analogReadMultiple() time   1
mcp28.differentialRead() 8x:    296
mcp28.deltaRead()   8x:     596

9000000
mcp28.analogRead()  8z:     332
mcp28.analogReadMultiple()  8x:     200
analogRead() time / analogReadMultiple() time   1
mcp28.differentialRead() 8x:    300
mcp28.deltaRead()   8x:     596

10000000
mcp28.analogRead()  8z:     328
mcp28.analogReadMultiple()  8x:     200
analogRead() time / analogReadMultiple() time   1
mcp28.differentialRead() 8x:    304
mcp28.deltaRead()   8x:     588

11000000
mcp28.analogRead()  8z:     328
mcp28.analogReadMultiple()  8x:     208
analogRead() time / analogReadMultiple() time   1
mcp28.differentialRead() 8x:    304
mcp28.deltaRead()   8x:     592

12000000
mcp28.analogRead()  8z:     328
mcp28.analogReadMultiple()  8x:     208
analogRead() time / analogReadMultiple() time   1
mcp28.differentialRead() 8x:    304
mcp28.deltaRead()   8x:     592

13000000
mcp28.analogRead()  8z:     328
mcp28.analogReadMultiple()  8x:     208
analogRead() time / analogReadMultiple() time   1
mcp28.differentialRead() 8x:    296
mcp28.deltaRead()   8x:     588

14000000
mcp28.analogRead()  8z:     328
mcp28.analogReadMultiple()  8x:     200
analogRead() time / analogReadMultiple() time   1
mcp28.differentialRead() 8x:    296
mcp28.deltaRead()   8x:     588

15000000
mcp28.analogRead()  8z:     324
mcp28.analogReadMultiple()  8x:     200
analogRead() time / analogReadMultiple() time   1
mcp28.differentialRead() 8x:    300
mcp28.deltaRead()   8x:     596

16000000
mcp28.analogRead()  8z:     328
mcp28.analogReadMultiple()  8x:     200
analogRead() time / analogReadMultiple() time   1
mcp28.differentialRead() 8x:    300
mcp28.deltaRead()   8x:     596

***************************************

8000000
mcp28.analogRead()  1:  40
mcp28.analogReadMultiple()  1:  32
analogRead() time / analogReadMultiple() time   1.25

mcp28.analogRead()  2:  76
mcp28.analogReadMultiple()  2:  60
analogRead() time / analogReadMultiple() time   1.27

mcp28.analogRead()  3:  112
mcp28.analogReadMultiple()  3:  88
analogRead() time / analogReadMultiple() time   1.27

mcp28.analogRead()  4:  148
mcp28.analogReadMultiple()  4:  104
analogRead() time / analogReadMultiple() time   1.42

mcp28.analogRead()  5:  184
mcp28.analogReadMultiple()  5:  128
analogRead() time / analogReadMultiple() time   1.44

mcp28.analogRead()  6:  220
mcp28.analogReadMultiple()  6:  152
analogRead() time / analogReadMultiple() time   1.45

mcp28.analogRead()  7:  264
mcp28.analogReadMultiple()  7:  176
analogRead() time / analogReadMultiple() time   1.50

mcp28.analogRead()  8:  292
mcp28.analogReadMultiple()  8:  200
analogRead() time / analogReadMultiple() time   1.46

RobTillaart commented 1 year ago

@alx-uta Created a develop branch as preparation for the new release 0.2.0 . Besides the analogReadMultiple() it includes maintenance updates, documentation etc.

RobTillaart commented 1 year ago

@alx-uta

Apparently I did not think and test well enough, the single flag for differentialRead got lost. I'm going to fix this in the upcoming release by reverting the analogADC() code. The multiple code will stay of course.

int16_t MCP_ADC::readADC(uint8_t channel, bool single)
{
  if (channel >= _channels) return 0;

  int16_t reading[1];
  MCP_ADC::readADCMultiple(&channel, 1, reading);

  return reading[0];
}

alx-uta commented 1 year ago

Raw output performance test on AVR / UNO

The results are close, compared to the esp32 where we can see the performance gain.

Created a develop branch as preparation for the new release 0.2.0 . Besides the analogReadMultiple() it includes maintenance updates, documentation etc.

That sounds good, I guess you'll also need to update the examples.

Apparently I did not think and test well enough, the single flag for differentialRead got lost. I'm going to fix this in the upcoming release by reverting the analogADC() code. The multiple code will stay of course.

Ah, sorry about that, I missed it when I moved the code.

RobTillaart commented 1 year ago

0.2.0 released

alx-uta commented 1 year ago

0.2.0 released

Thanks again for this great library.

RobTillaart / MCP_ADC

Read multiple channels using the same SPI transaction. #11