eringerli closed 2 years ago
I reorganised my repo, so this pull request is a copy of #87
ok lets split this again into a PR for begin/end transaction changes and then the buffering changes!
I submitted a separate pull request for the transaction management (#90). I will rebase this one if you decide to accept #90.
A huge gain in performance is already in #90: instead of reading/writing each byte individually in a loop, it does so in one go. This PR adds the combination of reading and writing on top of that, which removes the additional round trip of transaction management in between.
ok can ya keep this PR to just the transaction+CS assert/deassert changes? i am going very carefully :)
see #90 :wink:
that PR still has both CS / begin transaction changes mixed with the transfer-buffering changes. it'd be easier for us to do one at a time
This is because the history looks like this:
PR #90 is for the branch streamlined-transaction-management, PR #88 is for the branch read-write-combine, which is based on streamlined-transaction-management.
It is too much work for me to backport the combining stuff without the CS/transaction stuff, as that just makes no sense. I suggest first merging #90, and after that #88.
The buffering stuff looks like this:
size_t lenBuffer = prefix_len + len;
if (lenBuffer < minBufferSizeToMalloc) {
  // small transfer: assemble prefix + payload on the stack
  uint8_t tmpBuffer[lenBuffer];
  memcpy(tmpBuffer, prefix_buffer, prefix_len);
  memcpy(tmpBuffer + prefix_len, buffer, len);
  write_and_read(tmpBuffer, lenBuffer);
} else {
  // large transfer: assemble prefix + payload on the heap
  auto tmpBuffer = new uint8_t[lenBuffer];
  memcpy(tmpBuffer, prefix_buffer, prefix_len);
  memcpy(tmpBuffer + prefix_len, buffer, len);
  write_and_read(tmpBuffer, lenBuffer);
  delete[] tmpBuffer;
}
The CS/transaction like this:
beginTransactionWithAssertingCS();
transfer(buffer, len);
endTransactionWithDeassertingCS();
So no, #90 doesn't contain any buffering changes or changes which impact the memory footprint of Adafruit_BusIO.
@ladyada Did you have the chance to look into this? Is there still something which blocks this PR?
i'd like the two conceptual changes (buffering the tx's, and changing the CS pin and beginTransaction) to be separated into two PRs so they can be reviewed and merged separately in case we need to revert changes!
As I said before, it already is. This PR just has the commit for transaction/CS management (3c18659) too, as it is based on it. I cannot split it further. If you merge the other one, I can rebase this PR on master again, but not before. The buffering changes are in 0caf969.
not sure whats up with the two PRs but we see changes to spi data/transactions in https://github.com/adafruit/Adafruit_BusIO/pull/90/files
maybe check the branches?
(and data changes in this one too)
Yes, #90 is about transactions, #88 is about data changes. The numbers are a bit confusing, as #88 is based on #90, but that was just lazy me who force-pushed and rewrote a PR. I split the transaction stuff out and made a new PR, which got number 90.
right, please start with a pr that is just changing CS and begintransaction in all locations and we'll go from there
Hehe, that's kind of the point of refactoring, isn't it? 😉
correct, and refactoring is super fun and often introduces bugs because so much has changed!
Should I split the other PR further into logical commits? Would that be a sensible compromise?
Code
If the code doesn't run on an 8-bit AVR, the data is prepared by combining the reads and writes into one buffer, which is then transferred as one transaction on the bus. A constant decides whether the buffer is allocated on the stack or on the heap, which further reduces the overhead for typical small reads and writes without limiting the size of a transaction. This is a trade-off between RAM and CPU: on the 'bigger' Arduino platforms the CPU is a lot faster than the SPI bus, so spending more RAM and CPU cycles and then letting the DMA do its work is the way to go.
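To make the stack-or-heap decision concrete, here is a minimal sketch of the idea, not the code from this PR. The names `kMaxStackBuffer`, `mockTransfer` and `writeThenRead`, the threshold of 32 bytes and the `0xFF` filler byte are all assumptions made for illustration; `mockTransfer` merely stands in for a real full-duplex SPI transfer so the sketch can run on a desktop compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <new>
#include <vector>

// Hypothetical threshold: combined transfers up to this size stay on the stack.
constexpr std::size_t kMaxStackBuffer = 32;

// Stand-in for a full-duplex bus transfer (an assumption, not a BusIO call):
// it records the bytes sent and overwrites the buffer with fake received bytes.
std::vector<uint8_t> g_sent;
void mockTransfer(uint8_t* buf, std::size_t len) {
  g_sent.assign(buf, buf + len);
  for (std::size_t i = 0; i < len; ++i) {
    buf[i] = static_cast<uint8_t>(0xA0 + i);
  }
}

// Writes writeLen bytes and reads readLen bytes using one combined buffer
// and a single transfer. Returns false if the heap allocation fails.
bool writeThenRead(const uint8_t* writeBuf, std::size_t writeLen,
                   uint8_t* readBuf, std::size_t readLen) {
  assert(writeBuf != nullptr && readBuf != nullptr);
  const std::size_t total = writeLen + readLen;

  uint8_t stackBuf[kMaxStackBuffer];   // used for typical small transfers
  std::unique_ptr<uint8_t[]> heapBuf;  // used only when the transfer is large
  uint8_t* buf = stackBuf;
  if (total > kMaxStackBuffer) {
    heapBuf.reset(new (std::nothrow) uint8_t[total]);
    if (!heapBuf) {
      return false;
    }
    buf = heapBuf.get();
  }

  std::memcpy(buf, writeBuf, writeLen);        // bytes to send first
  std::memset(buf + writeLen, 0xFF, readLen);  // filler clocked out while reading
  mockTransfer(buf, total);                    // one transfer for both phases
  std::memcpy(readBuf, buf + writeLen, readLen);
  return true;
}
```

Using a fixed-size stack array plus an owning pointer for the large case keeps the small path free of allocations and releases the heap buffer on every return path.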
I tested it on an ESP32 with a 10 MHz bus: the overhead for a transaction was about 0.8 µs/1.0 µs (begin/end); combining reads and writes into a newly allocated buffer raises it to 1.0 µs/1.0 µs (begin/end). But most typical transactions are combined ones, such as getting the register contents from a chip (1 byte to send, multiple to read), so combining cuts out 1.8 µs and adds 0.2 µs, a net gain of 1.6 µs per transaction. All in all, this results in a gain of around 15% in typical use cases.
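The overhead arithmetic can be written out as a small worked example. It assumes that the uncombined path pays the begin/end overhead twice (once for the write, once for the read) and the combined path pays the slightly higher overhead once; the constant and function names are hypothetical, and the values are the measurements quoted above, expressed in nanoseconds to avoid floating-point comparisons.

```cpp
// Measured overheads quoted in the text (ESP32, 10 MHz SPI), in nanoseconds.
constexpr int kBeginPlainNs = 800;      // begin, without a combined buffer
constexpr int kEndPlainNs = 1000;       // end, without a combined buffer
constexpr int kBeginCombinedNs = 1000;  // begin, including buffer preparation
constexpr int kEndCombinedNs = 1000;    // end, with a combined buffer

// Write followed by read as two separate transactions: two begin/end pairs.
constexpr int separateOverheadNs() {
  return 2 * (kBeginPlainNs + kEndPlainNs);
}

// Write and read combined into one transaction: one begin/end pair.
constexpr int combinedOverheadNs() {
  return kBeginCombinedNs + kEndCombinedNs;
}

// Net saving per register read.
constexpr int savingNs() {
  return separateOverheadNs() - combinedOverheadNs();
}

static_assert(savingNs() == 1600, "3600 ns - 2000 ns = 1600 ns");
```

The result matches the 1.6 µs per transaction stated above: one 1.8 µs begin/end pair is removed, and the remaining pair costs 0.2 µs more.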
Logic analyser trace
These are two traces of the data read out of a LIS3MDL over SPI at a 10 MHz data rate. The difference is clear: with these changes, a transaction takes only 85% of the time it takes without them.
Before
After