DavidArmstrong / SCL3300

Arduino library for interfacing with the Murata SCL3300 Inclinometer via SPI
MIT License
20 stars 11 forks source link

Library speed #26

Closed amdkam closed 2 years ago

amdkam commented 3 years ago

David, thanks for all the work so far. Very helpful. One issue I ran into is the fact that the SCL3300 datasheet specifies an ODR of 2000Hz and suggests we read out the registers at the same rate to meet the sensor's noise performance. However I have never been able to get anywhere over 1130 readings in 1 second even with a modified version of several functions. First issue I found is with the scl3300::transfer() function where I found 2 occurrences of a 1ms delay ( L396 and L407). This will already get us a maximum of 500Hz. I suggest changing all the millis() to micros(), since the only requirement from the datasheet is a 10µs delay between consecutive reads. Still I have not been able to reach the 2000Hz on a custom board running nrf52832, a Particle Boron (nrf52840) and an MKRZERO (SAMD21).

Also the startup sequence doesn't match the one suggest on p. 21 but I haven't found it to mess with performance so far.

datasheet: link

DavidArmstrong commented 3 years ago

Hi,

The startup sequence is based off of documentation that I found a couple of years ago. Unless I see something obviously broke or problematic, I've more a tendency to leave it alone. (The library has been quite stable for several months now, which is a good sign.) Generally, going a little slower hasn't been an issue. I'm curious though as to your use case. I have no problem working to speed things up here, and I am familiar with that 10 uSec minimum timing requirement as well. I can run some tests, to see if shortening the delays helps. (There are several places they are used, besides just the transfer() function.)

DavidArmstrong commented 3 years ago

I noticed that the latest datasheet is Revision 2. I probably wrote my initial code based on Revision 1. Hence the differences. I'll review the current Rev 2 doc for the startup sequence.

DavidArmstrong commented 3 years ago

Reading up on the micros() function starts to bring back all sorts of red flags in my head with using it. First, its behavior is CPU dependent. That means it behaves differently on different board types. Ouch. I can't guarantee that using it will always cause the library to work as expected.
Second, its counter overflows in as little as 71.6 minutes. That means that there is the distinct possibility that if used long enough, a condition can be seen where the counter wraps back to zero while it is being used to time a delay. So that takes care to make sure that possibility is accounted for when it happens. And third, on many board types the resolution of micros() is limited to 4 uSec, not the 1 uSec that one would expect. That will again introduce more delay than one is expecting, which limits the maximum rate that the SCL3300 could be read. I can avoid all of this by just staying with the 1 millisecond delay timing using millis() in transfer().
Now, I can remove the second delay though. And I can update the startup sequence used. I'll test these changes, and see if there are any adverse effects. Then I'll look into using micros().

DavidArmstrong commented 3 years ago

So far, my testing is looking promising. I'll set this up to run over several days, and see if it stays that way.

JLH-94 commented 3 years ago

Reading up on the micros() function starts to bring back all sorts of red flags in my head with using it. First, its behavior is CPU dependent. That means it behaves differently on different board types. Ouch. I can't guarantee that using it will always cause the library to work as expected. Second, its counter overflows in as little as 71.6 minutes. That means that there is the distinct possibility that if used long enough, a condition can be seen where the counter wraps back to zero while it is being used to time a delay. So that takes care to make sure that possibility is accounted for when it happens. And third, on many board types the resolution of micros() is limited to 4 uSec, not the 1 uSec that one would expect. That will again introduce more delay than one is expecting, which limits the maximum rate that the SCL3300 could be read. I can avoid all of this by just staying with the 1 millisecond delay timing using millis() in transfer(). Now, I can remove the second delay though. And I can update the startup sequence used. I'll test these changes, and see if there are any adverse effects. Then I'll look into using micros().

Hello, just to add a little on that score. I already used your library some time ago. In the last few project moves, I have also modified it at exactly this point. I also used micros(). To prevent the overflow: Since micros(), just like millis, fortunately returns uint32 to ensure this, you can of course also force the calculation in (uint32_t).

startmicros = micros(); while (micros() - startmicros < 150) ; //only used 150 to have a bit of a buffer here, was fast enough for me

If there is an overflow we get worst case: startmicros= 4294967295 and micros()=1 So we get 1- 4294967295 (in uint_32) = 2 <150 (next 3<150 and so on working just fine)

Of course correct me anytime if i'am wrong. Have a nice day.

DavidArmstrong commented 3 years ago

My speed test being done with the code changes in place is showing a consistent rate of 1000 iterations per second. This could be optimized further by cutting down on the number of register reads that are done. I may try that later, but I'll start by letting this test run for today.

amdkam commented 3 years ago

My speed test being done with the code changes in place is showing a consistent rate of 1000 iterations per second. This could be optimized further by cutting down on the number of register reads that are done. I may try that later, but I'll start by letting this test run for today.

Thanks both, I'm doing some additional software filtering, but I still need to pick up variations in inclinations of 0.02° at a frequency of about 5Hz. So the more samples I get, the better. This is when I noticed the sentence in the datasheet that suggested reading at the sensor ODR (2000Hz) otherwise the noise performance of the sensor is not met. It would have helped if Murata would have implemented a Data Ready register of some sorts. Now the timing is up to us, though it may not be that critical.

What caused your red flags w.r.t. the use of micros()? As far as I know it's the same as millis() but with a different prescale, though I haven't looked into it too much.

Anyway, curious for the results of your test.

DavidArmstrong commented 3 years ago

I'm finding that even with cutting back on the number of reads done at available(), I'm still getting just 1000 iterations per second. I'm beginning to think that there may be a limitation based on the Arduino environment implementation itself. (It has it's own stuff running in the background that isn't obvious by looking at a sketch.) A higher rate may only be possible by going directly to putting an assembly code to the target board. (Which is what we are trying to avoid by using the Arduino IDE to begin with.)

amdkam commented 3 years ago

If you take the below code, with all while loops removed I get 3925Hz. The result isn't meaningful, and there's no error handling, but just for demonstration purposes. I scoped it, and the rest of the code is slow enough to ensure the 10µs delay... Anyway, a while loop that polls .availabe() makes it very slow with each available entailing a beginTransmission(), 11x a transfer() and an endTransmission(). Filling a buffer could be a better approach if one were to need speed.

uint32_t SCL3300::readloop(uint32_t readms) {
  uint32_t counter = 0;
  //Version 3 of this function
  //boolean errorflag = false;
  //Read all Sensor Data, as per Datasheet requirements
  beginTransmission(); //Set up this SPI port/bus
  transfer(SwtchBnk0);
  uint32_t startms = millis();
  {
    while(millis() - startms < readms)
      {

        transfer(RdAccX);
        transfer(RdAccY);
        sclData.AccX = SCL3300_DATA;
        transfer(RdAccZ);
        sclData.AccY = SCL3300_DATA;
        transfer(RdSTO);
        sclData.AccZ = SCL3300_DATA;
        transfer(RdTemp);
        sclData.STO = SCL3300_DATA;
        transfer(RdAngX);
        sclData.TEMP = SCL3300_DATA;
        transfer(RdAngY);
        sclData.AngX = SCL3300_DATA;
        transfer(RdAngZ);
        sclData.AngY = SCL3300_DATA;
        transfer(RdStatSum);
        sclData.AngZ = SCL3300_DATA;
        transfer(RdWHOAMI);
        sclData.StatusSum = SCL3300_DATA;
        transfer(RdWHOAMI);
        sclData.WHOAMI = SCL3300_DATA;
        counter++;
      }
  }

  endTransmission(); //Let go of SPI port/bus
  //if (errorflag) return false; //Inform caller that something went wrong
  // The WHOAMI command should give an 8 bit value of 0xc1
  return counter; //Let the caller know this worked
}

And then the transfer function looks like this:

// Routine to transfer a 32-bit integer to the SCL3300, and return the 32-bit data read
unsigned long SCL3300::transfer(unsigned long value) {
  FourByte dataorig;
  dataorig.bit32 = value; //Allow 32 bit value to be sent 8 bits at a time
  #ifdef debug_scl3300
  Serial_SCL.print(dataorig.bit32, HEX);
  Serial_SCL.print(" ");
  for (int j = 3; j >= 0; j--) {
    Serial_SCL.print(dataorig.bit8[j], HEX);
    Serial_SCL.print(" ");
  }
  #endif
  //Must allow at least 10 uSec between SPI transfers
  //The datasheet shows the CS line must be high during this time
  //We lengthen it some for environment variabilities
  //unsigned long startmicros = micros();
  //while ((micros() - startmicros) < 15) ;

  digitalWrite(scl3300_csPin, LOW); //Now chip select can be enabled for the full 32 bit xfer
  SCL3300_DATA = 0;
  for (int i = 3; i >= 0; i--) { //Xfers are done MSB first
    dataorig.bit8[i] = SPI.transfer(dataorig.bit8[i]);
  }
  SCL3300_DATA = dataorig.bit8[1] + (dataorig.bit8[2] << 8);
  SCL3300_CRC = dataorig.bit8[0];
  SCL3300_CMD = dataorig.bit8[3];
  //startmicros = micros();
  //while (micros() - startmicros < 10) ;
  digitalWrite(scl3300_csPin, HIGH); //And we are done
  #ifdef debug_scl3300
  for (int i = 3; i >= 0; i--) {
    Serial_SCL.print(" ");
    Serial_SCL.print(dataorig.bit8[i], HEX);
  }
  Serial_SCL.print("  ");
  #endif
  if (SCL3300_CRC == CalculateCRC(dataorig.bit32))
    crcerr = false;
  else
    crcerr = true;
  //check RS bits
  if ((SCL3300_CMD & 0x03) == 0x01)
    statuserr = false;
  else
    statuserr = true;
    #ifdef debug_scl3300
    Serial_SCL.print((SCL3300_CMD & 0x03));
    Serial_SCL.print(" ");
    Serial_SCL.print(SCL3300_DATA, HEX);
    Serial_SCL.print(" ");
    Serial_SCL.print(SCL3300_CRC, HEX);
    Serial_SCL.print(" ");
    Serial_SCL.print(CalculateCRC(dataorig.bit32), HEX);
    Serial_SCL.print(" ");
    Serial_SCL.println(crcerr);
  #endif
  return dataorig.bit32;
}
DavidArmstrong commented 3 years ago

I suspect that a majority of the slowness comes from the need to use beginTransmission() and endTransmission() to bracket the SPI register reads. This is a safety 'thing', to make sure that the SPI implementation doesn't interfere with whatever else may be going on in the rest of a sketch. The library is written with the priority of keeping it simple to use, and hiding complexity from the user. I don't think that removing the one micros() call in transfer() would have much impact on the speed, but I'd have to test it to prove that. If one is willing to go through the work of controlling the SPI communication directly, as you have, then it certainly is possible to design a sketch to get much faster than what can be done now with the library. But that also requires an increase in complexity, and opens up the possibility of increased difficulty when debugging. So, it's a tradeoff. The Arduino philosophy is to keep it as simple, usable, and 'bullet-proof' as possible. (And that last one is difficult with SPI, because getting reliable hardware connections can be quite a headache at times. Hence the needed error checking.) I'll have to think about adding an option for faster speeds, based on the optimizations we've discussed. But the core design with available() will stay as it is for that function. Most users will be fine with the standard implementation. For those use cases that require more speed, and the user is willing to handle the added complexity, that could be made an option.

DavidArmstrong commented 3 years ago

Cutting out beginTransmission() and endTransmission() to bracket the SPI register reads, and removing that last wait loop puts the number of iterations per second at 2367. This is more than double from the 1000 iterations per second in the standard mode. So it keeps the SPI communication open continuously, and that does have a huge effect on the speed. Of course, depending on what else may be going on in a sketch, this may or may not have adverse side effects.

DavidArmstrong commented 3 years ago

I've just released version 3.1.0 of the library, adding the SetFastReadMode() function to allow increasing the sensor read rate as we have discussed here. Hopefully this change will be able to meet your data needs.

amdkam commented 3 years ago

Thanks for the update. I'll check it at a later time to see how it performs on my setup, but it looks promising. Am I correct that the only way to end the SPI transmission now is to set the sensor in power down mode? Perhaps a startFastMode() and endFastMode() could be beneficial, but you be the judge on that.

DavidArmstrong commented 3 years ago

Thanks for the suggestions. I'll keep them in mind for the next update, when that gets done. Meanwhile, I look forward to hearing back regarding the results of your testing with the new FastMode.

DavidArmstrong commented 3 years ago

I added a branch to add in a new function as per your request - stopFastReadMode(). If you have time, check it out. The Fsat Read Mode can be used either continuously, or in bursts, as shown in the last two examples in that code branch.

DavidArmstrong commented 2 years ago

Closing issue after being open for six months with no further comments or changes.