d-ronin / openlager

STM32F4 based logging dongle for HIGH RATE logging

Does 4-bit mode really work? It is turned off in openlager.c's sd_init() call #56

Closed: mirov closed this issue 2 years ago

mirov commented 2 years ago

I found that my openlager, running an image built from this GitHub source, passes "false" to the sd_init() call in openlager.c, which means "don't use four-bit mode". Is it turned off for a reason? I flipped it to true and can log for a while, but then it hangs. Is that because the HW reference design only has a 47K pullup on SDIO_D0? The other pins seem to have the pullups in the STM32 GPIOs turned on, but perhaps they are too weak for 19.2MHz.

mlyle commented 2 years ago

It's turned off because SD bandwidth was not limiting performance on any of the cards we used. The limiting factor wasn't the bus interface but the amount of time the card spent busy after I/O requests. Bigger buffers / STM32F412, allowing bigger I/O operations per flush, would make a bigger difference than bus width and speed.

Of course, there have been a couple of years of progress in cards, so maybe there are some cards that would benefit from 4X now. It was never really tested. The hardware should be capable, but there could easily be some DMA etc. subtleties that get triggered at 4x the data rate-- or just an increased need for error/unusual state recovery.

Pullups on the SD bus lines are required during initialization but are not used during signalling-- the card enters a push-pull state. A 47k pullup would not be fast enough for open collector at 1MHz, let alone 19.2MHz. So that is not likely to be your issue.
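
To put rough numbers on the pullup argument, here is a back-of-envelope sketch. It assumes a hypothetical 20 pF of total line capacitance (card, trace, and pin); that value is illustrative and does not come from the thread or the board design. The helper names are likewise made up for this example.

```c
#include <assert.h>

/* RC time constant in nanoseconds for a pullup (ohms) charging a load (pF). */
static double rc_tau_ns(double pullup_ohms, double load_pf)
{
    assert(pullup_ohms > 0.0 && load_pf > 0.0);
    return pullup_ohms * load_pf * 1e-3; /* ohm * pF = 1e-12 s = 1e-3 ns */
}

/* Time the line has available to rise: half of one clock period, in ns. */
static double half_period_ns(double clock_hz)
{
    assert(clock_hz > 0.0);
    return 0.5e9 / clock_hz;
}

/* ASSUMPTION: 20 pF is an illustrative load, not a measured value. */
static const double ASSUMED_LOAD_PF = 20.0;
```

Under that assumption, rc_tau_ns(47e3, ASSUMED_LOAD_PF) comes out at about 940 ns, against a half-period of 500 ns at 1MHz and roughly 26 ns at 19.2MHz. A line driven only by the 47k pullup could not reach a valid high level in time at either rate, which is consistent with the pullup not being part of normal signalling.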

mirov commented 2 years ago

Hi Michael, I'll bet you were surprised to see activity on this after a few years of quiet. Hmmm, I wasn't sure if the 47K was simply there to hold the high state.

I've got the Lager in a power-sensitive design and am trying to find knobs to drop consumption. I'm sending data in bursts at 1Mbps, but at an average of about 64KB/sec. I found that things seem to be stable with the core clock rate dropped down to 48MHz and the SD clock down to 9.6MHz (that saves around 10mA). I'm looking at dropping my supply rail down to 2.7V, which I believe will be OK with the newer 32GB cards I'm using - and that will drop my upstream battery current by about 10% more. I found that preallocating a large enough block drops the power as well.

The once seemingly huge 128KB buffer of the 'F11 only gives me a few seconds of buffer, but if it isn't keeping up, my application runs so long that a bigger buffer isn't going to help.

Oddly, every now and then I can get the device to log at a really low (like half) power consumption. Are there any magic block sizes/timing preferences that I can exploit to keep it in a more efficient mode? I can with some effort change the size of my serial data blocks, batch them up, etc. At the moment I'm logging in ASCII 'cause it is so easy to debug and parse. I can cut my byte count down by 2-3x if I switch to binary, but it isn't clear if that will help the STM/uSD save power (busywaits instead of sleeps, for example).
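
For a rough sense of scale, the buffer headroom can be estimated as below. This is only a sketch: it assumes the entire 128KB is usable as buffer (in practice some RAM goes to the stack and other state) and a steady 64,000 bytes/sec input. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Milliseconds of incoming data a buffer can absorb while the card is stalled. */
static uint32_t buffer_headroom_ms(uint32_t buffer_bytes, uint32_t bytes_per_sec)
{
    assert(bytes_per_sec > 0);
    /* Widen to 64 bits so buffer_bytes * 1000 cannot overflow. */
    return (uint32_t)(((uint64_t)buffer_bytes * 1000u) / bytes_per_sec);
}
```

With those inputs, buffer_headroom_ms(128 * 1024, 64000) evaluates to 2048, i.e. about two seconds of slack before samples would be dropped.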

-Russ

mlyle commented 2 years ago

Interesting. Downclocking wouldn't be the first thing I'd do-- I'd try to race-to-sleep instead.

Looks like right now usart_receive_chunk busywaits, so getting a wait-for-interrupt in there would likely be helpful.

Currently a bunch of peripheral clocks are on that aren't needed. But then a lot of them can be disabled-- e.g. when SD is not doing a transaction, the peripheral can be paused. It would be better to keep the clock rate high and then disable the clock as quickly as possible after a transaction finishes.

mirov commented 2 years ago

Oh, I'd also much prefer full speed clocks (at least those that I need) and sleeping the CPU until something wakes it up. Add in some DMA to keep it sleeping until things are really done and the power should be pretty low. Alas, I have never worked with the STM series (I'm mostly a nRF52 guy).

With the core clock slowed down I get lower power, but the tradeoff is that I also get periodic hangs with the LED stuck on.

-Russ

mlyle commented 2 years ago

Try calling __WFI in the loop in usart_receive_chunk.
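
A minimal sketch of what that could look like. The real body of usart_receive_chunk is not shown in this thread, so the loop shape, the rx_hooks_t structure, and every sim_* helper below are assumptions made for illustration; on the target the wait_for_interrupt hook would simply call CMSIS __WFI(). The simulated hooks exist only so the loop can be exercised off-target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hooks standing in for the firmware's own state. */
typedef struct {
    uint32_t (*bytes_available)(void);    /* filled by the USART RX interrupt */
    uint32_t (*now_ticks)(void);          /* advanced by the systick interrupt */
    void     (*wait_for_interrupt)(void); /* __WFI() on the target */
} rx_hooks_t;

/* Sleep until min_bytes have arrived or timeout_ticks have elapsed.
 * Returns the number of bytes buffered when the wait ended. */
static uint32_t wait_for_chunk(const rx_hooks_t *h, uint32_t timeout_ticks,
                               uint32_t min_bytes)
{
    assert(h != NULL);
    uint32_t start = h->now_ticks();
    for (;;) {
        uint32_t have = h->bytes_available();
        if (have >= min_bytes)
            return have;
        if ((uint32_t)(h->now_ticks() - start) >= timeout_ticks)
            return have;
        h->wait_for_interrupt(); /* instead of spinning on the condition */
    }
}

/* Host-only simulation: each wake-up is one systick; every second tick
 * pretends 100 bytes arrived. */
static uint32_t sim_ticks, sim_bytes, sim_sleeps;
static uint32_t sim_bytes_available(void) { return sim_bytes; }
static uint32_t sim_now_ticks(void) { return sim_ticks; }
static void sim_wait_for_interrupt(void)
{
    sim_sleeps++;
    sim_ticks++;
    if (sim_ticks % 2 == 0)
        sim_bytes += 100;
}
static const rx_hooks_t sim_hooks = {
    sim_bytes_available, sim_now_ticks, sim_wait_for_interrupt
};
```

One thing to keep in mind with this pattern: if an interrupt lands between the condition check and the WFI, the core still goes to sleep and only re-checks on the next wake-up, so some periodic interrupt has to be running to bound that delay.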

mlyle commented 2 years ago

(note SD uses DMA; though we don't sleep during SD transactions either. But making USART use DMA is much harder, because we're trying to adaptively adjust our flush behavior based on both what the card is doing and the arrival rate of information).

mirov commented 2 years ago

That saved about 30% of what it was consuming; that was a nice suggestion (and I would bet that it doesn't introduce any odd instabilities). I assume we're waking up for any interrupt. Are there reliable sources always running to be sure we don't snooze forever in that loop? I don't know if, for example, there's a timer running that periodically pops and will always nudge us to check the loop termination condition. If so, are there other spots where we're spinning for significant amounts of time and could play the same trick? For example, in shared/sdio.c there's a loop waiting for SDIO_STA_DATAEND, which from your earlier comments sounds like a place we spend some time in. I dropped a WFI in there and it seems to save perhaps 1/2 mA (which I'll take, thank you). But again, I don't know if that introduces a possible deadlock...

I really appreciate the help !

-Russ

mlyle commented 2 years ago

The two things that allow leaving the USART loop are either a UART interrupt or a timeout expiring (and the time is updated by the systick "interrupt"). So there's no cost to performance there.

The same isn't quite true of the SDIO stuff, which isn't interrupt driven presently. Systick will still show up and knock you out of that eventually, but there may be times that the SD bus is left unnecessarily idle.

The next thing I'd try to reduce consumption is to turn off the SD peripheral's clock at the end of sd_write/sd_read, and to turn it back on at the beginning of sd_write/sd_read.

RCC_APB2PeriphClockCmd(RCC_APB2Periph_SDIO, ENABLE);   /* or DISABLE */
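
A sketch of how that gating could wrap a transfer. The signatures of sd_write/sd_read are not shown in this thread, so the transfer is passed in as a hypothetical callback, and the clock switch is a hypothetical hook that on the target would call RCC_APB2PeriphClockCmd(RCC_APB2Periph_SDIO, ...). The sim_* helpers are host-only stand-ins so the wrapper can be exercised off-target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hooks; neither type exists in the OpenLager source. */
typedef void (*clock_gate_fn)(bool enable);
typedef int  (*sd_xfer_fn)(const uint8_t *data, size_t len);

/* Enable the SDIO clock only for the duration of one transfer. */
static int gated_sd_xfer(clock_gate_fn gate, sd_xfer_fn xfer,
                         const uint8_t *data, size_t len)
{
    assert(gate != NULL && xfer != NULL);
    gate(true);
    int rc = xfer(data, len);
    gate(false); /* gate the clock off on the error path as well */
    return rc;
}

/* Host-only simulation of the clock switch and the transfer. */
static bool sim_clock_on;
static unsigned sim_toggles;
static void sim_gate(bool enable) { sim_clock_on = enable; sim_toggles++; }
static int sim_xfer(const uint8_t *data, size_t len)
{
    (void)data;
    (void)len;
    return sim_clock_on ? 0 : -1; /* a transfer with the clock off fails */
}
```

This assumes the transfer function returns only once the peripheral is truly finished with the bus. If the card can still be signalling busy after it returns, gating the clock at that point may not be safe.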

main() turns on a bunch of peripheral clocks, and I think a lot of them are unused. TIM1/TIM2/TIM3/TIM4/TIM5, and maybe SPI1. OpenLager was mostly used on drones consuming hundreds of amperes, so there was absolutely no pressure to save a few milliamperes (and turning on all the clocks you might need saves a lot of confusion later).

P.S. --- I think you may have worked with my oldest brother Jim at Berkeley, Tandem, or Sun? Looks like we're connected on LinkedIn.

mirov commented 2 years ago

Jim Lyle? Yeah, that was only 30-40 years ago though! I didn't realize that the internet was this small... I'm still an active HW guy, is he? Are you in the Bay Area also? We seem to be in the same time zone (Los Altos for me).

I found OpenLager through my drone interests (actually fixed wing mostly) and was looking for something that had better buffering than OpenLog. My application doesn't need raw performance, but I really don't want to drop any samples. It does however care a lot about power consumption.

Ouch! Toggling the SDIO clock on/off actually increased the power by 10mA. Does that ClkCmd have a long settling time built in for a PLL to lock?

-Russ

mlyle commented 2 years ago

Re: PLL-- that's a good question. No-- the "48MHz" SDIO clock is derived from the main PLL which needs to be locked to clock the main processor. It should just be gating that on and off-- there should be no reason for power consumption to go up. Quite odd. Maybe it checks in with the card when we re-enable its clock and we have to wait for that.

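[image: https://user-images.githubusercontent.com/90903/135768569-a961cc81-30e2-4c05-ab1e-b859c750c3ca.png]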

It's not trivial to sleep while SD does operations in a correct way-- but it sounds like the benefit from that is small.

Our small memory size is really the worst thing here: the controllers on SD cards are optimized for lots of contiguous data being written. We try to write big chunks, but aren't always able to-- which can result in a lot of write amplification and the SD card doing a whole lot of work to shuffle blocks around and erase-- sometimes being busy for a couple hundred milliseconds, during which time we need to hold onto our buffers, etc.

Another thing that could help is increasing the USART timeout so we can do larger transactions to the card. If you send e.g. 20 bytes to OL, it will store it in its buffer, but eventually decide to flush it-- increasing that timeout in main over 200ms might help, but risks data loss of whatever that interval is if power is lost suddenly.

    // 50 ticks == 200ms, prefer 512 byte sector alignment,
    // and >= 2560 byte chunks are best
    // Never get more than about 2/5 of the buffer (40 * 1024)--
    // because we want to finish the IO and free it up
    pos = usart_receive_chunk(50, 512, 5*512,
            40*1024, &amt);

Yup--- the Valley/internet is still small. Both Jim and I are down in Morgan Hill. He retired from Intel's capsense group a year or two ago; I got into both hardware design and remote controlled aircraft thanks to his influences :D. I'm a bit younger than Jim-- was still a baby when he went off to college-- but I've mostly exited tech myself. I'm a middle school teacher now.

mirov commented 2 years ago

I used to fly an Electra in the empty field across the street from Tandem during lunch. And at about that time I designed a UHF video transmitter to hook a vidicon-based B/W Sony camera to it so I could do FPV back in the late 80's or early 90's. It worked, but I had to use a portable TV and a hood to see things. I still have the Electra, and was looking at it over the Summer realizing that it can carry a lot of 18650 cells...

Bumping that timeout up by a lot (from 50 to 200) doesn't seem to do anything. I think that really only kicks in as an exception? I'm logging data in a system that spits out a ~128 character line every 2ms (though normally fewer than 128). I could pad it out so that it was always an even 128 bytes if that would help keep things aligned.

Now and then I have seen the current spike up ~30mA for a second or two; I've assumed that the card is doing something internally. Do the new really fast cards (like 100+MB/s) just have a bunch of RAM inside to buffer through that sort of thing? I've ordered a few types to try out...

-Russ

mlyle commented 2 years ago

> Bumping that timeout up by a lot (from 50 to 200) doesn't seem to do anything. I think that kicks in only really as an exception ? I'm logging data in a system that spits out a ~128 character line every 2mSec (though normally fewer than 128). I could pad it out so that it was always an even 128 bytes if that would help keep things aligned.

Here is the logic:

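Wait until:

  • At least 2560 bytes have arrived (a minimum reasonable chunk to write), or
  • at least 200ms (50 ticks) have passed.

And then write:

  • 40k, if we have more than that to write (idea is to complete the write transaction and free up buffer ASAP, so this is a compromise between "finish fast before buffer fills" and throughput), else
  • The maximum amount we can finish at 512 byte alignment with, if we have more than 512 to write, else
  • Just whatever we have, otherwise.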

So, it's a bit of a screwy heuristic that trades off throughput, SD card alignment, buffer management, and flush protection against losing data at power loss.

At (128/.002) =~ 64kbytes/sec, you'll be doing 2560 byte aligned increments, except when the SD card falls behind and you do larger writes. Yes, changing the ticks won't change anything; increasing 2560 to 4096-10240 might.
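
The heuristic described above can be sketched as two small functions. This is an illustration rather than the OpenLager code: the function and macro names are hypothetical, and "512 byte alignment" is treated here as rounding the byte count down to a multiple of 512, whereas the real code may align against the file offset instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_BYTES    512u
#define MIN_CHUNK_BYTES (5u * 512u)   /* 2560 */
#define MAX_WRITE_BYTES (40u * 1024u) /* about 2/5 of the buffer */
#define TIMEOUT_TICKS   50u           /* 50 ticks == 200ms */

/* "Wait until": true once enough data or enough time has accumulated. */
static bool should_flush(uint32_t buffered, uint32_t ticks_waited)
{
    return buffered >= MIN_CHUNK_BYTES || ticks_waited >= TIMEOUT_TICKS;
}

/* "And then write": how many of the buffered bytes go to the card now. */
static uint32_t flush_size(uint32_t buffered)
{
    uint32_t n;
    if (buffered > MAX_WRITE_BYTES)
        n = MAX_WRITE_BYTES;                         /* cap, free buffer soon */
    else if (buffered > SECTOR_BYTES)
        n = buffered - (buffered % SECTOR_BYTES);    /* whole sectors only */
    else
        n = buffered;                                /* small timeout flush */
    assert(n <= buffered);
    return n;
}
```

At 64,000 bytes/sec the 2560-byte threshold is reached after 2560 / 64000 = 0.04 s, well inside the 200ms timeout, so the timeout branch never fires at that rate and raising the tick count has no effect.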

mirov commented 2 years ago

BTW, I'll bet your sensitivity to data loss and flushing buffers probably comes from ejecting battery packs on impact... I don't have that sort of issue, I just don't want to drop samples because something got busy.

I tried 4096 and 10240; neither caused the nominal power to budge, I'm afraid.

Perhaps we've turned all the "good" knobs already, I think the Lager portion of my system is really only using about 9mA at the moment, which isn't too bad. The total device draw is about 24mA though, so every mA is a big deal (about 1.5 hours of run time).

I'm going to let things settle for a while and see if anything has broken. The next change I make btw will likely be the utilization of the Lager USART TX path to send state/status info to the upstream processor that is at the moment open loop. I think a single encoded byte now and then will make little mysteries like a full card, write error, not finished booting yet, etc easier to debug.

Again, thank you so much for your assistance over the last few days. Say hi to Jim and let's hook up on LinkedIn.

-Russ

mlyle commented 2 years ago

9mA doesn't sound bad at all! Yah, further gains are probably hard to find: this wasn't made to be power efficient at all-- but instead just to be reliable to get logs to storage no matter what. Nice to "meet" you and best of luck with the system you're building.