LIFsCode / ELOC-3.0

Firmware for ELOC 3.0 Hardware
MIT License
2 stars 3 forks source link

Sporadic buffer overrun #69

Closed LIFsCode closed 1 month ago

LIFsCode commented 5 months ago

@EDsteve @OOHehir I noticed sporadic buffer overruns. WAV buffer as well as inference buffer overruns.

I attached a log file and the config

I have seen this in various sporadic occurance of this, even with rather low sample rates.

I did a test with disabled "logToSdCard" and the result is much better, but still even here some WAV buffer overrun appear: log without sdcard logging

LIFsCode commented 5 months ago

@OOHehir I added a commit which pins each of the critical tasks to a specific core to give the system less random nature.

My idea was to keep I2S sample & wav writer on core 0 which giving AI a separate core 1 as it will be the most performance intense task. But that is free to be changed.

I also added the core IDs of the tasks to the performance monitor

OOHehir commented 5 months ago

@LIFsCode Do you think SD card access is the bottleneck? The EI task writes to the SD card with inference results just in case you're not aware....

ED has asked for a change (#68) to this logging which should result in less SD access. Also I'll make the wav_writer & inference tasks less verbose which should reduce the logging required.

OOHehir commented 5 months ago

@LIFsCode There's scope for more terminal output reduction if necessary

LIFsCode commented 5 months ago

@OOHehir @EDsteve I did some more tests last week and I'm pretty sure the SD card writing is the bottle neck. Unfortunately this also depends on the SD card type itself.

In fact the theoretically achievable speed is high enough for our requirements, but somehow we seem to be far off this theoretical speed.

I will write some tests to get a better understanding of the achievable write speeds with different file sizes. Let's hope this gives us a better understanding

EDsteve commented 5 months ago

@LIFsCode Great to hear you have seemed narrowed down the issue. I am constantly doing long term recording tests since two weeks or so with a few devices and the last commits from Owen has improved stability quite a lot. Much less BT connection problems and recordings don't don't seem stop automatically any more. If that info helps in any way. Still a few hick ups here and then though. If you want me to do specific tests. Let me know. I have 50 ELOCs and 6 different SD cards here to play around with :)

LIFsCode commented 5 months ago

@OOHehir @EDsteve I did a lot of measurements the last weeks and it turns out it is very likely that the buffer overuns are related to the SD card write speed.

I noticed significant different amount of buffer overruns with different SD card types which have differen write speed performance. I guess that's one of the reasons for different error probability in our different setups.

Besides that I could identify several things affecting the SD card wirte performance:

  1. Multiple Filesystem updates will significantly slow down write speed as it will result in intensive read modify write operations. File system updates are triggered by creating files, fsync, renaming etc.
  2. small file writing has also extremely bad impact concerning write performance

To sum it up, I guess the Log to SD card (and maybe the AI CVS file, but most likely neglectable) is the major cause for the WAV Buffer overruns. I will try to work around that issue by assembling the log data first in a RAM buffer and writing only full sector size entries to the file system, but that will take a while. This would include extracting the Lof file write to a low priority system task, so that it does not interfer with the WAV writer. Currently the Log files are written from the same context where the last ESP_LOGI() message is called and that could be a high priority task, thus potentially blocking the WAV writer

@OOHehir: what do you think about replacing the 2 WAV Buffers by e.g. 4 buffers (maybe half the size each) that would give us more tolerance if another SD card write is currently happeing and the SD card writing is taking longer.

OOHehir commented 5 months ago

@LIFsCode @EDsteve A number of related observations:

  1. Currently after a wav file is being finished the header (at the start of the file) is amended with the actual file size. This involves a seek which I imagine (haven't tested) attracts a performance penalty. Fixing the file size (#71) to exactly 60 secs (for example) would allow this the header to be written correctly at file creation.
  2. Most file openings or writes require a SD card check for available size & mount state. I imagine this attracts another penalty. Perhaps a low priority thread which runs once per minute (at most) that checks the card is mounted & records the available size would be more appropriate. Operations would then just check this retained value at opening/ write.

@LIFsCode Sure, I can modify the code, might be Friday before I get a chance. Perhaps benchmarking the change would be useful but I imagine it would be hard to get consistent/ useful results.

LIFsCode commented 5 months ago

@OOHehir I created a new branch for the debugging and added 2 metrics:

In case it proves useful we can merge it to master

OOHehir commented 5 months ago

@LIFsCode @EDsteve I tried reducing the two WAVFileWriter buffers from

 static const size_t wav_static_buffer_size = (static_cast<int>((1024 * 3) * 2/512)) * 512;

to

 static const size_t wav_static_buffer_size = (static_cast<int>((1024 * 1) * 2/512)) * 512;

The results look encouraging, during a quick test I didn't observe any unexpected wav buffer overruns. I attach logs of the output with the write performance output. I'm using a class 10 card, the write performance is:

larger buffers generally 1024 MB/s (worst 204 MB.s) smaller buffers generally 680 MB/s (worst 151 MB.s)

Perhaps more than 2 buffers won't be required? It'll need more testing of course. I've made the changes on the PerfDebugging branch.

LIFsCode commented 5 months ago

Ups I just noticed a typo, it should be kBytes/s (kB) not MBytes/s for the speed.

OOHehir commented 5 months ago

I didn't upload the logs either! buffer_1024.txt buffer_3072.txt

LIFsCode commented 5 months ago

@OOHehir: I just run the code with reduced buffer sizes and acutally I get significant more WAV buffer overruns. I think that makes sense, since the write speed is slower with smaller files. I guess the different behavior concerning buffer overruns highly depends on the used sd card.

LIFsCode commented 5 months ago

@OOHehir Here is the summary of tests I did with SD card write speed with different file & block sizes


#### Test with ESP IDF

I (11790) SD_TEST: Total file size same as block size
I (11791) SD_TEST:     1 passes @  1KB = 1024
I (11971) SD_TEST: Write:     5 kB/s
I (11971) SD_TEST: Read:   204 kB/s
I (11981) SD_TEST:     1 passes @  2KB = 2048
I (12011) SD_TEST: Write:    55 kB/s
I (12021) SD_TEST: Read:   292 kB/s
I (12031) SD_TEST:     1 passes @  4KB = 4096
I (12061) SD_TEST: Write:   110 kB/s
I (12071) SD_TEST: Read:   372 kB/s
I (12081) SD_TEST:     1 passes @  8KB = 8192
I (12111) SD_TEST: Write:   221 kB/s
I (12121) SD_TEST: Read:   512 kB/s
I (12131) SD_TEST:     1 passes @ 16KB = 16384
I (12171) SD_TEST: Write:   390 kB/s
I (12201) SD_TEST: Read:   564 kB/s
I (12201) SD_TEST:     1 passes @ 32KB = 32768
I (12241) SD_TEST: Write:   712 kB/s
I (12301) SD_TEST: Read:   606 kB/s
I (12301) SD_TEST:     1 passes @ 64KB = 65536
I (12351) SD_TEST: Write:  1213 kB/s
I (12491) SD_TEST: Read:   481 kB/s
I (12491) SD_TEST: Done.

I (8780) SD_TEST: Total file size 1048576
I (8990) SD_TEST:  1024 passes @  1KB = 1048576
I (108610) SD_TEST: Write:    10 kB/s
I (110150) SD_TEST: Read:   682 kB/s
I (110860) SD_TEST:   512 passes @  2KB = 1048576
I (113070) SD_TEST: Write:   358 kB/s
I (114630) SD_TEST: Read:   672 kB/s
I (114650) SD_TEST:   256 passes @  4KB = 1048576
I (116070) SD_TEST: Write:   730 kB/s
I (117630) SD_TEST: Read:   672 kB/s
I (117650) SD_TEST:   128 passes @  8KB = 1048576
I (119290) SD_TEST: Write:   630 kB/s
I (120930) SD_TEST: Read:   642 kB/s
I (121260) SD_TEST:    64 passes @ 16KB = 1048576
I (122750) SD_TEST: Write:   575 kB/s
I (124380) SD_TEST: Read:   644 kB/s
I (124740) SD_TEST:    32 passes @ 32KB = 1048576
I (125650) SD_TEST: Write:   826 kB/s
I (127400) SD_TEST: Read:   597 kB/s
I (127430) SD_TEST:    16 passes @ 64KB = 1048576
I (128330) SD_TEST: Write:  1132 kB/s
I (130490) SD_TEST: Read:   485 kB/s
I (130490) SD_TEST: Done.

#### Test with ESP IDF

I (9711) SD_TEST: Total file size 1048576
I (9841) SD_TEST:  1024 passes @  1KB = 1048576
I (109693) SD_TEST: Write:    10 kB/s
I (111736) SD_TEST: Read:   513 kB/s
I (112033) SD_TEST:   512 passes @  2KB = 1048576
I (113935) SD_TEST: Write:   477 kB/s
I (116002) SD_TEST: Read:   507 kB/s
I (116019) SD_TEST:   256 passes @  4KB = 1048576
I (117064) SD_TEST: Write:   988 kB/s
I (119134) SD_TEST: Read:   506 kB/s
I (119152) SD_TEST:   128 passes @  8KB = 1048576
I (120279) SD_TEST: Write:   916 kB/s
I (122336) SD_TEST: Read:   509 kB/s
I (122362) SD_TEST:    64 passes @ 16KB = 1048576
I (123792) SD_TEST: Write:   720 kB/s
I (125849) SD_TEST: Read:   509 kB/s
I (125880) SD_TEST:    32 passes @ 32KB = 1048576
I (126720) SD_TEST: Write:  1205 kB/s
I (129244) SD_TEST: Read:   415 kB/s
I (129275) SD_TEST:    16 passes @ 64KB = 1048576
I (129787) SD_TEST: Write:  1934 kB/s
I (132300) SD_TEST: Read:   417 kB/s
I (132301) SD_TEST: Done.

The actual values differ quite a lot, so this is not statistically proven, but gives a hint about effect of write block size and write performance. It is always the same file size which is written, but with different number of blocks with different sizes.

This clearly shows that block writes with less than 2 or 4k are extremely inefficient.

So this is definetly a strong hint to rework the logging framework: https://github.com/LIFsCode/ELOC-3.0/issues/73

OOHehir commented 5 months ago

@LIFsCode Out of interest do you know what SD class the card above is? I'm using a class 10.

Do you think reducing the wav buffer size is useful/ worthwhile? Currently the wav class writes out to the SD card when a buffer is filled & a signal is received from the I2S class. If we want multiple (i.e >2) wav buffers it might make sense to move to a single circular buffer & write out in 2K chunks. I'd be concerned that if the ELOC is expected to record for hours continuously this may just put off the inevitable overruns however.

The other option is to restrict support to SD cards of a min class. A quick look online suggests Class 10 cards are fairly common/ min standard.

LIFsCode commented 5 months ago

I must admit I have no idea, I'm using a fairly old SD cardamd I can't read the model anymore, so it's most likely not a class 10

OOHehir commented 5 months ago

@LIFsCode @EDsteve OK, I'll think I'll revert the changes above & come back to this if its still an issue after #73 is implemented

LIFsCode commented 4 months ago

@EDsteve I added a new command with this commit 104111e: getSdSpeedTest

This gives you a measure for the speed of the sd card. Should only be executed when the ELOC is idle.

It tests writing a 512kByte file with different block sizes (1k - 64k)

For a stable recording with 48 kHz it must at least be ~100 kByte/s (48 kHz @16 bit => 768 kBit/s => 96 kByte/s) In fact it must be higher, since there are other tasks writing to sd card as well.

EDsteve commented 4 months ago

@LIFsCode Took a while but i have results. All cards we use have minimum 220KB/s write speads. That should be fine. Here two screenshots from Lexar card and Sandisk:

LEXAR: SANDISK:
LEXAR_Screenshot_20240216-110841_ELOC Control Panel SANDISK_Screenshot_20240216-111305_ELOC Control Panel
OOHehir commented 1 month ago

@EDsteve Have you seen this issue recently? Think it can be closed?

EDsteve commented 1 month ago

@OOHehir You are right. Haven't seen it in a long time. -> CLOSED