Azure / azure-storage-fuse

A virtual file system adapter for Azure Blob storage

Avoid storage wear on Raspberry Pi #365

Closed gcormier closed 4 years ago

gcormier commented 4 years ago

Hi,

I'm looking at blobfuse on an rPI to record hydrophone audio and put it on Azure. So this is purely a one-way operation - capture local audio data, use FFMPEG to transcode to FLAC plus a lower-quality streaming version, and store the results in a storage account.

By setting the TMP location to a RAM disk, will this effectively limit reads/writes to the rPI's SD card, which has limited write cycles? Since the rPI has limited RAM (and hence limited space for a RAM disk), should I set file-cache-timeout-in-seconds to something very low (single digits) to keep the TMP file system as empty as possible?

Secondly, to confirm my understanding - FFMPEG will be writing multiple files continuously. Will blobfuse block the close() call until the file is uploaded properly? If the upload link is slower at remote sites, we'll need to factor this into the design. Right now we use FFMPEG to write 30 seconds of audio per file. Is there any way to force blobfuse to be asynchronous?

Thanks!

amnguye commented 4 years ago

Hi,

Thanks for your questions!

By setting the TMP location to a RAM disk, will this effectively limit reads/writes to the rPI's SD card, which has limited write cycles?

Yes. Blobfuse will cache to the RAM disk, and the amount it caches is limited by how much space you've allocated to the RAM disk.

Since the rPI has limited RAM (and hence limited space for a RAM disk), should I set file-cache-timeout-in-seconds to something very low (single digits) to keep the TMP file system as empty as possible?

Yes, I also recommend you keep file-cache-timeout-in-seconds in the single digits, to make sure files in the cache are cleared up as often as possible.
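For reference, here's a minimal sketch of that setup - the RAM disk size, paths, and timeout value are just illustrative, not recommendations:

# Create a small tmpfs RAM disk to hold the blobfuse cache
sudo mkdir -p /mnt/ramdisk
sudo mount -t tmpfs -o size=256m tmpfs /mnt/ramdisk
sudo mkdir -p /mnt/ramdisk/blobfusetmp

# Mount the container with the cache on the RAM disk and a short cache timeout
# (storage account credentials go in fuse_connection.cfg)
blobfuse /mnt/blobfuse \
    --tmp-path=/mnt/ramdisk/blobfusetmp \
    --config-file=/home/pi/fuse_connection.cfg \
    --file-cache-timeout-in-seconds=5 \
    -o attr_timeout=240 -o entry_timeout=240 -o negative_timeout=120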

Secondly, to confirm my understanding - FFMPEG will be writing multiple files continuously. Will blobfuse block the close() call until the file is uploaded properly?

Yes. And as long as you keep the file open, the kernel won't tell blobfuse to close (and upload) the file until you've closed it yourself.
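A quick way to see this on a mounted container (just a rough check, assuming /mnt/blobfuse/raw is your mount point) is to time a copy into the mount; the command only returns once the file has been flushed and uploaded:

# cp's close() blocks until blobfuse finishes uploading the blob,
# so the elapsed time roughly tracks the upload time on a slow link
time cp sample.flac /mnt/blobfuse/raw/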

Is there any way to force blobfuse to be asynchronous?

Did you mean uploading/adding multiple files to the blobfuse mount point at the same time? Did you try spinning off multiple processes to upload or add multiple files to the mount point?
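For example, something along these lines (a rough sketch, with /mnt/blobfuse/raw standing in for your mount point) fans the copies out to separate processes so the slow close() calls overlap instead of running back to back:

# Each cp runs in its own background process, so close() calls overlap
for f in *.flac; do
    cp "$f" /mnt/blobfuse/raw/ &
done
wait   # block until all uploads have finished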

amnguye commented 4 years ago

Also, if you don't need the caching feature blobfuse has, you can look into using AzCopy:

http://github.com/Azure/azure-storage-azcopy

gcormier commented 4 years ago

Did you mean uploading/adding multiple files to the blobfuse mount point at the same time? Did you try spinning off multiple processes to upload or add multiple files to the mount point?

Well I think FFMPEG will close the file, then open a new file to start writing again. So if that close takes 15 seconds because the network takes 15 seconds to upload, it won't open the new one until that close happens.

AzCopy could work, but I would then need to handle deleting old files on local storage, whereas blobfuse takes care of this for me. I guess it's a trade-off, as then I wouldn't have to compile blobfuse for the rPI either :)
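If I went the AzCopy route, the cleanup could be a small loop along these lines (just a sketch - the account/container URL and SAS token are placeholders, and the local copy is only removed after a successful upload):

# Upload each finished segment with azcopy, then delete the local copy
for f in /data/raw/*.flac; do
    azcopy copy "$f" "https://<account>.blob.core.windows.net/<container>/raw?<SAS>" \
        && rm "$f"
done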

amnguye commented 4 years ago

Sorry, I'm probably misunderstanding something - is FFMPEG a single-threaded process? I don't have a clear picture of how it works. You can upload/add files to the mount point and blobfuse will upload those files asynchronously using multiple processes.

Ah okay I see, I didn't know your other use cases for Azure Storage. But thanks for considering AzCopy!

gcormier commented 4 years ago

I am using FFMPEG to capture real-time audio to files.

ffmpeg -f pulse -ac 2 -ar $SAMPLE_RATE -thread_queue_size 1024 -i $AUDIO_HW_ID -ac $CHANNELS -ar $SAMPLE_RATE -sample_fmt s32 -acodec flac \
-f segment -segment_time "00:00:$FLAC_DURATION.00" -strftime 1 "$RAWFS/$NODE_NAME/raw/%Y-%m-%d_%H-%M-%S_$NODE_NAME-$SAMPLE_RATE-$CHANNELS.flac"

In this case, $RAWFS is a blobfuse mount point.

$FLAC_DURATION lets us cut the recording into segments, let's say 60-second chunks of audio. Ideally, there is no gap between these files.

So once we hit 60 seconds, FFMPEG will finish writing the file and open a new one. If blobfuse blocks the close() on the file until it is uploaded, then over a slow link it might be 5-10 seconds before FFMPEG returns from close() and opens a new file to start capturing real-time audio. So we could have a 10-second gap.

I'm not sure if FFMPEG will do the close asynchronously, or if it blocks. It was likely coded assuming a close() is nearly immediate. I'll have to do further testing.

amnguye commented 4 years ago

Unfortunately, yes, you are correct - it does take time to flush and upload the file to storage. We need to take a lock on the file in order to upload it, which is why close() doesn't return until the upload is complete. That way we let the process know we are still reading the file, and we emulate what a file system would do when it is not yet done with the file.

So I agree you're on the right track on how to handle this, in order to capture all the audio without losing a second. Ideally you should start a new process for the next recording while the other one finishes closing/uploading the file to storage.
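One way to act on that - a sketch of the idea, not something blobfuse provides - is to keep FFMPEG segmenting onto fast local storage so its close() returns immediately, and have a separate mover loop push completed segments into the blobfuse mount in the background (paths are illustrative, and it assumes fuser is available to detect the segment still being written):

# FFMPEG keeps writing segments to local tmpfs, so its close() never blocks;
# this loop moves finished segments into the blobfuse mount, where the slow
# upload happens off the capture path
LOCAL=/mnt/ramdisk/segments
MOUNT=/mnt/blobfuse/raw

while true; do
    for f in "$LOCAL"/*.flac; do
        [ -e "$f" ] || continue                  # nothing to move yet
        fuser "$f" >/dev/null 2>&1 && continue   # skip the segment still being written
        mv "$f" "$MOUNT"/ &                      # upload in the background
    done
    sleep 5
done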

gcormier commented 4 years ago

Okay. I think that gives me enough info. I'll take a look at blobfuse vs azcopy options and see which one works best :) Thanks for your help!