AudioKit / Waveform

GPU accelerated waveform view
MIT License

Slow `toFloatChannelData` performance with Dolby Atmos audio #8

Open markst opened 4 months ago

markst commented 4 months ago

macOS Version(s) Used to Build

macOS 14.5 Sonoma

Xcode Version(s)

Xcode 15.4

Description

When loading an mp4 file that contains an AC-3 audio track of about 04:30, the CPU maxes out for over a minute.

I'm curious whether `toFloatChannelData()` is thread safe. If it is, I could call it off the main thread and display a loading indicator while the waveform loads.
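Whether `toFloatChannelData()` is thread safe is not confirmed anywhere in this thread. Assuming the buffer is not mutated while the conversion runs, the loading-indicator idea could be sketched roughly as below. `loadInBackground` is a hypothetical helper and not part of the Waveform package; the `work` closure is where the conversion call would go.

```swift
import Foundation

/// Hypothetical helper (not part of Waveform): runs `work` on a background
/// queue and delivers its result to `completion` on `completionQueue`.
/// In a UI app the default main queue is where the indicator would be hidden.
func loadInBackground<T>(
    completionQueue: DispatchQueue = .main,
    work: @escaping () -> T,
    completion: @escaping (T) -> Void
) {
    DispatchQueue.global(qos: .userInitiated).async {
        // The expensive conversion happens here, off the main thread.
        let result = work()
        completionQueue.async {
            completion(result)
        }
    }
}
```

In an app, the indicator would be shown just before calling `loadInBackground` and hidden inside `completion`, where the converted samples are handed to the waveform view.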

Crash Logs, Screenshots or Other Attachments (if applicable)

[image attachment]
markst commented 4 months ago

Happy to supply the mp4, but I will need approval first. I may be able to find an example file without licensing restrictions.

I have also tried using ffmpeg to strip the video track with the following:

ffmpeg -i input.mp4 -vn -acodec copy output.mp4
markst commented 4 months ago

ffprobe stream and format output:

[STREAM]
index=0
codec_name=eac3
codec_long_name=ATSC A/52B (AC-3, E-AC-3)
profile=Dolby Digital Plus + Dolby Atmos
codec_type=audio
codec_tag_string=ec-3
codec_tag=0x332d6365
sample_fmt=fltp
sample_rate=48000
channels=6
channel_layout=5.1(side)
bits_per_sample=0
initial_padding=0
id=0x1
r_frame_rate=0/0
avg_frame_rate=0/0
time_base=1/48000
start_pts=0
start_time=0.000000
duration_ts=12983808
duration=270.496000
bit_rate=768000
max_bit_rate=N/A
bits_per_raw_sample=N/A
nb_frames=8453
nb_read_frames=N/A
nb_read_packets=N/A
DISPOSITION:default=1
DISPOSITION:dub=0
DISPOSITION:original=0
DISPOSITION:comment=0
DISPOSITION:lyrics=0
DISPOSITION:karaoke=0
DISPOSITION:forced=0
DISPOSITION:hearing_impaired=0
DISPOSITION:visual_impaired=0
DISPOSITION:clean_effects=0
DISPOSITION:attached_pic=0
DISPOSITION:timed_thumbnails=0
DISPOSITION:non_diegetic=0
DISPOSITION:captions=0
DISPOSITION:descriptions=0
DISPOSITION:metadata=0
DISPOSITION:dependent=0
DISPOSITION:still_image=0
TAG:language=und
TAG:handler_name=sound handler
TAG:vendor_id=[0][0][0][0]
[SIDE_DATA]
side_data_type=Audio Service Type
service_type=0
[/SIDE_DATA]
[/STREAM]
[FORMAT]
filename=output.mp4
nb_streams=1
nb_programs=0
format_name=mov,mp4,m4a,3gp,3g2,mj2
format_long_name=QuickTime / MOV
start_time=0.000000
duration=270.496000
size=25968454
bit_rate=768024
probe_score=100
TAG:major_brand=isom
TAG:minor_version=512
TAG:compatible_brands=isomdby1iso2mp41
TAG:encoder=Lavf60.16.100
[/FORMAT]
markst commented 3 months ago

Perhaps it's worth adding that the Atmos mp4 has 12 channels:

(lldb) po format
<AVAudioFormat 0x60000212c5f0: 12 ch,  48000 Hz, Float32, deinterleaved>

Whereas a typical mp3 has 2:

(lldb) po format
<AVAudioFormat 0x600002132300:  2 ch,  44100 Hz, Float32, deinterleaved>
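
To put the channel counts in perspective, here is a back-of-the-envelope calculation using `duration_ts` from the ffprobe output above. It assumes the whole track is decoded into a single Float32 buffer, which this thread does not confirm. `decodedBytes` is a name made up for the illustration.

```swift
// Frame count per channel, taken from duration_ts (time_base 1/48000).
let framesPerChannel = 12_983_808
// Float32 samples are 4 bytes each.
let bytesPerSample = MemoryLayout<Float>.size

/// Hypothetical helper: size in bytes of the fully decoded audio.
func decodedBytes(channels: Int) -> Int {
    return channels * framesPerChannel * bytesPerSample
}

let atmosBytes = decodedBytes(channels: 12)   // 623_222_784 bytes
let stereoBytes = decodedBytes(channels: 2)   // 103_870_464 bytes
print("12 ch: \(atmosBytes) bytes, 2 ch: \(stereoBytes) bytes")
```

Under that assumption the 12-channel file holds six times as many samples as a stereo file of the same length and sample rate, so any per-sample loop does six times the work.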

I might have a go at optimising to only render using first two channels.

It might also be possible to use UnsafeBufferPointer to access the memory of pcmFloatChannelData directly, which could be faster than the nested loops:

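        // Copy each channel with one bulk operation instead of a per-sample loop.
        // Note: the count only matches the frame count when stride == 1
        // (deinterleaved buffers, as in the formats printed above).
        var result = [[Float]](repeating: [], count: channelCount)
        for channel in 0..<channelCount {
            result[channel] = Array(UnsafeBufferPointer(start: pcmFloatChannelData[channel], count: frameLength * stride))
        }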
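
The two ideas above (rendering only the first two channels, and bulk copying) can be combined in a rough sketch like the following. `copyChannels` and `maxChannels` are hypothetical names, not part of Waveform, and the `channelData` parameter stands in for `floatChannelData` so the example runs without AVFoundation. It is untested against real Atmos files.

```swift
import Foundation

/// Hypothetical helper: copies at most `maxChannels` channels into Swift arrays.
/// `channelData` has the same pointer type as AVAudioPCMBuffer.floatChannelData.
func copyChannels(
    from channelData: UnsafePointer<UnsafeMutablePointer<Float>>,
    channelCount: Int,
    frameLength: Int,
    stride: Int,
    maxChannels: Int = 2
) -> [[Float]] {
    let channelsToCopy = min(channelCount, maxChannels)
    var result: [[Float]] = []
    result.reserveCapacity(channelsToCopy)

    for channel in 0..<channelsToCopy {
        let source = UnsafePointer<Float>(channelData[channel])
        if stride == 1 {
            // Samples are contiguous: one bulk copy per channel.
            let samples = UnsafeBufferPointer<Float>(start: source, count: frameLength)
            result.append(Array(samples))
        } else {
            // Samples are strided: pick every `stride`-th value.
            result.append((0..<frameLength).map { source[$0 * stride] })
        }
    }
    return result
}
```

With an `AVAudioPCMBuffer`, the arguments would come from `floatChannelData`, `Int(format.channelCount)`, `Int(frameLength)` and `stride`. Note that taking the first two channels of a multichannel layout is not a stereo downmix; it simply ignores the remaining channels, which may or may not be acceptable for a waveform preview.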