JuliaAudio / SampledSignals.jl

Core types for regularly-sampled multichannel signals like Audio, RADAR and Software-Defined Radio

Port over Sounds.jl API functions #32

Open ssfrr opened 6 years ago

ssfrr commented 6 years ago

I'm splitting out the various discussions from #29 so they're easier to keep track of separately. Sorry some of the formatting isn't as pretty; I opted for clarity in who said what. Please don't hesitate to let me know if I missed something important.

Sounds.jl has a nice API for building and manipulating sounds, which is currently focused on buffers, but seems like it would also be useful for streaming, e.g. one could put together a quick on-line filtering pipeline:

record() |> lowpass(2kHz) |> amplify(20dB) |> play()

Or if you have a gigantic (larger than memory) audio file that you wanted to filter you could do:

loadstreaming("source.wav") do src
    loadstreaming("dest.wav") do dest
        src |> lowpass(2kHz) |> amplify(20dB) |> rampon(500ms) |> rampoff(500ms) |> dest
    end
end

Hopefully loadstreaming and savestreaming will make it into FileIO.jl pretty soon. We'd need to define a method that makes dest callable to enable this API, or else use a different operator than |>. Doing rampoff streaming would have to add latency as long as the fade time (so it could start applying the fade when its input stream ends), but that's not a deal-breaker, and it's a really nice API. AudioIO.jl might be the right place for most of the Sounds.jl functions to live (most of them don't seem particularly psychoacoustics-specific, and they're nice general-purpose audio processing).
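
Concretely, the latency requirement might look something like this rough sketch (hypothetical names, not part of SampledSignals; mono Float64 chunks for simplicity):

# A streaming fade-out must hold back `fadelen` frames, since the fade can
# only be applied once the upstream source has ended.
mutable struct RampOffState
    fadelen::Int           # number of frames to hold back
    tail::Vector{Float64}  # most recent frames, not yet emitted
end

RampOffState(fadelen::Int) = RampOffState(fadelen, Float64[])

# Pass a chunk through, always retaining the last `fadelen` frames.
function push_chunk!(state::RampOffState, chunk::Vector{Float64})
    combined = vcat(state.tail, chunk)
    nout = max(0, length(combined) - state.fadelen)
    state.tail = combined[nout+1:end]
    combined[1:nout]
end

# Once the input stream ends, fade out the held-back tail and emit it.
function finish(state::RampOffState)
    state.tail .* range(1, 0, length=length(state.tail))
end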

from @haberdashPI

I like your point about the sound manipulation functions potentially being more general purpose. And yeah, AudioIO sounds like a reasonable place for those functions to live.

I actually implemented these methods for streaming audio a while back. I ended up being unhappy with my implementation of streams: they were buggy and hard to troubleshoot, and I eventually found a simpler solution to the particular problem I was facing at the time, one that didn't require streams.

However, given a well-defined API for low-level stream operations that isn't buggy, I could easily resurrect the stream versions of the sound-manipulation functions I wrote, to save time. They were pretty similar to the static sound versions.

I like your idea for how to work with streams using the pipes, that makes sense to me. Here are a few adjustments to it:

loadstreaming("source.wav") do src
    loadstreaming("dest.wav") do dest
        src |> lowpass(2kHz) |> amplify(20dB) |> ramp(500ms) |> write(dest)
    end
end

That is, rampon and rampoff can be merged into ramp, and at the end dest need not itself be a function: you can have a function that returns a function that writes to that stream. Maybe it isn't called write; maybe it's named something else.

followup from @haberdashPI

Bottom line - I think this should be pretty straightforward to do.

I took a brief look at streaming in a little more detail, and it does seem like it would be pretty straightforward to implement the Sounds operations I've defined over streams without much trouble. This change could also happen at a different pace: i.e., the operations could be defined over SampleBuf first and then eventually extended to streams once a design has been settled on.

Promotion seems like it would work pretty naturally here: if the stream returns a SampleBuf of the wrong type (assuming we use type to indicate format), then operations will first promote the chunk of audio being read and then manipulate the sound.
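
For example, per-chunk promotion might be as simple as this sketch (process_chunk is a hypothetical name; it assumes the sample format is encoded in the SampleBuf element type):

using SampledSignals

# Promote a chunk to the target element type before applying an operation.
function process_chunk(op, chunk::SampleBuf, T::Type)
    promoted = eltype(chunk) == T ? chunk :
               SampleBuf(convert.(T, chunk.data), samplerate(chunk))
    op(promoted)
end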

ssfrr commented 6 years ago

I kind of like being able to specify the destination directly rather than needing a write function, e.g.

loadstreaming("dest.wav") do dest
    [...] |> dest
end

but that would require something like (sink::SampleSink)(x) = write(sink, x), which currently doesn't work. Barring that, write seems like the right function, so we could have write(sink::SampleSink) = data->write(sink, data), which seems reasonable.
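
Spelled out, the curried form is just a one-liner (a sketch, relying only on the existing write(sink, buf) method):

using SampledSignals

# 1-arg write returns a closure, so a sink can terminate a |> chain.
Base.write(sink::SampleSink) = data -> write(sink, data)

# hypothetical usage at the end of a pipeline:
#   src |> lowpass(2kHz) |> amplify(20dB) |> write(dest)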

I went through the Sounds.jl docs to start thinking through the translation. Most of them seem pretty straightforward; I have questions on a couple.

Sources

harmonic_complex - just harmonics?

Processors

ramp - allow in and out times separately to replace rampon and rampoff functions? I would also call this fade - when I think of ramp I think of a signal generator that makes a ramp signal.

fadeto - thinking of this as fadeto(from, to), I'd think of the 1-arg version as fadeto(to), which I think is the reverse of how this works now. Also a delay option would be helpful to wait before applying the crossfade.

normpower - as a stream processor this would have to buffer the whole signal until the end, but that's unavoidable.

envelope - would it make sense to merge this with amplify?

leftright - maybe doesn't make sense in a |> chain, but I'm not sure

Promotion seems like it would work pretty naturally here: if the stream returns a SampleBuf of the wrong type (assuming we use type to indicate format), then operations will first promote the chunk of audio being read and then manipulate the sound.

I think that when streams are composed with |> the promotion would be happening on the stream level, not the buffer level. So we'd have methods both for lowpass(buf::SampleBuf, freq) which would apply the filter and return a new buffer, and lowpass(src::SampleSource, freq), which would wrap the given src and return a new LowPassSource <: SampleSource that could be read from.
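
A stream-level wrapper could look roughly like this sketch (it assumes the documented source interface of samplerate / nchannels / eltype / unsafe_read!, and uses DSP.jl's stateful DF2TFilter so filter state carries across chunks; the Butterworth design is an arbitrary choice):

using SampledSignals, DSP

# A source that wraps another source and low-passes everything read from it.
struct LowPassSource{S<:SampleSource,F} <: SampleSource
    src::S
    filters::Vector{F}  # one stateful filter per channel
end

function lowpass(src::SampleSource, freq)
    coefs = digitalfilter(Lowpass(freq; fs=samplerate(src)), Butterworth(4))
    LowPassSource(src, [DF2TFilter(coefs) for _ in 1:nchannels(src)])
end

SampledSignals.samplerate(s::LowPassSource) = samplerate(s.src)
SampledSignals.nchannels(s::LowPassSource) = nchannels(s.src)
Base.eltype(s::LowPassSource) = eltype(s.src)

function SampledSignals.unsafe_read!(s::LowPassSource, buf::Array,
                                     frameoffset, framecount)
    n = SampledSignals.unsafe_read!(s.src, buf, frameoffset, framecount)
    for ch in 1:nchannels(s)
        frames = view(buf, (1:n) .+ frameoffset, ch)
        frames .= filt(s.filters[ch], frames)  # stateful per-channel filter
    end
    n
end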

Of course that brings up the trickier question of what lowpass(freq) should return, as it doesn't know ahead of time whether it's going to be called on a buffer or stream.

I'll have to spend some more time thinking about use-cases.

haberdashPI commented 6 years ago

I kind of like being able to specify the destination directly rather than needing a write function

Agreed. If that regression ever gets fixed, it would make sense to add a callable interface for the abstract type SampleSink.

harmonic_complex - just harmonics?

Sure! I like that name.

ramp - allow in and out times separately to replace rampon and rampoff functions? I would also call this fade - when I think of ramp I think of a signal generator that makes a ramp signal.

I like fade; that seems more consistent with fadeto. Perhaps the in and out times are specified via keywords (start and stop?), where you must pass either a single positional argument or keywords only (e.g. ramp(50ms), ramp(start=50ms), and ramp(stop=100ms) are all valid, but ramp(50ms, stop=20ms) is not, since that can be written as ramp(start=50ms, stop=20ms))

fadeto - thinking of this as fadeto(from, to), I'd think of the 1-arg version as fadeto(to), which I think is the reverse of how this works now. Also a delay option would be helpful to wait before applying the crossfade.

If I recall correctly, I miswrote the documentation in an earlier version; it's possible that's currently showing as the stable docs (that's embarrassing <_< ...). It's definitely implemented as fadeto(to).

normpower - as a stream processor this would have to buffer the whole signal until the end, but that's unavoidable.

I could also imagine some approximations to this that wouldn't require the whole signal: e.g. normalize the power over some window of time, or compute the power from 0 to time t (which you could update incrementally at each frame) and normalize by that.
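
The power-from-0-to-t variant is cheap to do incrementally, e.g. (a hypothetical sketch; nothing like this exists in SampledSignals yet):

# Keep a running sum of squares and scale each chunk by the RMS so far.
mutable struct RunningNorm
    sumsq::Float64
    count::Int
end
RunningNorm() = RunningNorm(0.0, 0)

function normalize_chunk(state::RunningNorm, chunk::AbstractArray)
    state.sumsq += sum(abs2, chunk)
    state.count += length(chunk)
    rms = sqrt(state.sumsq / state.count)
    rms > 0 ? chunk ./ rms : chunk
end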

envelope - would it make sense to merge this with amplify?

Ah, yes, I like that. I used to call it mult, but I didn't like that name, so I went with envelope since that seemed like the most common use. But amplify seems better.

leftright - maybe doesn't make sense in a |> chain, but I'm not sure

It also occurs to me that this is really the same as hcat. So maybe it just goes away.

I think that when streams are composed with |> the promotion would be happening on the stream level, not the buffer level...

Yeah, that seems reasonable!

Of course that brings up the trickier question of what lowpass(freq) should return, as it doesn't know ahead of time whether it's going to be called on a buffer or stream.

Wouldn't this just be lowpass(freq) = x -> lowpass(x, freq)? If I understand correctly, the anonymous function would specialize on the type of x when compiled, so dispatch to buffers and streams would just work, so long as a method lowpass(x::SampleSource, freq) is defined.

haberdashPI commented 6 years ago

Another little bit of brainstorming:

The high-level methods for filtering (lowpass etc...) are good if you just need a quick one-off and don't care too much about filter design, but if you want to design a particular filter for your application it probably makes sense to override ~~DSP.filter~~ DSP.filt so that it returns a SampleBuf or SampleSink when passed one of those objects (and allow for function currying?).
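
The wrapping itself is a one-liner; here's a sketch using a hypothetical filtbuf name rather than overloading DSP.filt directly (a real overload would have to sort out method ambiguities with DSP's own filt methods):

using SampledSignals, DSP

# Filter the raw data and rewrap it with the original samplerate.
filtbuf(f, buf::SampleBuf) = SampleBuf(filt(f, buf.data), samplerate(buf))

# curried form for |> chains
filtbuf(f) = buf -> filtbuf(f, buf)

# usage:
#   lp = digitalfilter(Lowpass(2000; fs=48000), Butterworth(4))
#   buf |> filtbuf(lp)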

rob-luke commented 6 years ago

I am a little late to this conversation, so apologies if this feature is already available or discussed elsewhere.

I was thinking it would be nice if the processing (including parameters) that was applied to a signal could be stored somewhere in the type. So you could query the signal to know how it has been treated.

This can be useful in a number of situations. An example might be...

Load signals and save to variables

loadstreaming("source.wav") do src
  src |> signal_unprocessed
  src |> resample(8kHz) |>  lowpass(2kHz) |> rampon(500ms)  |>  signal_processed
end

You can currently probe the data like this

samplerate(signal_unprocessed)  # = 48kHz
samplerate(signal_processed)    # = 8kHz

It would be great if you could do something like

processing(signal_unprocessed)  
# Empty or None or [] or Dict{Any,Any} with 0 entries etc
processing(signal_processed)  
# Dict{String,Dict{String,String}} with 2 entries:
#   "rampon"   => Dict("RampLength"=>"500ms","RampFunction"=>"Linear")
#   "lowpass"  => Dict("CuttOff"=>"2kHz","Filter"=>"DSP.FilterObject")
#   "resample" => Dict("SampleRate"=>"8kHz","Filter"=>"DSP.FilterObject")

or maybe

signal_processed.processing[1]
# "resample" => Dict("SampleRate"=>"8kHz","Filter"=>"DSP.FilterObject")

signal_processed.processing[2]
# "lowpass"  => Dict("Cutoff"=>"2kHz","Filter"=>"DSP.FilterObject")

And you could check whether required processing has been applied like this

haskey(processing(signal_unprocessed), "lowpass")
# false

haskey(processing(signal_processed), "lowpass")
# true
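
For what it's worth, a minimal version of the idea might look like this sketch (entirely hypothetical; none of these names exist in SampledSignals):

# Pair a buffer with an ordered log of (operation => parameters) entries.
struct ProcessedBuf{B}
    buf::B
    processing::Vector{Pair{String,Dict{String,String}}}
end

processing(p::ProcessedBuf) = p.processing
processing(x) = Pair{String,Dict{String,String}}[]  # unwrapped = unprocessed

# Each processing step appends itself to the log.
logstep(p::ProcessedBuf, name, params) =
    ProcessedBuf(p.buf, vcat(p.processing, [name => params]))
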
haberdashPI commented 6 years ago

@codles: That's an interesting idea. What purpose would you imagine putting this to? I'm trying to think about what happens when you mix two signals in this setup. You can't just merge the applied steps into one data structure, because each was only applied to part of the signal. So somehow the metadata has to apply to only part of the resulting signal (and how, or whether, that's worth expressing seems like it would depend on the intended purpose).

A small technical point: I don't think this would work:

loadstreaming("source.wav") do src
  src |> signal_unprocessed
  src |> resample(8kHz) |>  lowpass(2kHz) |> rampon(500ms)  |>  signal_processed
end

Because |> is just a regular operator (x |> f(y) == f(y)(x)), signal_processed and signal_unprocessed have to already be defined. I think, in the current conception of the API, what you want this to do would be written as follows:

loadstreaming("source.wav") do src
  signal_unprocessed = src
  signal_processed = src |> resample(8kHz) |>  lowpass(2kHz) |> rampon(500ms)

  # do stuff with signal_unprocessed and signal_processed here...
end
ssfrr commented 6 years ago

Nice, this is coming together!

I like fade; that seems more consistent with fadeto. Perhaps the in and out times are specified via keywords (start and stop?), where you must pass either a single positional argument or keywords only (e.g. ramp(50ms), ramp(start=50ms), and ramp(stop=100ms) are all valid, but ramp(50ms, stop=20ms) is not, since that can be written as ramp(start=50ms, stop=20ms))

What about having a 1-arg method that applies to both start and stop, and a 2-arg method where you specify both separately? So then if you want to apply a fade to one or the other you can just do something like fade(1s, 0s). I think that would be pretty clear, and less typing. Also because start and stop have a natural ordering it's clear what order they should come as positional arguments.
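
So something like this sketch (a hypothetical implementation of that proposal, assuming linear ramps and fade lengths given in seconds as plain numbers):

using SampledSignals

fade(len) = fade(len, len)                          # 1-arg: fade both ends
fade(inlen, outlen) = x -> fade(x, inlen, outlen)   # curried for |> chains

function fade(buf::SampleBuf, inlen, outlen)
    sr = samplerate(buf)
    n = nframes(buf)
    nin = min(n, round(Int, inlen * sr))
    nout = min(n, round(Int, outlen * sr))
    env = ones(n)
    env[1:nin] .= range(0, 1, length=nin)
    env[end-nout+1:end] .= range(1, 0, length=nout)
    SampleBuf(buf.data .* env, sr)  # envelope broadcasts across channels
end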

normpower - as a stream processor this would have to buffer the whole signal until the end, but that's unavoidable.

I could also imagine some approximations to this that wouldn't require the whole signal: e.g. normalize the power over some window of time, or compute the power from 0 to time t (which you could update incrementally at each frame) and normalize by that.

If we have some kind of windowing and time-varying gain then we're getting into more complex compression / loudness normalization, which I think we can defer to the future. It's useful to have, but there's a lot of history and specification around loudness (e.g. the broadcast world has a bunch of specs) that doesn't need to be coupled to these changes. For now I think just having normpower act on the whole signal should be the default behavior; in the future this could be made more sophisticated via keyword args or different functions.

Of course that brings up the trickier question of what lowpass(freq) should return, as it doesn't know ahead of time whether it's going to be called on a buffer or stream.

Wouldn't this just be lowpass(freq) = x -> lowpass(x, freq)? If I understand correctly, the anonymous function would specialize on the type of x when compiled, so dispatch to buffers and streams would just work, so long as a method lowpass(x::SampleSource, freq) is defined.

Ah yes, you are totally right here. 🤦‍♂️.

The high-level methods for filtering (lowpass etc...) are good if you just need a quick one-off and don't care too much about filter design, but if you want to design a particular filter for your application it probably makes sense to override DSP.filt so that it returns a SampleBuf or SampleSink when passed one of those objects (and allow for function currying?).

Yeah, definitely (though generic filtering support could probably be pushed out to the future). There have been one or two times in the past where I've needed to submit PRs to DSP.jl to make sure that the functions use similar rather than Array to allocate their return data; that way we don't actually need specific SampleBuf implementations. It's also not a big deal to have something like DSP.foo(buf::SampleBuf, args...) = SampleBuf(foo(buf.data, args...), samplerate(buf)) for DSP.jl functions that don't handle AbstractArray subtypes appropriately.

@codles I think that storing metadata about the processing graph is out of scope for this refactor. I wouldn't want to complicate the API to support it, so if we do it I'd rather find a way to layer it on top of the simple system. It also seems like there are a bunch of open questions about exactly how it would work. Can you propose it in a separate issue?

haberdashPI commented 6 years ago

What about having a 1-arg method that applies to both start and stop, and a 2-arg method where you specify both separately?

Ah yes, that seems simpler, and still quite clear.

I think for now just having normpower act on the whole signal should be the default behavior

Absolutely. Seems reasonable as a first pass.

There have been one or two times in the past where I've needed to submit PRs to DSP.jl...

Ah yes: it may be better to submit a PR there. I think there are still some cases that don't work, but maybe that's changed.

haberdashPI commented 5 years ago

Note that #44 is the first step to addressing this issue.