Add a high-level interface à la playrec

mgeier commented 10 years ago

I would like to have a high level interface similar to playrec (http://www.playrec.co.uk/) - but of course with a much better API!

Basically it should provide:

a function to initialize PortAudio (set device ID, samplerate, latency, other low level stuff, ...). This should be optional. It should probably be allowed to call it several times to set one setting at a time?
a function to play a NumPy array on a given set of output channels (specified by another iterable)
a function to record a given time of audio from a given set of input channels into a NumPy array
a function to simultaneously (and of course sample-accurately) do both playback and recording

This could be realized by either a new class or as a new submodule, I guess there are only minor differences. The hardest thing would probably be to find a name for it. The individual functions could also be dropped directly into the main module, but then the state (a.k.a. the "preferences") would also have to be stored directly in the module namespace. In the latter case there would be no clear separation between low-level and high-level interface, but it would have the benefit that no name has to be invented for a new class/submodule.

Any comments?

mgeier commented 10 years ago

For providing blocking functions we probably have to import the threading module.

Would this be a reason to create submodules?

bastibe commented 10 years ago

Playrec. Probably the worst API I have seen in years.

Since playback and recording already work asynchronously for up to one block length, parts of this already work:

with Stream() as s:
    recording = s.read()
    s.write(samples) # no more than one block length!

As long as both functions read/write no more than one block length and run on the same device, this works as intended.

It would probably be a good idea to add channel-maps to both read and write, and possibly even combine them into one readwrite function to something like

with Stream() as s:
    recording = s.readwrite(samples, channels=(1, 3, 7))

This function would merely chop up the data in blocks and serve them to read and write in a loop. Is this sufficient?

mgeier commented 10 years ago

This function would merely chop up the data in blocks and serve them to read and write in a loop. Is this sufficient?

I'm not quite sure, but I think if we call read() and then write() it's not guaranteed that they operate on the same audio block. The change between one block and the next might happen in between the two calls.

I think we should use the callback API for that. There it's guaranteed that input and output work on the same audio block.

mgeier commented 10 years ago

I think #25 shows exactly this problem! I'm more and more sure that the callback API is the only reliable way to go.

jeremygray commented 10 years ago

I would be very interested in an API library of high-level, low-latency sound playing and recording. This is for PsychoPy, software for experimental psychology and neuroscience (currently multiple-1000's of users: http://psychopy.org/usage.php). pygame.sound has latency issues, pyo is fabulous but overkill and not really intended for our use cases. pysoundcard & file seem like a good fit.

Essential features--these need to be possible, nicest for me if they are available through in an API (rather than me having to write wrappers myself):

control over selection of input / output hardware
play: nonblocking, mono & stereo sounds from numpy arrays or files (.wav, nice to support .flac if possible). multichannel might be nice but not as important.
record: nonblocking, mono or stereo, save to file, be able to access data in near real-time (I think callback supports this)

Nice to have but I could implement for myself if out of scope for pysoundcard:

voice-offset end of recording (detect silence after non-silence, and terminate recording after a delay)

I already have implemented a few basic functions for sound in a numpy array, happy to contribute some code to pysoundcard if useful:

get audio power / loudness (RMS) of a sound, optionally in bins
basic FFT (I use it for detecting a short 19KHz sound that I use as an auditory event marker)
speech segmentation (I often care about speech response times, so want an accurate estimate of when speech starts relative to some other external event)

Brain storming:

Not sure how easy to code or how widely used it would be, but we've had a couple inquiries about support for input from midi devices. (I know nothing about midi myself.)
Speech recognition would be cool. I had google's v1 API working from python, but they just changed their API to require a key, with restrictions on usage. wit.ai is another online speech-recognition-in-the-cloud service, looks promising

mgeier commented 10 years ago

@jeremygray: Thanks a lot for the suggestions, those are really helpful to us!

control over selection of input / output hardware

That's definitely an area where I hope we can make some improvements. You can already select devices and change their settings with the device dictionaries, but I hope we find a way to make this easier.

play: mono & stereo sounds from numpy arrays

That's the main purpose of PySoundCard. That's already possible but I would like to make it easier. I'd like to have something like this:

import pysoundcard as pa
pa.play(myarray)

or files (.wav, nice to support .flac if possible).

Handling sound files is none of PySoundCard's business. That's PySoundFile's duty. Or you can use anything else if you prefer. There is no dependency there and there will never be one. The only thing that both must support are NumPy arrays.

Of course we try to design the APIs in a way that they fit nicely together.

multichannel might be nice but not as important.

For me personally that's important, so I will definitely suggest some features for that. For example if I want to play a three-channel NumPy array on channels 5, 27 and 56:

pa.play(myarray, channelmask=[5, 27, 56])

... or something like that.

record: nonblocking, mono or stereo,

Yes, I'm planning both blocking and non-blocking functions.

save to file,

This is again out of scope. Can be done with PySoundFile or any other sound file library that knows NumPy.

It would of course be nice to write a file continuously (block by block), but we'll have to check how this works performance-wise.

be able to access data in near real-time (I think callback supports this)

That's an interesting idea!

Can you elaborate how you would want to use that?

One important aspect here is that we try to avoid memory allocations during the runtime of a stream, so it would be good to know the maximum duration of the recording to be able to allocate the necessary memory beforehand.

voice-offset end of recording (detect silence after non-silence, and terminate recording after a delay)

I'd say the detection of silence is out of scope, but we should try to make it easy to implement such a thing.

get audio power / loudness (RMS) of a sound, optionally in bins

I guess this is out of scope unless PortAudio provides that information.

basic FFT (I use it for detecting a short 19KHz sound that I use as an auditory event marker)

Definitely out of scope to provide exactly that, but probably we can think of some kind of plug-in mechanism to allow implementing that in a straightforward way.

speech segmentation (I often care about speech response times, so want an accurate estimate of when speech starts relative to some other external event)

The speech detection is out of scope, but again, probably we can provide something to make it easier to implement.

Not sure how easy to code or how widely used it would be, but we've had a couple inquiries about support for input from midi devices. (I know nothing about midi myself.)

Only if PortAudio supports it (which I think it doesn't, right?). MIDI IO should be provided by a separate library.

Speech recognition would be cool. I had google's v1 API working from python, but they just changed their API to require a key, with restrictions on usage. wit.ai is another online speech-recognition-in-the-cloud service, looks promising

We definitely won't include any of those as a dependency, but we can try to provide some kind of generalized interface for that. I have no clue how those APIs work, it would be very helpful if you could describe how such an interface should look like.

jeremygray commented 10 years ago

Terrific, thanks. This has really helped me understand things (esp the scope of pysoundcard).

I forgot to mention that playing multiple sounds at the same time would be very useful. I'm not sure how the continue_flag and complete_flag work (and might conflict if there are multiple things going on).

Near real-time access to the data from microphone would allow fun things like: 1) speech offset detection (e.g., based on a simple heuristic like "if the most recently recorded block was below threshold loudness (and previously was above threshold), then stop the recording"), 2) interactive visual display of microphone loudness, and so on. For these use-cases, a single block-worth of memory might be enough to go on. For other cases, maybe more memory would be needed?

Loudness / RMS is basically the same as numpy.std(data) since the mean is almost always effectively 0 for sounds.

I would not ask you for a plug-in for doing FFT-related things. I'd just work with the numpy array. No need to build a plugin system just to allow that. (The design choice to give numpy arrays is really a selling point for me of pysoundcard!)

If there's even a place for speech recognition somewhere, it sounds like it might be pysoundfile. Google's API (and I think wit.ai as well) is basically that you construct and send a URL request that contains a sound file and some parameters (e.g., the expected language of the speaker, etc). You get back a json-format reply with the transcription, confidence level, and so on.

I have no idea about PortAudio and MIDI support.

Allowing for long recordings to be effectively streamed to disk (i.e., if they would exceed the capacity of RAM) would be nice.

mgeier commented 10 years ago

I forgot to mention that playing multiple sounds at the same time would be very useful.

If you can use multiple streams in PortAudio is platform-dependent. Quoting the header file:

Depending on the underlying Host API, it may be possible
to open multiple streams using the same device, however this behavior
is implementation defined. Portable applications should assume that
a PaDevice may be simultaneously used by at most one PaStream.

I'm planning to design a high-level API under the assumption that there is only exactly one stream available.

But if your host API allows it, you can of course use multiple instances of the Stream class.

I think playing multiple sounds at the same time (in a host-API independent way) is actually out of scope for PySoundCard, this is really the realm of PyGame. I guess this functionality is also one of the reasons for the latency issues you were mentioning.

I'm not sure how the continue_flag and complete_flag work (and might conflict if there are multiple things going on).

If you have multiple streams running at the same time (assuming that your host API supports that) they are really independent. Each stream has it's own callback and the continue_flag and complete_flag only affect the associated stream. At least theoretically. I don't have any practical experience with running multiple parallel streams.

jeremygray commented 10 years ago

Two streams definitely work for me (Mac, core audio) using two Stream instances and two different callback functions (and two global play_position). I am unsure about a latency penalty (probably).

If possible, it would be nice if a high-level API would assume that there might be more than one stream--in the sense of not making it harder / impossible to do two streams at once through that API.

bastibe commented 10 years ago

There are many very interesting ideas in this issue. Thank you for sharing your thoughts, Jeremy!

One big problem of PySoundCard is that we are limited by the features provided by Portaudio. In particular, some features of Portaudio work differently on different platforms and some features are quite buggy, frankly. On the Mac, Portaudio works reasonably well. ALSA works pretty fine, too (if you ignore that ALSO likes to configure 32-channel default devices). Windows APIs can be pretty flaky though.

Beyond that though, a high-level interface would be wonderful!

Here's some brain storming:

How about some functions that can queue up/play sounds on a mixer in its own thread?
How about a context manager as a shorthand for the callback?
How about a version of the callback that automates overlap-adding blocks?

mgeier commented 10 years ago

@jeremygray:

Two streams definitely work for me (Mac, core audio) using two Stream instances and two different callback functions (and two global play_position). I am unsure about a latency penalty (probably).

That's good to know, thanks!

If possible, it would be nice if a high-level API would assume that there might be more than one stream--in the sense of not making it harder / impossible to do two streams at once through that API.

That's a trade-off we'll have to make: multiple streams is of course more flexible, but it would also make the high-level API more complex. When starting a stream we would have to return some kind of object which would enable us to stop the stream again. I actually tried to implement this and I found it to be quite ugly and confusing.

Having only one stream would allow us to use play(myarray), then do something else, then stop(). I think it should be as easy as that.

Also, in many simple use cases it is not even desired to have two streams at the same time. E.g. in an interactive session, if I want to listen to two arrays one after another and I forget to stop the first one, I get an (in this case undesired) mixture of both and I'll have a hard time to stop any of them quickly.

How would you trade off flexibility and ease-of-use in this case?

@bastibe:

How about some functions that can queue up/play sounds on a mixer in its own thread?

That probably goes a bit too far ... what use case are you imagining? I think we shouldn't try to re-implement PyGame, but probably you're thinking about a different scenario?

How about a context manager as a shorthand for the callback?

I like that idea! But the body of the context manager cannot be invoked repeatedly ... probably __enter__() should return a generator object which gives access to input/output by iterating over it?

Maybe something like this (completely un-tested):

with pa.something(...) as stream:
    for input, output, time, status in stream:
        if some_condition():
            break
        output[:] = input

I don't see a way of doing this in a non-blocking way, though.

How about a version of the callback that automates overlap-adding blocks?

That sounds interesting, this would probably play well together with the blocks() feature of PySoundFile. I think PySoundCard shouldn't do any windowing, but overlap-adding might be OK.

bastibe commented 10 years ago

In one project, I regularly wanted to play some sound file in the background. To do this, I fired up a thread with the audio data, and just had it pysoundcard-play the thing in the background. If I wanted to stop playback prematurely, I simply terminated the thread. This is pretty simple and works pretty well (there are some finicky details with premature termination). This worked on Windows and OS X last time I checked.

While this is somewhat in the territory of pygame, it actually is a useful function to have. Since pysoundcard already has some concurrency logic, this is probably not too far-fetched. As far as I know, there is nothing in portaudio that restricts you from running multiple streams concurrently.

We should probably look into how difficult this would be to implement with Python threads. If it is trivial, it can be left to the user. If it is somewhat more involved, some dedicated function would probably be warranted. Maybe we can find a way of wrapping this in such a way that it is obvious that we are using threads, and can thus rely on Python's built-in functionality for dealing with threads to control the code.

[You probably meant a] generator object [that] gives access to input/output by iterating over it [instead of a context manager]?

This is of course what I meant. Sorry for the confusion. I was thinking about something very similar to what you outlined in the code block.

I think PySoundCard shouldn't do any windowing, but overlap-adding might be OK.

Sounds good.

mgeier commented 10 years ago

I created a page in the Wiki about how a high-level API should look like: https://github.com/bastibe/PySoundCard/wiki/High-Level-API

Feel free to edit the page and add more suggestions!

I think we may talk about two different sets of high-level functions here. One set is about quick usage in an interactive session (what I originally aimed for with this issue), the other one is about high-level functionality which would be used in normal Python code (generator objects, speech recognition, ...).

stuaxo commented 9 years ago

I won't put these on the wiki yet, since this is not fully formed - and am not sure what the API would look like.

Things that come to mind from current project:

API for playing audio that loops (optional start / end point), stuff like envelopes (at least fade in, fade out).

With the above + pitch changing stuff there would be enough to build things like a mod players.

Then again there might already be libraries to do audio effects on numpy arrays?

The Web Audio API is surprisingly not bad with some nice concepts like routing and comes with some basic filters:

http://creativejs.com/resources/web-audio-api-getting-started/

mgeier commented 9 years ago

Please do put everything on the wiki page. It was made for sharing ideas and doing some brainstorming ... nothing "fully formed" is there ... feel free to add things and edit existing stuff!

Please also make suggestions for the API, i.e. hypothetical function calls that show how it would be used.

stuaxo commented 9 years ago

Of the things I put down, the 'webaudio' like API thing is the highest on my wishlist... I've got a few ideas of what I'd like for music-vis type things, I'm trying to prototype at the moment .. having easy access to stuff filters like bandpass filters would be good - along with beatdetection and similar stuff..

bastibe / PySoundCard

Add a high-level interface à la playrec #19