LMMS / lmms

Cross-platform music production software
https://lmms.io
GNU General Public License v2.0

Trying to understand the whats and whys of MixerWorkerThread #3416

Closed PaulBatchelor closed 7 years ago

PaulBatchelor commented 7 years ago

Hey folks,

Still learning my way around the LMMS codebase. While investigating some export-crash bugs (related to #3404), I came across the MixerWorkerThread class, which appears twice in the main audio render loop.

The gist of what I think it's doing: it queues up a bunch of worker threads, sets them off, then waits for them to finish. You seem to do this once for the note events, and then again for the LMMS audio graph.

I am having more trouble understanding how and why something like MixerWorkerThread is being used for the audio render loop. Based on other audio engines I've seen and worked on, the audio render loop is a single threaded process with a possible thread or two for things like MIDI events. This, however, does not seem to be the case with LMMS. Can someone explain this to me?
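
For what it's worth, here is a rough standalone sketch of the fork/join pattern I just described, using plain std::thread. The names (WorkerPool, runAndWait, the jobs) are made up for illustration; this is not the actual MixerWorkerThread API, which presumably keeps its worker threads alive and wakes them each period instead of spawning them per call.

    #include <atomic>
    #include <cstddef>
    #include <cstdio>
    #include <functional>
    #include <thread>
    #include <vector>

    // Hypothetical illustration of the fork/join pattern: queue up jobs,
    // let the workers loose, then block until every job has been processed.
    class WorkerPool
    {
    public:
        explicit WorkerPool( std::size_t numWorkers ) : m_numWorkers( numWorkers ) {}

        void runAndWait( const std::vector<std::function<void()>> & jobs )
        {
            std::atomic<std::size_t> next( 0 );
            std::vector<std::thread> workers;
            for( std::size_t i = 0; i < m_numWorkers; ++i )
            {
                // Each worker pulls job indices from a shared atomic counter
                // until the queue is drained.
                workers.emplace_back( [&]()
                {
                    for( std::size_t j = next++; j < jobs.size(); j = next++ )
                    {
                        jobs[j]();
                    }
                } );
            }
            for( auto & w : workers )
            {
                w.join();   // "waiting for them to finish"
            }
        }

    private:
        std::size_t m_numWorkers;
    };

    int main()
    {
        std::vector<std::function<void()>> jobs;
        for( int i = 0; i < 8; ++i )
        {
            jobs.push_back( [i]{ std::printf( "rendering chunk %d\n", i ); } );
        }
        // Conceptually this would happen once for the note events and once
        // again for the audio graph, every render period.
        WorkerPool( 4 ).runAndWait( jobs );
        return 0;
    }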

PaulBatchelor commented 7 years ago

Just a heads up:

In an effort to bring the LMMS audio engine to more professional standards, I plan on removing the MixerWorkerThread class. It's bad practice to have multiple threads handling audio processing at this level.

BaraMGB commented 7 years ago

It would be very welcome if you document your work on this. I'm very interested in learning more about this.

PaulBatchelor commented 7 years ago

It would be very welcome if you document your work on this. I'm very interested in learning more about this.

@BaraMGB I'd be more than happy to document how the internal audio engine works as I learn more about it (assuming that's what you meant). Is the wiki where I would document such things?

tresf commented 7 years ago

Is the wiki where I would document such things?

https://github.com/LMMS/lmms/wiki

I don't think you'll have access though, so I've sent you an invite to the developers group on GitHub. 👍

tresf commented 7 years ago

Side note, if we decide something like Doxygen is more useful, we can always entertain that over hand-writing the wiki, just let us know what format works best. We don't have a whole lot of documentation surrounding the code yet.

fundamental commented 7 years ago

I'd be cautious about removing parallel audio computation. I agree that it is very difficult to get right, but if you remove it at this point there is a very strong chance that users are going to be quite upset that LMMS is no longer as fast as it currently appears to be.

softrabbit commented 7 years ago

Based on other audio engines I've seen and worked on, the audio render loop is a single threaded process with a possible thread or two for things like MIDI events.

I'd hate to have most of my CPU cores sit idle while one of them struggles to render audio on time. Having each plugin produce a period of audio looks quite a bit like an embarrassingly parallel problem, and the processing of audio in the mixer channels isn't far off that description, either.
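
To illustrate that "embarrassingly parallel" view with a toy sketch (hypothetical Plugin / renderPeriod / mixDown names, not LMMS code): each plugin could fill its own buffer independently, and only the final mix-down needs all the results.

    #include <cstddef>
    #include <future>
    #include <vector>

    // Hypothetical stand-in for an instrument or effect plugin.
    struct Plugin
    {
        std::vector<float> renderPeriod( std::size_t frames ) const
        {
            return std::vector<float>( frames, 0.0f );   // silence stands in for real DSP
        }
    };

    // Summing the per-plugin buffers is the only step that needs all results.
    std::vector<float> mixDown( const std::vector<std::vector<float>> & buffers,
                                std::size_t frames )
    {
        std::vector<float> mix( frames, 0.0f );
        for( const auto & b : buffers )
        {
            for( std::size_t i = 0; i < frames; ++i )
            {
                mix[i] += b[i];
            }
        }
        return mix;
    }

    int main()
    {
        const std::size_t frames = 128;
        std::vector<Plugin> plugins( 16 );

        // Fork: every plugin renders its period independently of the others...
        std::vector<std::future<std::vector<float>>> futures;
        for( const auto & p : plugins )
        {
            futures.push_back( std::async( std::launch::async,
                                           [&p, frames] { return p.renderPeriod( frames ); } ) );
        }

        // ...join: collect the buffers and mix them down.
        std::vector<std::vector<float>> buffers;
        for( auto & f : futures )
        {
            buffers.push_back( f.get() );
        }

        const std::vector<float> mix = mixDown( buffers, frames );
        (void) mix;
        return 0;
    }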

That being said, the current "one thread renders one note" (IIRC) architecture has seemed like overkill ever since I first had it explained to me. Granted, this only involves some LMMS instruments; the ones that communicate through MIDI would be more like "one thread renders one instrument".

BaraMGB commented 7 years ago

@PaulBatchelor

Perhaps you want to read this https://github.com/LMMS/lmms/issues/628#issuecomment-41058071

jasp00 commented 7 years ago

@PaulBatchelor, please do not get upset. We all are learning and I myself propose structural changes. This is not because of you; I feel like this when reading other issues.

In an effort to bring the LMMS audio engine to more professional standards, I plan on removing the MixerWorkerThread class. It's bad practice to have multiple threads handling audio processing at this level.

Comments like this worry me, as does the fact that I am replying weeks or months after the commit. You want to drop parallel computing; I say no way. That would be going backward. Multiple threads handling individual notes may be uncommon, but they are not bad practice at all.

An effort to bring the LMMS audio engine to more professional standards would be to document the current behavior for newcomers since the code does not seem obvious. This is what we lack: documentation and some statistics.

PaulBatchelor commented 7 years ago

@jasp00 Seeing reactions like yours to things like this worries me as an audio developer.

Threads don't work well with audio threads. I couldn't tell you why, but they don't. No one does this, and there's probably a good reason for this.

We all are learning and I myself propose structural changes.

You are absolutely right. I've looked at many audio engines, and none use threads as liberally as LMMS (and QThreads of all things). It's strange, and that choice should be defended.

For reference, here is a list of open-source audio engines/code to look at:

So maybe the threads are harmless, maybe not. LMMS has a ton of race conditions, and the rendering causes glitches, which is a showstopper for audio. Here is an article talking about common audio developer mistakes; all or most of them are threading mistakes. Maybe the LMMS audio devs have thought about this. By the way, the author of that article mentions a few iOS audio engines and tests them out. Most of these pro engines fail; one that doesn't is AudioKit. For what it's worth, I wrote a large portion of that engine.

So yes, this is why I am questioning it. But seeing how hostile everyone has been (myself included in this), I really don't care much for defending this.

fundamental commented 7 years ago

To briefly chime in again, LMMS has a deeply flawed architecture and implementation from the standpoint of delivering reliable low-latency audio. The threading model is a component of it and, as @PaulBatchelor has pointed out, LMMS has had quite a few issues due to the current threading model. The existing model has very little (developer) documentation and few guidelines to reduce various classes of bugs. I don't think that the current thread-per-note approach is a good design.

LMMS has offered multi-threaded performance, however, and users will continue to expect it, so eliminating it completely would likely result in backlash. Additionally, multi-threaded audio generation does happen relatively frequently in very complex synths and in DAWs (take jack2's parallel client execution as an example). I do think that LMMS should have its architecture corrected, but IMHO it ought to be done in a way that doesn't introduce performance and functionality regressions.

jasp00 commented 7 years ago

But seeing how hostile everyone has been [...] I really don't care much for defending this.

I do not see hostility; we are merely discussing. If you are right, you will prove the advantages when you defend your case. But you should be the one defending your proposal rather than me defending the current implementation.

Threads don't work well with audio threads.

You should not assume that is true. You yourself mention SuperCollider as an example of parallel processing, so you know that threads may be harmless.

The reason people fail to see the advantages of parallel computing (which does not worsen audio latency) is its main disadvantage: parallel programming is hard to master. Sure, LMMS has race conditions but this is because developers are not used to parallel models. When done right, a parallel implementation beats the serial one; audio is no exception.

In the article you link about four common mistakes in audio development, at least three rules are not absolute truths. On an audio thread, you may hold locks, allocate memory, and do file or network IO if they are real-time operations.

Eventually, developers will get used to parallel programming and fix bugs. The question is, @PaulBatchelor, are you sure your single-threaded proposal will be faster than the current implementation?

fundamental commented 7 years ago

In the article you link about four common mistakes in audio development, at least three rules are not absolute truths. On an audio thread, you may hold locks, allocate memory, and do file or network IO if they are real-time operations.

There are very few absolute truths, but in terms of low-latency computation on general-purpose systems, these guidelines are very close to absolute. For reliable execution of a computation with a tight time requirement, a program CANNOT call any function which may block (or execute) for longer than the time left to complete that computation. Any blocking operation, which includes acquiring locks, allocating/deallocating memory from the libc heap, and doing IO, can block for longer than can be tolerated. In very rare circumstances you may be able to bound the execution time, but those cases are few and far between and typically require knowledge of the whole program's state along with platform-specific implementation details.

To give an example, consider a user making a long, relatively freeform composition. They want to make an hour-long recording, they're running at roughly 80% DSP load (using the qjackctl definition), and they're using an audio frame size of 128 samples at 48 kHz. Each audio frame period is therefore about 2.67 ms, with about 0.53 ms of that time left idle. To run their full performance the program will need to evaluate (48000/128*60*60) 1,350,000 frames of audio.

Let's say that as developers we want there to be a 90% chance of them getting through the full set without an xrun. In terms of probability:

    Pr(no-set-xrun)   = Pr(no-frame-xrun)^number-of-frames
    0.9               = x^1,350,000
    Pr(no-frame-xrun) = exp(ln(0.9)/1,350,000)
    Pr(frame-xrun)    = (1 - exp(ln(0.9)/1,350,000))
    Pr(frame-xrun)    = 0.0000000780448235

Every time you run an operation which could block, you're rolling the dice. Even with one unsafe lock or one unsafe memory allocation, are you 99.9999922% sure that it is going to finish executing within that 0.53 ms? What about 10 unsafe operations, or 100, within the generation of a single frame of audio?
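
(If anyone wants to double-check those numbers, the arithmetic is easy to reproduce:)

    #include <cmath>
    #include <cstdio>

    int main()
    {
        // Same assumptions as above: 48 kHz, 128-sample periods, one hour of
        // audio, and a 90% chance of getting through the whole set cleanly.
        const double frames = 48000.0 / 128.0 * 60.0 * 60.0;   // 1,350,000 periods
        const double prNoFrameXrun = std::exp( std::log( 0.9 ) / frames );
        const double prFrameXrun = 1.0 - prNoFrameXrun;

        std::printf( "periods to render: %.0f\n", frames );
        std::printf( "tolerable per-period xrun probability: %.16f\n", prFrameXrun );
        // Prints roughly 0.0000000780448235.
        return 0;
    }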

If a program is meant to be low latency, then careful analysis (which should be documented) is needed, along with guidelines to remove unsafe system calls. Threads complicate this, though they are excellent for throughput. Without the use of blocking operations, threads become much more complex and unwieldy, which makes it easy for individuals to mistakenly degrade the low-latency performance of a complex application.
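
As one concrete example of such a guideline, a deliberately simplified (and entirely hypothetical, not LMMS) callback can avoid the blocking lock and bound its worst case with a try_lock plus a fallback value:

    #include <cstddef>
    #include <mutex>

    // Hypothetical shared state between a GUI thread and the audio callback.
    std::mutex g_paramMutex;
    float g_cutoff = 1000.0f;       // written by the GUI thread under the mutex
    float g_lastCutoff = 1000.0f;   // audio thread's private copy

    void audioCallback( float * out, std::size_t frames )
    {
        // Risky: lock() can block for longer than the available headroom
        // whenever the GUI thread happens to hold the mutex.
        // std::lock_guard<std::mutex> guard( g_paramMutex );

        // Bounded alternative: try once and fall back to the last known value.
        if( g_paramMutex.try_lock() )
        {
            g_lastCutoff = g_cutoff;
            g_paramMutex.unlock();
        }

        for( std::size_t i = 0; i < frames; ++i )
        {
            out[i] = 0.0f;          // real DSP using g_lastCutoff would go here
        }
    }

    int main()
    {
        float buffer[128] = { 0.0f };
        audioCallback( buffer, 128 );
        return 0;
    }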

Eventually, developers will get used to parallel programming and fix bugs.

Many open source projects tend to get a sizable amount of contributions from programmers who are relatively new to the domain. Without external guidelines, these developers will likely not be familiar enough with the constraints of parallel programming and/or realtime computation (in the context of audio). So, I'm not sure if this claim is true in the context of projects like LMMS.

michaelgregorius commented 7 years ago

Sure, LMMS has race conditions but this is because developers are not used to parallel models.

In an ideal implementation most developers would not even need to know how to write concurrent code. There shouldn't be many places in a DAW where concurrent code is executed. If you want to process different audio channels concurrently then that code should be as isolated as possible, have a small footprint and be well tested. Ideally it should also not be necessary to touch that code very often.

On an audio thread, you may hold locks, allocate memory, and do file or network IO if they are real-time operations.

Windows and Linux are not real-time OSes so none of the operations mentioned above are real-time safe and hence should not be done in the audio thread.

Here is another interesting article on that topic (it's also linked by the article mentioned above): http://www.rossbencina.com/code/real-time-audio-programming-101-time-waits-for-nothing
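
For what it's worth, the approach that article advocates is to keep the audio thread free of locks and feed it through wait-free queues. A minimal single-producer/single-consumer sketch (hypothetical names; production implementations such as PortAudio's ring buffer handle more edge cases):

    #include <array>
    #include <atomic>
    #include <cstddef>
    #include <optional>

    // Minimal single-producer/single-consumer ring buffer for passing small
    // messages (e.g. parameter changes) from the GUI thread to the audio
    // thread without locks or allocation. Illustrative only.
    template <typename T, std::size_t N>
    class SpscRing
    {
    public:
        // Producer side (GUI thread): never blocks, reports failure when full.
        bool push( const T & value )
        {
            const std::size_t head = m_head.load( std::memory_order_relaxed );
            const std::size_t next = ( head + 1 ) % N;
            if( next == m_tail.load( std::memory_order_acquire ) )
            {
                return false;   // full
            }
            m_buffer[head] = value;
            m_head.store( next, std::memory_order_release );
            return true;
        }

        // Consumer side (audio thread): never blocks and never allocates.
        std::optional<T> pop()
        {
            const std::size_t tail = m_tail.load( std::memory_order_relaxed );
            if( tail == m_head.load( std::memory_order_acquire ) )
            {
                return std::nullopt;   // empty
            }
            T value = m_buffer[tail];
            m_tail.store( ( tail + 1 ) % N, std::memory_order_release );
            return value;
        }

    private:
        std::array<T, N> m_buffer {};
        std::atomic<std::size_t> m_head { 0 };
        std::atomic<std::size_t> m_tail { 0 };
    };

    int main()
    {
        SpscRing<float, 64> queue;
        queue.push( 440.0f );            // GUI side
        const auto msg = queue.pop();    // audio side
        return msg.has_value() ? 0 : 1;
    }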

If LMMS really wants to compete with the well known commercial DAWs then it must not glitch. The main problem with LMMS is that it has a shaky core and does some things in very unorthodox ways. Having instruments with an unbounded number of voices, for example, leads to problems because it's not possible to implement them without resorting to dynamic memory allocation at a certain point, which is a no-go.
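
To make the alternative concrete: the usual fix is to bound the voice count up front, preallocating a fixed pool and stealing the oldest voice when it runs out. A rough sketch with made-up names (not LMMS classes):

    #include <array>
    #include <cstddef>
    #include <cstdint>

    // Hypothetical voice slot; everything lives in a fixed-size array so the
    // audio thread never has to allocate when a new note arrives.
    struct Voice
    {
        bool active = false;
        int note = -1;
        std::uint64_t startedAt = 0;   // used to pick the oldest voice to steal
    };

    class VoicePool
    {
    public:
        Voice & noteOn( int note, std::uint64_t now )
        {
            Voice * victim = &m_voices[0];
            for( Voice & v : m_voices )
            {
                if( !v.active )
                {
                    victim = &v;       // a free slot always wins
                    break;
                }
                if( v.startedAt < victim->startedAt )
                {
                    victim = &v;       // otherwise steal the oldest active voice
                }
            }
            victim->active = true;
            victim->note = note;
            victim->startedAt = now;
            return *victim;
        }

    private:
        static constexpr std::size_t MaxVoices = 64;
        std::array<Voice, MaxVoices> m_voices {};
    };

    int main()
    {
        VoicePool pool;
        for( std::uint64_t t = 0; t < 100; ++t )
        {
            pool.noteOn( 60 + static_cast<int>( t % 12 ), t );   // never allocates
        }
        return 0;
    }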

jasp00 commented 7 years ago

So, we agree that non-real-time operations are not suitable for real-time applications and that parallel programming is harder to understand.

Following your example, two processors would roughly halve the wall-clock DSP time (to about 40% of the period), leaving roughly 1.6 ms of headroom for whatever locking is necessary. If locks are kept to a minimum, whether easy or not, you have an improvement.

One fact that is perhaps not obvious: the LMMS mixer is single-threaded on a single-processor machine.

So, I'm not sure if this claim is true in the context of projects like LMMS.

Is LMMS unable to have the fastest mixer because of new contributors? Then we write maintenance tools and external guidelines. Did I say we lack documentation?

jasp00 commented 7 years ago

In an ideal implementation most developers would not even need to know how to write concurrent code.

Plug-in developers would not need to know.

Linux are not real-time OSes

Then what is this? Why cannot LMMS be the best option for a real-time OS?

If LMMS really wants to compete with the well known commercial DAWs then it must not glitch.

Of course.

jasp00 commented 7 years ago

From https://github.com/LMMS/lmms/issues/3447#issuecomment-288209883,

There's really no trade-off between stability and performance.

Exactly, multithreading can be stable.

I'm very interested in a single-threaded version. If it's not too complex to implement

Replace this line with

    m_numWorkers( 0 ),

fundamental commented 7 years ago

@PaulBatchelor After getting some additional context from the Discord chat (as referenced in the developer retention thread), I guess an introduction is in order, as I may have come off as an armchair scientist with regard to claims made within this and other threads.

I'm 'fundamental' and I've been involved in linux audio since 2009, when I started to maintain the ZynAddSubFX synthesizer as a hobby. At that point the zyn codebase had enough bitrot that it barely compiled anymore, and some initial patches were how I ended up maintaining the project. The original architecture for Zyn had some pretty dreadful bits which caused numerous realtime hazards (e.g. a mutex around all of the audio execution and numerous malloc/free calls).

Since 2009 in the context of zyn I (with the assistance of contributing developers and users):

During this time I've also worked professionally as a DSP researcher. Most of my comments about audio threading come from learning it the hard way in the context of audio software, both in Zyn and in other projects in the greater linux-audio realm.

Over the years I've watched a variety of changes to the LMMS architecture, and I think plenty of them are nuts, but it's difficult to talk about things without diving down into the tiny specifics. If people can talk about fine-grained specifics then I can likely offer advice, but talking in generalities is difficult given the labyrinthine flow of execution and data that LMMS has (not to mention how tricky it is to pin down threading discussions or general realtime issues).