Closed WeirdConstructor closed 3 years ago
You're bringing up a well-known problem we already had some fierce discussion about (link to thread). However, you're bringing up a solution I haven't thought about and which is actually quite obvious: Copying one side into a buffer. I guess we haven't thought about it because some of us (including me) have followed a "only hard-real-time"-dogma.
But looking at the code, this is very easy to implement: Create two PortType
s called ClonedAudio
and CloneCV
, which are duplicates of the port types Audio
and CV
apart from the fact that they use type InputPortType = Vec<f32>
instead of &'static [f32]
in their implementation of PortType
. The implementation of input_from_raw
has to be changed accordingly, but that's it.
However, this should also be documented properly, because these new port types would require the user to a) resolve the conflict "host support vs. speed/real-time capabilites" for their own application and b) add the appropriate features in their Turtle description. In other words: Plugins aren't able to be hard-RT capable and support in-place buffers at the same time and the user has to know that.
I like the third option.
I think plugins can have a non-broken inplace processing and be real-time by requiring a bounded block length. After querying the host information and reading the maximum block length, the plugin can initialize the Vec
with the required capacity. When copying the data, the plugin can clear
the Vec
and fill it again; this causes no allocation, re-allocation or de-allocation as long as the capacity of the vector is not exceeded.
Note: I have (still) no experience with LV2 at all, so this is just a theoretical idea and some internet search.
I think there is a fourth option (which I'm listing for completeness mostly since it has some obvious disadvantages): splitting the ports into input ports and output ports and the run
method from the Plugin
trait into a separate run_input
method, which only gets to see the input ports and a run_output
method, which only gets to see the output ports. In this way, we can guarantee that the no input port is read after any output port has been touched.
I think plugins can have a non-broken inplace processing and be real-time by requiring a bounded block length. After querying the host information and reading the maximum block length, the plugin can initialize the
Vec
with the required capacity. When copying the data, the plugin canclear
theVec
and fill it again; this causes no allocation, re-allocation or de-allocation as long as the capacity of the vector is not exceeded.
Yes, that's a good idea. However, there is no standardized way (as far as I know) to request a fixed block length. Nevertheless, allocating the vector once and updating it will definitely improve performance and make the port soft-RT capable (the runtime behavior is deterministic most of the time since memory allocations/system calls only occur occasionally). However, this would require some slight refactoring since the port is reconstructed every single time. The PortType
trait would therefore require a create_port
method and an update_port
method.
I think there is a fourth option (which I'm listing for completeness mostly since it has some obvious disadvantages): splitting the ports into input ports and output ports and the
run
method from thePlugin
trait into a separaterun_input
method, which only gets to see the input ports and arun_output
method, which only gets to see the output ports. In this way, we can guarantee that the no input port is read after any output port has been touched.
It might work, yes, but it would break the plugin API again, which I would like to avoid.
For me as plugin writer it's just about getting the computer making noises. I'm not interested in that high RT performance, so having it work at all would be worth it. I wrote my plugin with rust-vst for now, but I put an abstraction layer in-between so I can switch to rust-lv2.
Real-time computing isn't about speed or performance. It's about deliving certain events before a set deadline. In the context of audio rendering, this means that a period (chunk of audio samples) is always executed before it is played. When the plugin tries to allocate memory, it hands over control to the operation system and it might decide not to continue the audio processing until the set deadline has passed. In a hard RT environment, this must not happen, never!
LV2 itself was designed to support hard RT environments. We've decided to do that too, which influenced a lot of our design choices, and we would like to continue to do so.
I've heard of some initiatives for cross-standard abstraction layers at our Discourse, maybe you should check them out!
Ok, did not know that this is already hard RT. Of course allocations and other syscalls are out in tight timing constraints. But I thought that most hosts probably don't use variable sized buffers in the real world - at least that would be my hope. So after an initial allocation everything should be fine.
However, there is no standardized way (as far as I know) to request a fixed block length.
I think there is a way to request a fixed block length, but it's not supported by Ardour, so it's not going to help.
What I'm referring to is to request a bounded block length, it's described in the buf-size extension of LV2.
Another idea is the following. When the buffers received are longer than the capacity of the Vec
in ClonedAudio
or ClonedCv
, to split them into parts that are small enough and to call run
multiple times, once for each part. The upside of this approach is that it doesn't require the host to support bounded block length, the downside that I see is that it may affect other things as well, e.g. you need to split sequences of midi events.
Calling run() two (or more) times also means that the plugin needs more information to properly interpolate parameter changes.
I mean: If the DAW calls the LV2 plugin with 390 samples to compute and sets the (possibly automated) parameters
to the values at the end of that sample period (Ardour does that AFAIK). If the plugin's run()
is called
4 times with 128 or less samples to compute, then it will receive constant parameter values for each run()
call. It needs to know the actual length of samples and the offset then to properly linearly interpolate
the parameter values.
I think maxBlockLength can be used here. Its the "maximum block length the host will ever request the plugin to process at once". So when the plugin is instantiated we can ask the host for the maximum block length and then allocate the buffer.
That's the way I'm doing it to allocate buffers to talk to the UI in Envolvigo BTW (see here), however the branch to retrieve an LV2_Option as an LV2_Feature from the host is not yet merged.
@PieterPenninckx wrote:
Another idea is the following. When the buffers received are longer than the capacity of the Vec in ClonedAudio or ClonedCv, to split them into parts that are small enough and to call run multiple times, once for each part. The upside of this approach is that it doesn't require the host to support bounded block length, the downside that I see is that it may affect other things as well, e.g. you need to split sequences of midi events.
You've already said it: If we would follow this idea, we would need to time-split Atom Sequences, containers for other atoms with time stamps. However, since events in a sequence may not be in order, this would require us to copy chunks of unknown size from one place to another. This can only be done properly by allocating new memory, which is exactly the thing we are trying to avoid. Therefore, this idea sadly doesn't work out.
Also, I wouldn't use any LV2 extensions in general to manage ports in the lv2-core
crate. It is supposed to be the bare minimum for any plugin and plugins that only use this crate should be able to run even with the most minimalist hosts. Sure, these extensions are very useful for plugin authors and hosts should support them if possible, but we shouldn't require every plugin and therefore every host to use/support them.
All in all, I still think that the most elegant solution is a Vec<f32>
that stores the input frames and is re-used. True hard-RT environments are rare and I think we can safely accept the "risks" of not supporting hard-RT and in-place work at the same time.
Therefore, this idea sadly doesn't work out.
Yeah, unfortunately. I was hoping somebody would find a way around these problems, but it's more the other way around: the more I think about it, the more problems pop up.
I wouldn't use any LV2 extensions in general to manage ports in the
lv2-core
crate.
I hadn't thought of that, honestly, but you're right.
All in all, I still think that the most elegant solution is a
Vec<f32>
that stores the input frames and is re-used.
Yeah, I can't think of anything else. When (and if) we get to the buf-size
, we can still add in that crate a ClonedAudioCvWithSufficientCapacity
or something that queries the host for the maximum buffer size to instantiate the Vec
with sufficient capacity.
Seems i missed an interesting discussion here.
Nobody considered to allow some unsafeness for plugin developers ? In practice, it just mean giving access to the ports as raw pointers. And as long those pointers aren't casted to reference, we (developpers included) don't have the rust undefined behavior issue of having '&' and '&mut' reference to the same data.
I know it's not ideal, since we defer to developer the responsibility of not overwriting input data before using it because of pointer aliasing. But i also think it's not easy to design an API guarantying this while being convenient and efficient for most use case and we may wait a long time before having it.
Nobody considered to allow some unsafeness for plugin developers ?
Yes, that's an additional option. I think we can add both the ClonedAudio
and ClonedCV
and an unsafe option, so that the plugin developer can choose.
Ok, I tried to implement this and I think it's possible to do what @Janonard proposed, but my improvement of re-using the same memory does not seem to be possible. Let me explain:
impl PortType for ClonedAudio {
type InputPortType = Vec<f32>;
type OutputPortType = &'static mut [f32];
#[inline]
unsafe fn input_from_raw(pointer: NonNull<c_void>, sample_count: u32) -> Self::InputPortType {
unimplemented!();
}
}
So if I get it correctly, this was the proposal: use Vec<f32>
as the new InputPortType, but keep the OutputPortType
. However, this would imply that Self::InputPortType
equals Vec<f32>
, which means that input_from_raw
needs to create a new Vec
every time. We cannot change InputPortType
into &'static Vec<f32>
(or simply &'static [f32])
) since the Vec
may re-allocate. We only know that it will not re-allocate when we know maxBlockLength
. We also cannot use another lifetime since that lifetime needs to come from somewhere.
I have to think about this some more since in the end, we will want to support both in place processing and real-time processing.
I did an implementation in #93 that I believe addresses the issues discussed here. I would love to get some feedback on it before merging however! :slightly_smiling_face:
I used the amp example from the docs and discovered that Ardour 5 to 6.3 do not support the
inPlaceBroken
feature.After loading Ardour I get this warning and Ardour just ignores the existence of the plugin.
Tested on Linux amd64 platform. The plugin works with Carla, and carla.lv2 can be used from within Ardour so it is still kind of usable. But it's a bit sad I would have to go via Carla.
I understand the rationale from the rust-lv2 user guide for not supporting it, but I would gladly accept some unsafe code in the processing function itself.
Otherwise I would have to look into writing the DSP code in Rust without rust-lv2 and try to use it from a C wrapper library.
So I would suggest three options to deal with this:
InputPort<AudioCopy>
port type - but I am not too deep into the internals of rust-lv2 and the unsafe side of Rust programming.