OpenLightingProject / ola

The Open Lighting Architecture - The Travel Adaptor for the Lighting Industry
https://www.openlighting.org/ola/
Other
647 stars 206 forks source link

RPI ola_recorder, playback is not consistent to time! #1677

Closed magdesign closed 4 years ago

magdesign commented 4 years ago

Having a Rpi 3B+ with the full OLA install. I can record and playback DMX /ArtNet sequences.

But we figured out that the timing is not constant, the playback speed varies a lot. We tried to optimize some settings on the RPI, enable Turbo mode, disable other services to free up resources, with no luck. Tried with USB-DMX dongle and just Art-Net over the integrated RJ45.

Anyone else having the same problems ? Any tricks to get consistent playback speeds ? Any help very very appreciated , will even donate for fixes!

MagnusHerrlin commented 4 years ago

Hi

We had a discussion about this in Google groups in 2017 https://groups.google.com/g/open-lighting/c/H53-bSavPJc/m/8wPmKKKNBAAJ I came up with a small program to get it "working" by adding/removing milliseconds to the recorded file and to trim the start time of the recording, It made it better but didnt really solve the problem. I still don't understand why this is happening and if someone can explain it to me then I would be very happy.

magdesign commented 4 years ago

@MagnusHerrlin: thanks for your response! The manual add/remove milliseconds solution is a bit confusing, imagine I have recorded 4 universes and want to fit the timing with this shifting solution... this would take hours :(

@DragoonBoots: did your latest commits address the timing issues as well ?

magdesign commented 4 years ago

@MagnusHerrlin: We now test what happens if we playback recorded files shorter than 20 seconds, if this somehow is more precise, we would need to split every record track into pieces. Currently waiting for feedback from Greg, who is doing the tests on the field :-)

danielskeenan commented 4 years ago

Hmmm. The actual playback timing is controlled from inside the OLA daemon; the only thing I can imagine the PR affecting is each piece of the show file taking a few more CPU cycles (because of the added logic around custom timing). Let me dig up my RPi and see if I can reproduce.

MagnusHerrlin commented 4 years ago

Hi

From what I can recall the lights got out of sync after about 3 minutes. I only recorded one universe, maybe the syncproblem gets even worse when you record/playback more universes But I dont understand why you think it would take hours, normally I only have to run this one or two times to get it in sync. But I agree my solution is not great it is more of last a desperate hack to get it to "work", at least for the length of recordings we did back in 2017. And I totally agree that it would be much nicer to have the recording/playback sync with some master clock perhaps MIDI clock or SMPTE. How does ola_timecode work? can that be included somehow into ola_recorder?

Thanks /Magnus

danielskeenan commented 4 years ago

I just did some tests comparing performance on my Raspberry Pi 3B and my desktop PC. The setup for these tests was to run a 5 minute show file that changes address 1/1 between 255 and 0 every 1000ms. The output from olad was captured and logged in sACNView as this provides a timestamp for each frame received. I then did some math with the timestamps to find the average time between value changes. Here's the results (also comparing ola_recorder revisions):

Platform Average diff (seconds)
Linux x64 (new) 1.000444
Linux x64 (old) 1.000653
RPi 3b (new) 1.000737
RPi 3b (old) 1.000693

On both platforms, the time between value changes (i.e. when 255 becomes 0) is veeeeeery slightly off from 1 second, so I could see how the extra time could compound enough to be noticeable over several hours, but not over several minutes.

With these error rates (gaining 0.5 milliseconds over 5 minutes) I'm inclined to think this may not be fixable if this is a similar scale to the error's your seeing. The program doesn't (and, to my knowledge, can't) run in real time. The gained time is likely from processing each line in the file (i.e. time between frames = delay time + processing time), but that can't reliably be compensated for.

Having said that, if your errors are at a much greater scale, it may be a platform issue. I ran the tests with Raspbian lite downloaded this morning, so didn't have any GUI or extras running. I also have my RPi in a fan case - I can't imagine ola would place such a heavy load on the CPU that it would start thermal throttling, but it's a possibility.

Re ola_timecode (more experienced contributors - please correct me if this is wrong!), it doesn't appear to interact directly with ola and only exists to forward timecode data to an ArtNet output.

MagnusHerrlin commented 4 years ago

With these error rates (gaining 0.5 milliseconds over 5 minutes) I'm inclined to think this may not be fixable if this is a similar scale to the error's your seeing. The program doesn't (and, to my knowledge, can't) run in real time. The gained time is likely from processing each line in the file (i.e. time between frames = delay time + processing time), but that can't reliably be compensated for.

In 2017 I was getting more error drift than that (but that was on Jessie perhaps it has changed on Buster) I made a showfile turning a channel on for 500ms and off for 500ms (blue trace) every second and played a beep (red trace) every second

1m40s

And it took about 1,5 seconds to get 500ms off.

So how does ola_recorder really work?

Is it like I think?

1 open the file check that first line is OLA Show 2 read universe number and all recorded channels comma separated like 0 0,0,0,0....etc well how long does this take? 3 read time in milliseconds to wait 4 what happens here? I guess the file is still open?

Does it give control to the system that sets the ola_recorder process in some sort of wait state? and when the wait time has passed sets it back to active state that makes it jump to step 2.

If it takes too long to read a line of dmxdata and convert it from decimal to a array of unsigned 8bit bytes why not read the whole file or a big chunk of it into a bigger buffer and playback it from there?

Then I guess it would be better to have the data in ram so you can manipulate it faster It would be nice if the program could take input while playing to start/stop/rec/pause/rev/ff like a audio/video player.

But can the file read speed really be the case? I mean how does a audio or video player program work? most certainly they must be reading data at a much higher rate?

The ola_recorder sure have some problems but also a lot of potential.

But what if i would like to change the speed while playing back the file? or add some kind of master to some or all the channels? or play it backwards? or playback several files at once? or only change some of the channels but leave the rest to whatever value they have?

This last question i think have a answer to. Currently when ola_recorder reads a line that looks like this 0 255,,255 sets universe 0 channel 1 and 3 to 255 but treats the missing number as between the commas as 0 setting channel 2 to zero, Well I think it should not it should not change channel two at all. so if you have a line like this 0 ,,255,255 in a future version of ola_recorder would set universe 0 channel 2 and 3 to full but leave channel to whatever value it has. This would make it possible to mute certain channels from the recording.

Say you have a string of rgb leds, dimmers, moving heads or whatever on one universe and you only want to record/change some channels or groups of channels, for example change every third led to red or change 4 moving heads to blue well you can't do that with ola_recorder as it is now.

/Magnus

peternewman commented 4 years ago

There's a tiny chance #1673 may help improve things if your machine's clock kept getting updated (e.g. via NTP).

But I dont understand why you think it would take hours, normally I only have to run this one or two times to get it in sync.

Do you mean two iterations of a three minute showfile, or something else?

Re ola_timecode (more experienced contributors - please correct me if this is wrong!), it doesn't appear to interact directly with ola and only exists to forward timecode data to an ArtNet output.

Yeah as @DragoonBoots says, ola_timecode simply sends a value out to any plugins which support timecode (which is currently just Art-Net), it doesn't do any processing of that data.

Even if ola_recorder supported Art-Net timecode, unless it only sends out data based on those timecode values (and doesn't free-run; which would have it's own issues with loss of timecode data) then we'd still have this issue between timecode frames.

I just did some tests comparing performance on my Raspberry Pi 3B and my desktop PC. The setup for these tests was to run a 5 minute show file that changes address 1/1 between 255 and 0 every 1000ms. The output from olad was captured and logged in sACNView as this provides a timestamp for each frame received. I then did some math with the timestamps to find the average time between value changes.

Out of interest @DragoonBoots , what were your min and max differences? Care to plot a graph like I did in the linked issue to see if the differences are constant or cumulative

In 2017 I was getting more error drift than that (but that was on Jessie perhaps it has changed on Buster) I made a showfile turning a channel on for 500ms and off for 500ms (blue trace) every second and played a beep (red trace) every second

Personally I'm tempted to trust @DragoonBoots figures more, there's not extra load on the Pi due to audio playback, but more importantly it's not another variable. sACNView will be simply logging each packet as it receives it, there's no risk compared to the audio of it's playback varying (unless you also measured that independently e.g. with your scope to confirm the playback frequency was accurate).

With these error rates (gaining 0.5 milliseconds over 5 minutes) I'm inclined to think this may not be fixable if this is a similar scale to the error's your seeing. The program doesn't (and, to my knowledge, can't) run in real time. The gained time is likely from processing each line in the file (i.e. time between frames = delay time + processing time), but that can't reliably be compensated for.

So how does ola_recorder really work?

1 open the file check that first line is OLA Show 2 read universe number and all recorded channels comma separated like 0 0,0,0,0....etc well how long does this take? 3 read time in milliseconds to wait 4 what happens here? I guess the file is still open?

Actually I think it could potentially be @DragoonBoots (as mentioned in the Google Group discussion).

As you roughly surmised, we currently do this:

  1. open the file check that first line is OLA Show
  2. read universe number and all recorded channels comma separated like 0 0,0,0,0....etc well how long does this take?
  3. read time in milliseconds to wait
  4. wait for that time
  5. goto 2

This does mean that the small delays in running and processing 2, 3 and 4 do build up. If we instead did this:

  1. open the file check that first line is OLA Show 1a. record current timestamp
  2. read universe number and all recorded channels comma separated like 0 0,0,0,0....etc well how long does this take?
  3. read time in milliseconds to wait
  4. wait until recorded timestamp + that time or wait for (time to wait - (now - recorded timestamp))
  5. goto 1a

Then we'd take into account the processing delay and wouldn't introduce as much of a cumulative error. Obviously there would still be a small amount we can't handle because we do various other processing.

We could easily test if we'd fixed this by adding a deliberate delay in as 3a as a test, in the current model, each frame would be that delay late. Improved code shouldn't vary at all (as long as the intended wait time is more than the deliberate delay).

If it takes too long to read a line of dmxdata and convert it from decimal to a array of unsigned 8bit bytes why not read the whole file or a big chunk of it into a bigger buffer and playback it from there?

Because then you get into all sorts of memory allocation issues. When reviewing @DragoonBoots recent changes we settled on handling a showfile of a month long (just in case); the limit is actually a lot more than that, but if we dropped down to a uint32 it would be less than a month and I figured potentially someone may run monthly shows so they could have special Christmas ones. Realistically I suspect daily files is the most that most people would do, but even with one universe that's 512 bytes 40 frames 60 seconds 60 minutes 24 hours = just over 1.75GB of data for one universe for one day. It also makes all the code a lot, lot more complicated; currently it reads a line (or a few lines at most), processes them and then moves onto the next line.

It would be nice if the program could take input while playing to start/stop/rec/pause/rev/ff like a audio/video player.

That's certainly technically possible; patches welcome. Start by considering how to do the remote control (OSC/HTTP/OLA API/DMX/something else) then design a protocol then implement the code to do it, then deal with all the edge cases. As @DragoonBoots is all too aware we came up with plenty just adding the ability to start playing from any point within a file and stopping at any other point, without changing direction or speed.

I mean how does a audio or video player program work? most certainly they must be reading data at a much higher rate?

Yes, a lot of them use a buffer so there is a delay between what's read off the disk and what appears onscreen. If you take something like and MP2/MP4 you need to read quite a bit of surrounding data to be able to assemble the first frame you want to watch because of the compression in use.

But what if i would like to change the speed while playing back the file? or play it backwards?

These are a repetition of your question above.

or playback several files at once?

You can start multiple ola_recorders pointing at the same or different universes already.

or add some kind of master to some or all the channels? or only change some of the channels but leave the rest to whatever value they have?

These are better handled in patcher v2 #280 (or as a client for now), as they aren't specific to recording, they are processing of DMX data which could be useful as a filter in many scenarios.

This last question i think have a answer to. Currently when ola_recorder reads a line that looks like this 0 255,,255 sets universe 0 channel 1 and 3 to 255 but treats the missing number as between the commas as 0 setting channel 2 to zero, Well I think it should not it should not change channel two at all. so if you have a line like this 0 ,,255,255 in a future version of ola_recorder would set universe 0 channel 2 and 3 to full but leave channel to whatever value it has. This would make it possible to mute certain channels from the recording.

ola_recorder doesn't know about the current value though, it's just sending DMX data out which is being merged by the universe and output via any plugins patched to that universe. Again this would need to be done in patcher v2.

More broadly our DMXBuffer doesn't have the concept of a channel with no value (as it's just an array of bytes essentially) so you'd need to extend that or send the mask separately. Think of a transparent PNG, there is R, G, B and a separate alpha channel. DMX doesn't have that idea as such.

Say you have a string of rgb leds, dimmers, moving heads or whatever on one universe and you only want to record/change some channels or groups of channels, for example change every third led to red or change 4 moving heads to blue well you can't do that with ola_recorder as it is now.

You can only do HTP and LTP merges via all of OLA currently, that's not specific to ola_recorder.

magdesign commented 4 years ago

If I get you right, would it make sense to convert the human readable playback file to dmx binary for faster processing ?

Would it help to send the data from a timeline addon, like ofxTimeLine, or is the bottleneck in the plugins processing the data ?

How did you make this nice test-drift graphs ? Still not sure if they help to find a regularity pattern...

magdesign commented 4 years ago

@vanvught said in the issue on his dmx project that this is an issue caused by Linux kernel (?) https://github.com/vanvught/h3dmx512-zip/issues/10#issuecomment-719529459

If this is true, would a realtime-kernel help on RPI? I am currently compiling a realtime kernel to test, also saw in this thread reserving some cores only for the ola_record might help.

Since I do not have art-net and dmx hardware here, I need to wait for people to send me files.... For this I would appreciate to get one of the dmx files and the audio from @MagnusHerrlin and maybe a tip on how to make a graph as you posted.

I just can not agree that we shall life with this timing behavior :-)

shenghaoyang commented 4 years ago

open the file check that first line is OLA Show 1a. record current timestamp read universe number and all recorded channels comma separated like 0 0,0,0,0....etc well how long does this take? read time in milliseconds to wait wait until recorded timestamp + that time or wait for (time to wait - (now - recorded timestamp)) goto 1a

I think this is quite feasible - used to do this and extend it to something like a simple feedback loop that additionally tries to predict the delay so it gets more accurate.

With this scheme at most you'd get some delay between when the frame should start versus when it actually starts, and there shouldn't be growing shifts of the actual start time and when it should start.

Trying to reduce the (should be rather small relative to the frame duration) processing time could be a next step if that doesn't seem to help (gonna have to profile OLA recorder).

(Of course this is all in the recorder application, whatever happens in the server core is .....?)

peternewman commented 4 years ago

If I get you right, would it make sense to convert the human readable playback file to dmx binary for faster processing ?

No, well not in terms of usability and utility. It might make it marginally faster, but lots of people are creating or modifying (or debugging) the plain text file data.

We've got ola_latency you could try:

./examples/ola_latency --send-dmx -c 100000
--------------
Sent 100000 RPCs
Max was 49867 microseconds
Mean 56 microseconds

Or this test (which includes a 1s sleep, a 20ms sleep and a lot of runs of reading and writing DMXBuffers:

time ./common/utils/UtilsTester 
.common/utils/ActionQueue.cpp:59: Action queue already finished!
.common/utils/ActionQueue.cpp:59: Action queue already finished!
.........................common/utils/DmxBuffer.cpp:241: Attempting to set channel 30 when length is 20
common/utils/DmxBuffer.cpp:241: Attempting to set channel 200 when length is 20
............................................

OK (71)

real    0m1.048s
user    0m0.029s
sys 0m0.003s

Would it help to send the data from a timeline addon, like ofxTimeLine, or is the bottleneck in the plugins processing the data ?

There will be a lot more overhead running a GUI to use that ofxTimeLine.

Different plugins have different amounts of effort to get the data out, the plugin being used may be part of the problem, which might explain why it doesn't show up with sACN. You could try recreating @DragoonBoots test with sACN which would help to rule things out.

How did you make this nice test-drift graphs ? Still not sure if they help to find a regularity pattern...

It's on the Google thread, it adds a timestamp to each line, then I dump those in a graph and do the difference between them. I generated a file with 500ms on/500ms off pulses (which I guess is quite slow and friendly for it).

I guess we could do with repeating these tests or the ones @DragoonBoots did with different frame period timings (e.g. 250ms, 100ms, 50ms say) and see what difference it makes.

If you consider another classic cumulative error scenario we've probably all coded, fading from A to B. If you do B-A/steps and get say 0.5 then for example round it up/store it in an int, every step if you add the pre-calculate value, you'd actually increase it by 1 when it should be 0.5 so your line would be twice as steep. If you accidentally had an offset instead (e.g. 1 + increment), you'd have a line at the same angle but lifted up/down relative to the expected path. Similarly with the timing they help to give an idea, is it a constant delay, variable or increasing.

If this is true, would a realtime-kernel help on RPI?

Possibly, but we do lots of more complicated bits on the Pi, and more so for others, it seems somewhat unlikely but I've never really tried working a Pi hard.

Since I do not have art-net and dmx hardware here, I need to wait for people to send me files....

I didn't use any hardware. I generated the source file by a basic script, but you could hand-craft or copy/paste (try doing that with a binary one).

Dan's test is just using the sACNView software, mine isn't even that clever just logging the time each line is output on the PC.

For this I would appreciate to get one of the dmx files and the audio from @MagnusHerrlin

As mentioned I think audio risks confusing things further as it's another variable to consider. We really just want to find out how consistent the output is. You could also record OLA driving an LED fixture via the plugin and then count frames on a video editor/player.

and maybe a tip on how to make a graph as you posted.

See my comment above.

I just can not agree that we shall life with this timing behavior :-)

Hopefully not.

I think this is quite feasible - used to do this and extend it to something like a simple feedback loop that additionally tries to predict the delay so it gets more accurate.

I'm not sure you need to do that. My suggestion is already taking into account the processing time, so it's not intended delay + processing time. I'd imagine the delay is likely to vary anyway due to load on the machine/background processes, so I'm not sure a feedback loop would be very accurate.

Trying to reduce the (should be rather small relative to the frame duration) processing time could be a next step if that doesn't seem to help (gonna have to profile OLA recorder).

That would be cool, although make sure you look at @DragoonBoots version in master.

(Of course this is all in the recorder application, whatever happens in the server core is .....?)

Yeah quite, the core is far more involved and complicated. Someone could replace ola_recorder as a test with something that uses the client API to send a frame, waits n seconds then sends another one. That would help to pinpoint which bit of software is causing it even if not why.

danielskeenan commented 4 years ago

Chart for science! chart I'm guessing the two spikes are from me having some other things going on in the background on my local machine - everything else is pretty on point (no pun intended).

Part of me is also wondering about I/O latency on the Pi - using a good SD card or USB drive could be key here.

shenghaoyang commented 4 years ago

Anyone with some time to test this RFC patch before a real PR?

It keeps track of the current "playhead/desired" time relative to the last seek and the actual elapsed time since the last seek and just tries to sleep to match those.

May have edge cases >:), applies onto master.

diff --git a/examples/ShowPlayer.cpp b/examples/ShowPlayer.cpp
index 5a57f497a..8c1def457 100644
--- a/examples/ShowPlayer.cpp
+++ b/examples/ShowPlayer.cpp
@@ -217,6 +217,8 @@ ShowLoader::State ShowPlayer::SeekTo(uint64_t seek_time) {
     }
   }
   m_playback_pos = playhead_time;
+  m_clock.CurrentTime(&m_start_ts);
+  m_start_playback_pos = m_playback_pos;

   // Send data in the state it would be in at the given time
   map<unsigned int, ShowEntry>::iterator entry_it;
@@ -270,8 +272,17 @@ void ShowPlayer::SendEntry(const ShowEntry &entry) {
   SendFrame(entry);
   m_playback_pos += entry.next_wait;

+  unsigned int timeout = entry.next_wait;
+  if (!m_simulate) {
+    uint64_t target_delta = m_playback_pos - m_start_playback_pos;
+    ola::TimeStamp now;
+    m_clock.CurrentTime(&now);
+    uint64_t delay = (target_delta - (now - m_start_ts).InMilliSeconds());
+    // Handle overflow
+    timeout = (delay >= (~((unsigned int)0))) ? 0 : delay;
+  }
   // Set when next to send data
-  RegisterNextTimeout(entry.next_wait);
+  RegisterNextTimeout(timeout);
 }

diff --git a/examples/ShowPlayer.h b/examples/ShowPlayer.h
index 0ae885dbf..dfa168bbe 100644
--- a/examples/ShowPlayer.h
+++ b/examples/ShowPlayer.h
@@ -18,6 +18,7 @@
  * Copyright (C) 2011 Simon Newton
  */

+#include <ola/Clock.h>
 #include <ola/DmxBuffer.h>
 #include <ola/client/ClientWrapper.h>

@@ -89,6 +90,9 @@ class ShowPlayer {
   uint64_t m_run_time;
   std::map<unsigned int, uint64_t> m_frame_count;
   bool m_simulate;
+  ola::Clock m_clock;
+  ola::TimeStamp m_start_ts;
+  uint64_t m_start_playback_pos;

   /** Used for tracking simulation progress */
   typedef enum {

Quick smoke tests with a ~30Hz input, 2 channel, for 330 seconds total X axis is elapsed real time (ms), Y axis is time when frame was supposed to be sent - time frame actually sent: Positive values means frame was early, negative value means frame was late.

Master @ ea3a18a925f123c8771ebc6c60327678e99ddcf2: image

After patch: image image

magdesign commented 4 years ago

Willing to test :-) After compiling master branch on RPI I still get 0.10.3 when doing olad --version , this is just a labeling issue! in the webinterface I get 0.10.7

sudo service olad start does not work since there is no olad service, but I am able to start directly in ola/olad/./olad

Here is my plan to optimize RPI: https://gist.github.com/magdesign/a7face5be864c947a3573ecc6bde858a

(If we are already talking about compiling, how can I create a .deb file after successfully building to update other RPI's too?)

peternewman commented 4 years ago

@shenghaoyang you should just push it up to a branch on your repo, even if you don't open a PR yet!

After compiling master branch on RPI I still get 0.10.3 when doing olad --version , this is just a labeling issue! in the webinterface I get 0.10.7

No, you're running two different bits of code. I'd suggest doing an apt-get remove ola then try again (although your notes suggest you've done a purge already). Try which olad too.

(If we are already talking about compiling, how can I create a .deb file after successfully building to update other RPI's too?)

See https://github.com/OpenLightingProject/ola/blob/master/README.debian

shenghaoyang commented 4 years ago

1680 is a draft PR if you want to pull a branch and build.

MagnusHerrlin commented 4 years ago

Hi

@shenghaoyang That is looking really promising. @magdesign Did you manage to compile and install it yet?

First I want to clarify a few things from my test in 2017 I was using a Rpi3 with the Native UART DMX Plugin The picture from the scope shows a audiobeep and the output of channel one from a 30ch dmx to led driver pcb.

@peternewman said:

Personally I'm tempted to trust @DragoonBoots figures more, there's not extra load on the Pi due to audio playback, but more importantly it's not another variable. sACNView will be simply logging each packet as it receives it, there's no risk compared to the audio of it's playback varying (unless you also measured that independently e.g. with your scope to confirm the playback frequency was accurate).

Isn't it more interesting to measure what you acually get out to your lights? after whatever plugin you are using (including its delay) , but this really does not seem to matter too much as I think (correct me if I'm wrong) @magdesign is using a different plugin? and the drift seems to happen regardless whatever plugin you use.

And really I think the output from ola_recorder should not vary with cpu load even if I played 3 soundfiles and 8 videofiles at the same time.

This was really adressed to @magdesign But I dont understand why you think it would take hours, normally I only have to run this one or two times to get it in sync.

@peternewman said:

ola_recorder doesn't know about the current value though, it's just sending DMX data out which is being merged by the universe and output via any plugins patched to that universe. Again this would need to be done in patcher v2

well ola_recorder does not have to know the current value of the output it should just change valid values to olads buffer and if the value if not valid meaning two ,, just step to next channel.

You can start multiple ola_recorders pointing at the same or different universes already.

Really but what happens to the output then? if one instance of ola_recorder is sending out 0 to a channel and a second instance sends out 255 to the same channel in the same universe I guess that channel will start to flicker

and asked:

Do you mean two iterations of a three minute showfile, or something else?

Yes. one or two iterations per showfile.

What I ended up doing was this:

  1. record the file
  2. remove one millisecond from every (waittime) line.
  3. if it is still lagging remove 1 millisecond from every 6th line. (to fine tune it)

I'm trying to understand how ola_recorder and olad works

Is this true?

olad is a kernel daemon process sitting in a loop reading and writing dmxdata from/to its plugins that is essentially an array of uint8's per universe.

So how does ola_recorder work? Does it have its own buffer that is filled with the data read and converted from decimal? and then this buffer is copied over to olads buffer (using memcpy?) or is it using a pointer to olads buffer and fills that buffer directly?

If it is the systemcall to read a line from a file that takes time then why not use two buffers instead?

What if instead of doing this:

1 open the file check that first line is OLA Show 2 read universe number and all recorded channels comma separated like 0 0,0,0,0....etc 3 read time in milliseconds to wait 4 set a timeout timer and wait

do this?

1 open the file check that first line is OLA Show 2 read universe number and all recorded channels comma separated like 0 0,0,0,0....etc 3 read time in milliseconds to wait 4 read next line and fill buffer 2 5 set a timeout timer and wait now instead of jumping to step 2 send buffer 2 (that is already read and converted from decimal) to olad and then read next line into buffer1

Doing like this would make sure the data is output as fast as possibly and the delay from the systemcall to read a line from file happens after the buffer is copied.

In a showfile with just one universe recorded this would be easy but a bit harder if the showfile contains multiple universes.

/Magnus

shenghaoyang commented 4 years ago

@MagnusHerrlin

olad is a kernel daemon process sitting in a loop reading and writing dmxdata from/to its plugins that is essentially an array of uint8's per universe.

olad is just a daemon running as a user process. It has no association with the kernel.

well ola_recorder does not have to know the current value of the output it should just change valid values to olads buffer and if the value if not valid meaning two ,, just step to next channel.

the current RPC interface (I think) does not allow access to the buffers. You can only send a buffer to OLAD, and hope it does the right thing :tm: with LTP / HTP merging. I don't think there is any special value to leave a channel unchanged.

See @peternewman comments:

More broadly our DMXBuffer doesn't have the concept of a channel with no value (as it's just an array of bytes essentially) so you'd need to extend that or send the mask separately. Think of a transparent PNG, there is R, G, B and a separate alpha channel. DMX doesn't have that idea as such.

So how does ola_recorder work? Does it have its own buffer that is filled with the data read and converted from decimal? and then this buffer is copied over to olads buffer (using memcpy?) or is it using a pointer to olads buffer and fills that buffer directly?

From what I see ola_recorder has its own buffers, and it sends them to the OLA daemon through a TCP RPC channel. There is no zero-copy mechanism.

Doing like this would make sure the data is output as fast as possibly and the delay from the systemcall to read a line from file happens after the buffer is copied.

I'd expect the kernel to already have significant portions of the file loaded into the page cache already, so the read process shouldn't take that long. Plus - the C++ library should be doing its own buffering, so unless there is an extremely high bandwidth input file (high rates / many universes), I/O for the most part would not contribute significantly to the processing time.

Buffering a frame in ola_recorder would work, but at that point it feels like ola_recorder is going to have to contain a full playback engine / pipeline...


We can definitely deal with the "lagging" output where this image sort of delay occurs - that's in the PR.

But if you want deterministic jitter - i.e. this graph with values all bounded by some limit: image OLA is probably not what you want to use.

You could reduce jitter by using RT scheduling, isolcpus and NOHZ_FULL, but at that stage you're giving up entire CPU cores. You might even get some more improvement by hacking OLA to never sleep in its event loop and spin continuously for events, but most users would never want this. (Don't forget that the server and recorder are two separate tasks, so that's two cores gone if you want to use this route. Integrating some kind of scheduled send in the server could be a solution, but that's new RPCs and changes to the core).

If you really want these hard guarantees the only thing that can offer them is some bare metal application, like the ones @magdesign was looking at with the Orange Pis.

From my POV by having OLA run as user processes you're getting the flexibility to multiplex lots of tasks on a single device (e.g. a Pi), but the tradeoff is that you have to live with some form of jitter - which you can reduce but never eliminate.

MagnusHerrlin commented 4 years ago

olad is just a daemon running as a user process. It has no association with the kernel.

From what I see ola_recorder has its own buffers, and it sends them to the OLA daemon through a TCP RPC channel. There is no zero-copy mechanism.

Thanks @shenghaoyang and @peternewman for explaining this.

Sorry but I am not so familiar with how Linux works (I normally program microcontrollers). I made this little dmxtester some years ago: Testare

I call the OneButtonMixer

Thanks /Magnus

magdesign commented 4 years ago

I compiled a version with the two changed files (examples/ShowPlayer.cpp | examples/ShowPlayer.h) from @shenghaoyang this pull request: https://github.com/OpenLightingProject/ola/pull/1680/files

Now I am still stuck in creating a *.deb file: https://github.com/OpenLightingProject/ola/issues/1679 (maybe disable unit tests??) to send this ola version to someone who will make the real-scenario test on an installation with several universes and then sends back a video-recording showing us how it behaves with this fix... :-) Testing in real conditions sounds great to me....

@Magnus Herrlin: we are using the ArtNet plugin. Your script to manual fix could be an option, but I have some people creating a show @home and sending it to the location via internet (welcome to the covid world), so its going the be hard to fix the lag remotely.

BTW. I think the Debian demon file should be updated. In newer Debian systems, services are stored in/etc/systemd/system/ so creating a /etc/systemd/system/olad.service file with this content would do the trick.

magdesign commented 4 years ago

Just to make something clear: The ola_recorder playback timing lag has nothing to do with the CPU or RAM power !!

I got a dmx show which is played via artnet together with an *.mp3 on a RPi3B+ and lags a lot, here a screenshot of the 4-core CPU usage:

cpu_rt_kernel_ola10_7

Its somwhere between 2% and 8%, so there seems to be plenty of headroom.

Tested this with ola 10.3 and 10.7 So its def. something in code. As soon as I am able to create or get a .deb file of the fix from @shenghaoyang I can send an update to get feedback from a realworld scenario. Thanks for any further help!

magdesign commented 4 years ago

Finally I was able to build and test https://github.com/OpenLightingProject/ola/pull/1680 from @shenghaoyang

The result is promising, in the beginning it stayed totally sync: begining

but after approx. 1min 30s it started to be faster than it should: faster

here is the dmx show , which changes every second the color. Its recorded with ola over art-net on the RPi.

My testsetup is a RPI3B+, recording the dmx show over art-net, then playing it back over usb-dmx dongle to a regular led par light and playing a video file with audio aside. During the whole playback with video and dmx, the cpu never goes above 30%

@shenghaoyang do you think you can tweak your math a bit more so it would stay longer in sync?!

magdesign commented 4 years ago

For those who can spend some time testing on their RPI (and maybe creating nice graphs to post here :-) The compiled *.deb files with https://github.com/OpenLightingProject/ola/pull/1680:

https://github.com/magdesign/ola_with_fix

sudo dpkg -i libola1_0.10.7-1_armhf.deb 
sudo dpkg -i ola_0.10.7-1_armhf.deb 
shenghaoyang commented 4 years ago

Uh, if you had a exact showfile it might stay roughly in sync.

OLA_recorder also has its own rounding when writing to the file, so that might also be a source of accumulating errors.

How does testshow.txt look? - That's your input with all pauses trimmed to exactly 500ms.

There was also a few commits yesterday that fixed an initial offset - I think you applied that though. (Else it'd be off by one frame right at the start.)

magdesign commented 4 years ago

Testing with the testshow.txt and this seems to stay in sync !!

Is it possible to correct the rounding on the recorder side (not every show is linear like the testfile :-)

We are getting there.....

shenghaoyang commented 4 years ago

The problem is that we'd need to figure out a new format for the showfile, either with higher resolution timestamps or an (relative to the start of the file) absolute timestamp.

Then we'd need to change the player and the recorder as well - I could submit a few ideas.

@peternewman do you all still prefer plain text show files? I perf'd ola_recorder and it spends most of its time (37%) in memchr and strtoll, which likely relates to converting the text representation of the DMX buffer to binary. Didn't see much time spent in I/O routines, at least for my 30Hz 512 channel test input.

But I guess an initial "OLA Show v2" with higher precision timestamps could be a start?

Edit: turns out we can fix the recorder without changing format.

shenghaoyang commented 4 years ago

@magdesign Updated #1680 , should be better now at the recording side.

MagnusHerrlin commented 4 years ago

Hi

This looks very promising.

I have a suggestion:

Why not have a system variable called ola_recorder_speed or something ? that can be changed from the commandline , TCP socket or any program written in whatever language. that can vary from say -1 to 1 or something that function as a master speed. Then ola_recorder or rather Showplayer takes this variable into its calculations when playing back the file. Because then the playback speed could be changed 'on the fly' and the timing could be trimmed to suit whatever hardware is attached.

Thank you all for all the hard work you put into this

/Magnus

shenghaoyang commented 4 years ago

Well the trimming is performed automatically now, so if the playback is too slow it reduces the wait time to catch up and if it's too fast it waits longer - there shouldn't be a need for a speed control.

The current issue is at the recorder's side, due to accumulating roundoff errors when writing the wait times to disk. The latest commit in #1680 should help.

If it's still slow, a speed control won't help much - it will sleep for the shortest possible time(0ms - I don't think we can skip the sleep entirely because that will block the event loop from processing other events, especially if there's a lot of 0ms-long frames). You'd probably have to fiddle with the O/S scheduler at that point.

I guess we could also do stuff like frame skips, but that might involve side effects people don't want .

MagnusHerrlin commented 4 years ago

@shenghaoyang

What I'm looking for is the possibility to change Showplayers playbackspeed while playing, the program I wrote called filter removes/adds milliseconds to a recorded showfile to make it play slower/faster. But sure do jumps, pause or even singlestep a file back/forth a file would be nice. Being able to play/pause or jump back/forth would allow for several instances or ola_recorder running changing channels to the same universe but only one at a time The best would be if the format would allow to only record certain channels in a universe as I talked about earlier and leave the rest alone.

So what are we recording anyway? A hell of a lot of frames at a crazy speed filling up storage wouldn't it make sense just to record the changes of the universe? What if the look of your universes stay static for 5 minutes? How much space does that take?

This would make it possible to trigger these channels only perhaps have a command line argument to make ola_recorder to start in a paused state These instances of ola_recorder could then be started/stopped by gpio, commandline or whatever

Thanks /Magnus

shenghaoyang commented 4 years ago

I think the current recorder subscribes to universe change events, so shouldn't record unless a channel changes (but it would still record the whole universe even if one channel changes).

Maybe open a new issue for those new formats? This one is dealing with the drift, and it's getting longer and longer :skull: .

If you need a workaround for large files when recording you can do some bash-fu with process substitution. The data has lots of repeated values so compressors can have a field day.

ola_recorder -u 1 --record >(xz - > show.xz)

-rw-r--r-- 1 root root 2.0M Nov  5 01:03 show_unxz
-rw-r--r-- 1 root root 4.7K Nov  5 01:02 show.xz

~.2% of the original file size for about 512 channels with 1 channel changing per frame, 1k frames total.

(please don't kill me for root:root, this is a rootless container :wink:)

magdesign commented 4 years ago

still trying to debuild your fork/mitigate-drift to get a *.deb for testing....

opening a new thread for speed stuff and one for fileformats makes sense

magdesign commented 4 years ago

@shenghaoyang

I was not able to create a *.deb package from your https://github.com/shenghaoyang/ola fork. So I installed the previous posted deb files and then compiled your fork and made a make install

Now I am trying to record dmx with:

ola_recorder --record-version 2 -u 1,2,3,4,5,6,7,8,9,10 -r /media/internal/dmx/show01

but I get: ola_recorder: unrecognized option '--record-version'

Would be very nice to get some advice on where to pull the correct repo and how to record with version 2

shenghaoyang commented 4 years ago

Ah, sorry - I folded the record version option away in the latest commit because there's no need for a new file version, and the fix can be incorporated into the current record format.

I think you can simply try recording just like before.

Hm - I didn't do anything special with the build: just make sure to grab the latest updates from that branch. What's the error that you're encountering ?

magdesign commented 4 years ago

Aha, thanks, so I tried to record with my regular record command: sudo ola_recorder -u 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20 -r /media/internal/dmx/show01

but then I immediately get this error:

ola_recorder: symbol lookup error: ola_recorder: undefined symbol: _ZNK3ola5Clock20CurrentMonotonicTimeEPNS_9TimeStampE

(there is an RTC clock present on my Rpi)

and about the debuild:

Since I need to kill all network stuff (since this trouble https://github.com/OpenLightingProject/ola/issues/1679) I am not able to copy paste the error output. Only a photograph, which feels embarrassing:

build_error

I need more time to send the debuild errors for the stable branch, then we might find some things.

shenghaoyang commented 4 years ago

Could you try running ola_recorder from the examples folder in your build directory directly? Like ./ola_recorder .

If that doesn't work maybe you need a clean rebuild? I.e. make distclean && configure ..... && make -j 4.

The new version uses the monotonic time clock introduced in the latest master, and the missing reference points to the loader not being able to find that functionality in the OLA libraries installed on the system (strange).

The error in debuild looks like your current compiler doesn't support a feature. What's the version of GCC installed locally?

magdesign commented 4 years ago

Tried running from the examples in the build folder.:

My local GCC version is gcc (Raspbian 6.3.0-18+rpi1+deb9u1) 6.3.0 20170516

I make everything clean now and build again from scratch

EDIT: The error only appears when trying to record with artnet plugin. So please create an artnet input in ola and then try to run the recorder to reproduce the error.

artnet

shenghaoyang commented 4 years ago

I think you should get a fresh clone (and make sure all changes are pulled!) and maybe try again? The ola_recorder executable created in examples/ is actually a script that should make the recorder use the correct libraries, even if they are not installed.

It should have nothing to do with Art-Net. Maybe without the Art-Net plugin you weren't receiving any frames? The fault should trigger only after a frame has been received (CurrentMonotonicTime() won't be called otherwise), likely due to lazy binding.

Alternatively, you can run the recorder in the example directory with LD_LIBRARY_PATH="<build_ola_root>/common/.libs:<build_ola_root>/ola/ola/.libs:$LD_LIBRARY_PATH" .libs/ola_recorder (but this should really not be needed).

magdesign commented 4 years ago

Finally, I cleaned all ola stuff from my system as described here: https://gist.github.com/magdesign/a7face5be864c947a3573ecc6bde858a from line 29 on.

Then cloned your repo, switched to the branch, make => all good. (make check gives following error: https://gist.github.com/magdesign/b675bd0cab6161db1cfeeae4985de5f6)

Then I started the/ola/./olad Created an artnet plugin and did the recording from /ola/examples/

Your fix is great !! Made a 2 minute test and it stays in time, wow, amazing.

I will now try to debuild and create .*deb packages for arm, so I can test on other units and send to people who have lots of lights for testing :-)

shenghaoyang commented 4 years ago

Did you find a way to stop debuild running make check? There's blog posts floating around the web that talk about those.

The tests work for me, but that's probably because I have gcc >= 8 for that string warning.

magdesign commented 4 years ago

I finally could build it without checking (thanks for the hint):

DEB_BUILD_OPTIONS=nocheck debuild

But for future I need to help @peternewman checking whats going wrong https://github.com/OpenLightingProject/ola/issues/1679

The debs are located here: https://github.com/magdesign/ola_with_fix/

Will now do some tests and ask other people to test as well, but looks promising so far.

Huge thanks to everyone involved in this, its a pleasure to have all of you!

@shenghaoyang : may I donate you a meal via paypal

shenghaoyang commented 4 years ago

Nah, it's okay - glad it works for you.

peternewman commented 4 years ago

Sorry I've been a bit AWOL, a few reponses:

Isn't it more interesting to measure what you acually get out to your lights? after whatever plugin you are using (including its delay) , but this really does not seem to matter too much as I think (correct me if I'm wrong) @magdesign is using a different plugin? and the drift seems to happen regardless whatever plugin you use.

And really I think the output from ola_recorder should not vary with cpu load even if I played 3 soundfiles and 8 videofiles at the same time.

@MagnusHerrlin not necessarily, we've got lots of different plugins with different code behind them, e.g. if we use one of the microcontroller based interfaces, that may add some delay or similar. We certainly don't want to tune the system based on the behaviour of a single plugin. Plus as others have mentioned we've got a client programme (ola_recorder) and the OLA core (olad), ideally we want to find which bit is causing the issue and fix it there rather than working around it elsewhere! The fact it's happening with multiple plugins suggests it is indeed the recorder end.

well ola_recorder does not have to know the current value of the output it should just change valid values to olads buffer and if the value if not valid meaning two ,, just step to next channel.

Yeah mostly as @shenghaoyang said:

the current RPC interface (I think) does not allow access to the buffers. You can only send a buffer to OLAD, and hope it does the right thing with LTP / HTP merging. I don't think there is any special value to leave a channel unchanged.

In theory you could write a client/plugin which would probably need to be a pipeline, so it could read from universe A, merge in your new DMX format with the magic gaps and output the result as universe B, but that doesn't help if you also want to merge the results with some other clients (or at least it locks this magic functionality into ola_recorder only). Hence my suggestion to write a separate client to do it (which could then be merged into the code as a filter when we redo patcher). In the same way as Linux comprises of lots of little command line tools like grep/sed/tail etc.

We could create a new form of DMX buffer with a mask/alpha channel sort of behaviour and then rework the whole system to use it, but that seems a bit unnecessary IMHO for a fairly niche usage and a lot of unnecessary overhead most of the time.

You can start multiple ola_recorders pointing at the same or different universes already.

Really but what happens to the output then? if one instance of ola_recorder is sending out 0 to a channel and a second instance sends out 255 to the same channel in the same universe I guess that channel will start to flicker

It depends on your merge behaviour of the universe you're sending to. In the same way as you could patch an Art-Net and an E1.31 universe to the same OLA universe. If you do an HTP merge, the 255 will win all the time. If you do LTP then yes it would probably flicker (depending on timing etc).

Yeah as @shenghaoyang said, because ola_recorder is a client it uses the normal RPC stuff.

From what I see ola_recorder has its own buffers, and it sends them to the OLA daemon through a TCP RPC channel. There is no zero-copy mechanism.

If it is the systemcall to read a line from a file that takes time then why not use two buffers instead?

My suspicion is, for the non-network protocols (and possibly for them too) a bit of file I/O is likely to have a lower overhead than all the USB bus stuff, especially for some of the plugins where you have to send multiple USB packets to output one DMX frame.

But if you want deterministic jitter OLA is probably not what you want to use.

Also in fairness to ola_recorder, as far as I'm aware, it was effectively written as a bit of a demo almost. Because the olad core does all the heavy lifting, all you need to do to make a recorder is register for a universe and write those frames and timestamps to a file and then reverse the process for playback, it's not really doing anything particularly complicated or clever.

Integrating some kind of scheduled send in the server could be a solution, but that's new RPCs and changes to the core).

That particular thing may not be too hard (aside from all the timezone/timestamp stuff), as we've got lots of equivalent wait task stuff (assuming they currently fire accurately) and I guess that could potentially help clients that aren't very speedy for whatever reason. I'd probably be open to that if you wanted to have a go @shenghaoyang .

From my POV by having OLA run as user processes you're getting the flexibility to multiplex lots of tasks on a single device (e.g. a Pi), but the tradeoff is that you have to live with some form of jitter - which you can reduce but never eliminate.

Yeah this has always been my thought. You have your bare metal device, but then you also need some sort of video/audio player and a means to link them and control them in sync. Whereas with Linux that all comes free or at least very easily and you've also got almost infinite flexibility in the system (want to randomly pick a show each day, or run special seasonal ones for Christmas and Halloween), that's five minutes of coding and you're done. Or link it up to say a Google Calender so someone less technical can programme which days to do a wedding show or whatever. You're unlikely to get most of that flexibility in a bare metal device without rewriting a lot of wheels.

peternewman commented 4 years ago

I think the Debian demon file should be updated. In newer Debian systems, services are stored in/etc/systemd/system/ so creating a /etc/systemd/system/olad.service file with this content would do the trick.

@shenghaoyang started on this in https://github.com/OpenLightingProject/ola/pull/1444 and I'm shamefully behind at reviewing that PR and all the other good stuff it was bringing in!

but after approx. 1min 30s it started to be faster than it should: It looks like @shenghaoyang has got to the bottom of this, but just to make the point it could have been the video dropping/repeating frames. For stuff like this it's always better to prove it with something hopefully known reliable (e.g. a mobile phone/wall clock).

@peternewman do you all still prefer plain text show files? I perf'd ola_recorder and it spends most of its time (37%) in memchr and strtoll, which likely relates to converting the text representation of the DMX buffer to binary. Didn't see much time spent in I/O routines, at least for my 30Hz 512 channel test input.

Wow, that's impressive. My gut feeling from above is still that the benefits of plain text is worth it (e.g. as you've just done hacking that file around. Do you think it actually makes much impact on the system? I guess we could have a binary format too and then some way to convert back and forth, but ideally pre-converting the next buffer would seem the preferred option to me.

Edit: turns out we can fix the recorder without changing format.

Even better!

peternewman commented 4 years ago

Why not have a system variable called ola_recorder_speed or something ? that can be changed from the commandline , TCP socket or any program written in whatever language. that can vary from say -1 to 1 or something that function as a master speed. Then ola_recorder or rather Showplayer takes this variable into its calculations when playing back the file. Because then the playback speed could be changed 'on the fly' and the timing could be trimmed to suit whatever hardware is attached.

Can I suggest we put such feature requests in a separate PR, unless you're just suggesting this as a fudge factor to workaround this issue.

The biggest issue with lots of these extra options is how you present them to the user in a useful way. E.g. a CLI option is fine, but then some people might find that a bit clunky. If you want to do it in real-time during playback then really here be dragons in terms of corner cases (e.g. consider time effectively going backwards).

But sure do jumps, pause or even singlestep a file back/forth a file would be nice. Being able to play/pause or jump back/forth would allow for several instances or ola_recorder running changing channels to the same universe but only one at a time

Yeah this should definitely be in a separate issue, people will definitely want remote control of that, e.g. a HTTP API.

The best would be if the format would allow to only record certain channels in a universe as I talked about earlier and leave the rest alone.

This should still be a filter/standalone client not part of ola_recorder.

So what are we recording anyway? A hell of a lot of frames at a crazy speed filling up storage wouldn't it make sense just to record the changes of the universe? What if the look of your universes stay static for 5 minutes? How much space does that take?

It currently normally records every received frame (as @shenghaoyang sort of guessed), so it depends on the frame rate of your recording source and what it does. We could certainly use that sort of de-duplication, although again it's more code to potentially go wrong for generally limited benefit. The other half of this is the output side, different plugins and systems have different requirements for output. We generally try and avoid re-generating or otherwise processing the output DMX if possible (as we can't tell what's coming next without introducing a delay to the system), but you've sort of got no choice unless you do that in this mode. I guess you could extend the format so you mark the universe number but don't store the frame contents and then load it from the previous frame, e.g. (where the last frame would repeat the 20,30,40 one):

1 10,20,30
40
1 20,30,40
40
1 -
40

This would make it possible to trigger these channels only

Still a filter!

perhaps have a command line argument to make ola_recorder to start in a paused state These instances of ola_recorder could then be started/stopped by gpio, commandline or whatever

You can do this already using ola_trigger and another universe to start the recordings, with the source of that data being DMX, an OLA CLI client, GPIO etc. Again another issue, other people have asked for this too (e.g. triggering recording based on a channel changing value).

shenghaoyang commented 4 years ago

Yeah, we could preconvert the next buffer and take advantage of the stuff, but I wonder if we're taking the recorder a bit too far. Profiling didn't see I/O being the bottleneck anyway.

I've been planning on a massive hack with converting the show files to video files and then letting an external player (vlc, gst) do the hard work while we collect their nicely decoded output :stuck_out_tongue_winking_eye: (plus you should get ff rewind loop, remote-control, ... for free) - if it works.

peternewman commented 4 years ago

My local GCC version is gcc (Raspbian 6.3.0-18+rpi1+deb9u1) 6.3.0 20170516

Can we have yet another issue for that stringop-truncation issue please @magdesign : https://github.com/OpenLightingProject/ola/blob/master/plugins/dummy/DummyPortTest.cpp#L665-L683

peternewman commented 4 years ago

Yeah, we could preconvert the next buffer and take advantage of the stuff, but I wonder if we're taking the recorder a bit too far. Profiling didn't see I/O being the bottleneck anyway.

Heh, yeah I assumed that would probably be the case.

I've been planning on a massive hack with converting the show files to video files and then letting an external player (vlc, gst) do the hard work while we collect their nicely decoded output (plus you should get ff rewind loop, remote-control, ... for free) - if it works.

That seems quite neat, you could use our existing streaming CLI tool and do it all with hacky bash scripts! I'd be curious how much delay it adds as I assume most of them buffer a bit! I'd look at animation codec (QT RLE) as it would have some compression built in.

shenghaoyang commented 4 years ago

and do it all with hacky bash scripts!

That's the plan :tophat: !