WinkelCode commented 1 year ago

I have two problems:

When only streaming audio, it's a bit too fast when compared to the Apple Music lyrics, my iPad assumes a delay that isn't there.
The video is a bit choppy when connected via WLAN, I tried with a LAN adapter on my iPad and it seemed smoother. I suspect that a larger buffer may also help here.

I can't test video with it, but Shairport4w has an option to set the buffer size, a size of 201 frames seems perfect, at least for the music delay.

Is there such an option for UxPlay? I couldn't find anything.

fduncanh commented 1 year ago

There currently arent any options like these. If they involve apple "plist" items that the server send to the iPad (see the debug output in the uxplay wiki for example) they are set in lib/handlers.h and are easily made user-accessible. There seems to a setting "outputlatencymicros" that gets set 100 microseconds (and strangely, in a the second (alternative)format to 101 microseconds.

they get set at lines 110 and 120 in lib/raop_handlers.h

If making these user-settable would help this can be done. maybe you could experiment.

I would need to reproduce your issue to know what you are concerned about, and your description is not clear enough. maybe looking at the Shairport4w code would help. Please specify where it can be found etc.

below is a debug mode output showing uxplay sending plist information to the client (e,g iPad)

Handling request GET with URL /info

RTSP/1.0 200 OK 
CSeq: 0 
Server: AirTunes/220.68 
Content-Type: application/x-apple-binary-plist 
Content-Length: 1063 

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>txtAirPlay</key>
    <data>
    GmRldmljZWlkPTdlOmU4OmMyOmY4OjAzOjMwF2ZlYXR1cmVzPTB4NUE3RkZFRTYsMHgw
    CWZsYWdzPTB4NBBtb2RlbD1BcHBsZVRWMywyQ3BrPWIwNzcyN2Q2ZjZjZDZlMDhiNThl
    ZGU1MjVlYzNjZGVhYTI1MmFkOWY2ODNmZWIyMTJlZjhhMjA1MjQ2NTU0ZTcncGk9MmUz
    ODgwMDYtMTNiYS00MDQxLTlhNjctMjVkZDRhNDNkNTM2DnNyY3ZlcnM9MjIwLjY4BHZ2
    PTI=
    </data>
    <key>features</key>
    <integer>1518337766</integer>
    <key>name</key>
    <string>UxPlay@elite</string>
    <key>audioFormats</key>
    <array>
        <dict>
            <key>type</key>
            <integer>100</integer>
            <key>audioInputFormats</key>
            <integer>67108860</integer>
            <key>audioOutputFormats</key>
            <integer>67108860</integer>
        </dict>
        <dict>
            <key>type</key>
            <integer>101</integer>
            <key>audioInputFormats</key>
            <integer>67108860</integer>
            <key>audioOutputFormats</key>
            <integer>67108860</integer>
        </dict>
    </array>
    <key>pi</key>
    <string>2e388006-13ba-4041-9a67-25dd4a43d536</string>
    <key>vv</key>
    <integer>2</integer>
    <key>statusFlags</key>
    <integer>68</integer>
    <key>keepAliveLowPower</key>
    <integer>1</integer>
    <key>sourceVersion</key>
    <string>220.68</string>
    <key>pk</key>
    <data>
    sHcn1vbNbgi1jt5SXsPN6qJSrZ9oP+shLviiBSRlVOc=
    </data>
    <key>keepAliveSendStatsAsBody</key>
    <integer>1</integer>
    <key>deviceID</key>
    <string>7e:e8:c2:f8:03:30</string>
    <key>audioLatencies</key>
    <array>
        <dict>
            <key>outputLatencyMicros</key>
            <false/>
            <key>type</key>
            <integer>100</integer>
            <key>audioType</key>
            <string>default</string>
            <key>inputLatencyMicros</key>
            <false/>
        </dict>
        <dict>
            <key>outputLatencyMicros</key>
            <false/>
            <key>type</key>
            <integer>101</integer>
            <key>audioType</key>
            <string>default</string>
            <key>inputLatencyMicros</key>
            <false/>
        </dict>
    </array>
    <key>model</key>
    <string>AppleTV3,2</string>
    <key>macAddress</key>
    <string>7e:e8:c2:f8:03:30</string>
    <key>displays</key>
    <array>
        <dict>
            <key>uuid</key>
            <string>e0ff8a27-6738-3d56-8a16-cc53aacee925</string>
            <key>widthPhysical</key>
            <false/>
            <key>heightPhysical</key>
            <false/>
            <key>width</key>
            <integer>1920</integer>
            <key>height</key>
            <integer>1080</integer>
            <key>widthPixels</key>
            <integer>1920</integer>
            <key>heightPixels</key>
            <integer>1080</integer>
            <key>rotation</key>
            <false/>
            <key>refreshRate</key>
            <integer>60</integer>
            <key>maxFPS</key>
            <integer>30</integer>
            <key>overscanned</key>
            <false/>
            <key>features</key>
            <integer>14</integer>
        </dict>
    </array>
</dict>
</plist>

fduncanh commented 1 year ago

Here is the linkto shairport4w code I found

https://github.com/Frank-Friemel/Shairport4w/tree/master/src

maybe you could find the place in the code where the settings you mentioned are set and/or used in it?

fduncanh commented 1 year ago

I couldn't see how to build shairport4w

is only binary code offered by that site, with no build instructions for the source?

shairport4w presumably means " for windows"

maybe the original shairport code is more useful?

WinkelCode commented 1 year ago

Thanks for looking into this. Unfortunately I don't have time to look into this super extensively today, but I'll probably get to it in the next few days.

I did a small test where I took this part of `raop_handlers.h`

    plist_t audio_latencies_0_output_latency_micros_node = plist_new_bool(0);
    plist_t audio_latencies_0_type_node = plist_new_uint(100);
    plist_t audio_latencies_0_audio_type_node = plist_new_string("default");
    plist_t audio_latencies_0_input_latency_micros_node = plist_new_bool(0);
    plist_dict_set_item(audio_latencies_0_node, "outputLatencyMicros", audio_latencies_0_output_latency_micros_node);
    plist_dict_set_item(audio_latencies_0_node, "type", audio_latencies_0_type_node);
    plist_dict_set_item(audio_latencies_0_node, "audioType", audio_latencies_0_audio_type_node);
    plist_dict_set_item(audio_latencies_0_node, "inputLatencyMicros", audio_latencies_0_input_latency_micros_node);
    plist_array_append_item(audio_latencies_node, audio_latencies_0_node);
    plist_t audio_latencies_1_node = plist_new_dict();
    plist_t audio_latencies_1_output_latency_micros_node = plist_new_bool(0);
    plist_t audio_latencies_1_type_node = plist_new_uint(101);
    plist_t audio_latencies_1_audio_type_node = plist_new_string("default");
    plist_t audio_latencies_1_input_latency_micros_node = plist_new_bool(0);

and edited all the integers to be 0, 100, 500, 900 and 1000. (I made sure to set latencies_1 to n+1, i.e. 901)

After each edit I ran: rm -r *; cmake ..; ninja; ./uxplay.exe from the "build" directory in the source files, then connected with my iPad, first audio only, then video, and noted what happened.

Result: No noticeable change that I could see from these edits.

About Shairport4w

Yes, https://github.com/Frank-Friemel/Shairport4w is correct (sorry, forgot to include the link in my original post).

It's Windows only, I assume it is built using the .sln file.

For further comparisons I'll fire up Linux and use "normal" Shairport.

More detailed problem description

I am testing right now using the Apple Music app. I can either just stream the music using the bottom left icon: Or open up the full AirPlay screen sharing:

When I am just streaming the music, the lyrics scroll a bit too late, this is especially noticeable with the new system on some songs where it progresses through the individual words.

When I am streaming the screen, the lyrics are properly synchronized again.

This is issue 1, I am guessing the underlying problem is that the iPad is mistaken about the audio latency.

Issue 2 is, I am under the impression that the screen streaming is a bit choppy due to what "feels" to me like not enough buffering. I didn't test it, but from my gaming experience I'd call it inconsistent frame timing.

Addendum: Shairport4w also has latency issues with the default options, but by experimenting I found a buffer of 201 frames to be perfect. Interestingly it has terrible latency in other ways: play/pause, skipping and changing songs is (relatively) slow, it also has controls on the receiver/PC side, which take multiple seconds to arrive at the iPad.

If you know of any other parts I could edit, or want me to set it to a specific value. I'd be happy to try it right now as I have everything set up to build UxPlay. So far, the edits as I described them above didn't yield any noticeable change from the original version.

Also, I am running Windows 11 with MSYS2, but I think that shouldn't matter too much? I looked a bit into the video/audio sinks and didn't find any options for latency.

Addendum 2: Actually, I found that "wasapisink" has integer guint buffer_frame_count;, but it doesn't seem like it can be set via the audio sink options? https://gstreamer.freedesktop.org/data/doc/gstreamer/head/gst-plugins-bad/html/gst-plugins-bad-plugins-wasapisink.html

Addendum 3: Sorry for the thousand edits, I think this should be the final version of the comment. 📌

fduncanh commented 1 year ago

if there as an option x=y for wasapisink

uxplay  -vs "wasapisink x=y"

should set it

as can be seen from a debug trace for airplay mirror mode

raop_rtp audio: now = 1670885910.406795, npt = 1670885910.938638, latency = -0.531843, rtp_time=1875781135 seqnum = 45238
raop_rtp audio: now = 1670885910.406819, npt = 1670885910.949157, latency = -0.542338, rtp_time=1875781615 seqnum = 45239
raop_rtp audio: now = 1670885910.406833, npt = 1670885910.959590, latency = -0.552757, rtp_time=1875782095 seqnum = 45240
raop_rtp video: now = 1670885910.411231, ntp = 1670885910.514068, latency = -0.102837
raop_rtp video: now = 1670885910.443677, ntp = 1670885910.547390, latency = -0.103713
raop_rtp video: now = 1670885910.487339, ntp = 1670885910.580705, latency = -0.093366
raop_rtp audio: now = 1670885910.498100, npt = 1670885910.970707, latency = -0.472607, rtp_time=1875782575 seqnum = 45241
raop_rtp audio: now = 1670885910.498124, npt = 1670885910.981729, latency = -0.483605, rtp_time=1875783055 seqnum = 45242
raop_rtp audio: now = 1670885910.498136, npt = 1670885910.992658, latency = -0.494522, rtp_time=1875783535 seqnum = 45243

The video is operating at a latency of -0.1 sec and the audio at -0.5 sec This means it arrives 0.1sec and 0,5 sec before the rendering time.

These settings are inherited fro rpiplar via airserver via shairplay. The audio-only mode of shairplay was removed and replaced by mirror mode in airserver, and was absent in rpiplay. It was put back at some point in uxplay development.

It was assumed that the audio latency didn't matter in audio-only alac mode, because there wasn't any video to sync with

It would be fairly easy to use a different latency setting in audio-ony mode. right now audio is played about about 0.5sec after it is decrypted (some extra time after that is needed for gstreamer to decode from alac to pcm.) This just needs to be finished by the time the playing time is reached, otherwise frames get dropped.

look at the bottom of raop_handlers.h

static void
raop_handler_record(raop_conn_t *conn,
                    http_request_t *request, http_response_t *response,
                    char **response_data, int *response_datalen)
{
    logger_log(conn->raop->logger, LOGGER_DEBUG, "raop_handler_record");
    http_response_add_header(response, "Audio-Latency", "11025");
    http_response_add_header(response, "Audio-Jack-Status", "connected; type=analog");
}

That 11025 is a quarter of 44100 Hz, the audio sampling frequency. I think this is to play audio 1/4 of an audio frame after the correspond video frame timing (both audio and video have time stamps)

In audio-only mode, there are no video timestamps, but maybe this setting should be different from 11025 in audio-only mode.

fduncanh commented 1 year ago

another way to mess with the audio time could be to add an offset to the audio timestamps. these pass from raop to gstreamer in a callback function in uxplay.cpp line 960

extern "C" void audio_process (void *cls, raop_ntp_t *ntp, audio_decode_struct *data) {
    if (dump_audio) {
        dump_audio_to_file(data->data, data->data_len, (data->data)[0] & 0xf0);
    }
    if (use_audio) {
        audio_renderer_render_buffer(ntp, data->data, data->data_len, data->ntp_time, data->rtp_time, data->seqnum);
    }
}

only data->ntp_time is used for sync. I think this is the unix time in microseconds at which the packet should be played. so I think subtracting 1000000 (one million microsecs = 1 sec) should make the audio play 1 sec earlier.

try it and see. It would be easy to either add a user setting to specify an offset, or add an appropriate one in audio-only mode.

WinkelCode commented 1 year ago

I tried this ./uxplay.exe -vs "wasapisink buffer_frame_count=200" and got get_parse_launch error (video) : no property "buffer_frame_count" in element "wasapisink0", I tried a bunch of different stuff earlier with no success, I think it's just not exposed as an option.

Editing http_response_add_header(response, "Audio-Latency", "11025"); resulted in no perceivable change. Audio is still faster than the lyrics. It also didn't become "unsynced" in screen sharing mode.

I tried this:

    if (use_audio) {
        // audio_renderer_render_buffer(ntp, data->data, data->data_len, data->ntp_time, data->rtp_time, data->seqnum);
        audio_renderer_render_buffer(ntp, data->data, data->data_len, data->ntp_time - 5000000, data->rtp_time, data->seqnum);
    }

And... no change. I also tried the same with rtp_time and just basically adding a ton of zeroes.

Edit: Just as a sanity check I wrote some garbage to see if it would fail to build and it did, so I am editing the right files.

fduncanh commented 1 year ago

try

if (use_audio) {
    uint64_t offset = 1000000;   /*  1 sec offset */
    printf("ntp_time before %ul", data->ntp_time);
    data->ntp_time -= offset;
    printf(" after %ul\n", data->ntp_time);
    audio_renderer_render_buffer(ntp, data->data, data->data_len, data->ntp_time, data->rtp_time, data->seqnum); 
   }

try adding (+=) istead of subtracting (-=) the offset also

no access to test till next thursday (b/c travel)

in renderers/audio_renderer/gstreamer.c line 186 unit64_t ntp_time is being fed into gstreamer for audio timing. (no use is made of rtp_time)

void audio_renderer_render_buffer(raop_ntp_t *ntp, unsigned char* data, int data_len, uint64_t ntp_time,
                                  uint64_t rtp_time, unsigned short seqnum) {
    GstBuffer *buffer;
    bool valid;
    if (data_len == 0 || renderer == NULL) return

    /* all audio received seems to be either ct = 8 (AAC_ELD 44100/2 spf 460 ) AirPlay Mirror protocol *
     * or ct = 2 (ALAC 44100/16/2 spf 352) AirPlay protocol.                                           *
     * first byte data[0] of ALAC frame is 0x20,                                                       *
     * first byte of AAC_ELD is 0x8c, 0x8d or 0x8e: 0x100011(00,01,10) in modern devices               *
     *                   but is 0x80, 0x81 or 0x82: 0x100000(00,01,10) in ios9, ios10 devices          *
     * first byte of AAC_LC should be 0xff (ADTS) (but has never been  seen).                          */

    buffer = gst_buffer_new_and_alloc(data_len);
    g_assert(buffer != NULL);
    GST_BUFFER_PTS(buffer) = (GstClockTime) ntp_time;
    gst_buffer_fill(buffer, 0, data, data_len);

<snip>

WinkelCode commented 1 year ago

I made a syntax mistake earlier: it's -=/+=. With that I can see the change reflected in the log: raop_rtp audio: now = 1673059507.250779, ntp = 1673058509.023506, latency = 998.227273, rtp_time=2067409842 seqnum = 64617

But neither adding nor subtracting results in changes to the audio.

I just tried your edit, and it's the same result unfortunately :(

It does work ntp_time before 2483099115l after 2478099115l, but the actual audio stays the same. (Note: I've been using >5 second delays, so I can immediately spot changes, but I also tried 1 sec and it didn't work)

I'll have to try it on Linux, I wonder on Windows the timestamp is just ignored (on the Gstreamer side)?

WinkelCode commented 1 year ago

ChatGPT actually almost found a solution(???):

In renderers\audio_renderer_gstreamer.c / void audio_renderer_init

Add min-threshold-time:

for (int i = 0; i < NFORMATS ; i++) {
        renderer_type[i] = (audio_renderer_t *)  calloc(1,sizeof(audio_renderer_t));
        g_assert(renderer_type[i]);
        GString *launch = g_string_new("appsrc name=audio_source ! ");
        g_string_append(launch, "queue min-threshold-time=5000000000 ! ");  // Introduce 5-second delay using queue element
        g_string_append(launch, "queue ! ");

ChatGPT made the mistake of using max-size-time which isn't exactly what I need, but close enough that I got the rest. (I have never done Gstreamer stuff)

Edit: This messes up the screen sharing mode because now it's de-synced, but its finally something that actually does anything. I'm done troubleshooting for today 😓 but I'll be back tomorrow.

Edit 2: min-threshold-time=5000000000 is actually pretty spot on to be in-sync with the Apple Music lyrics, at least with my setup. However, ideally we'd tell the iPad the correct delay, because the overall latency is pretty bad now.

Also, video is looking nice and smooth now when doing the same thing for video_renderer_init.

Edit 3: Actually, I wonder if Apple Music might have hardcoded that delay in the lyrics when streaming. I guess someone would have to check against a real Apple TV (I don't have one).

fduncanh commented 1 year ago

when I can test next week I can test on an appleTV.

Its easy to add code to apply some fix only when audio-only mode is active. a new audio pipeline is created when the mode switches. since there is no video to sync with there should be no problem

If I understand correctly, you want streamed audio to be rendered in sync with video showing on the client? I had noticed that the two were delayed, but did not think it was an issue. If should be fixable, if there is enough time to process the audio on the server before the video is shown on the client.

But maybe you are watching the lyrics on a metadata image rendered on the server? This is Apple Music? I have only tested with Apple Radio. I'll need to sign up for an Apple Music free trial subscription to see your issue, I guess. I meant to do this to see what kind of metadata it sends.

fduncanh commented 1 year ago

ChatGPT actually almost found a solution(???):

is there a link to this? where is ChatGPT's solution found"


for (int i = 0; i < NFORMATS ; i++) {
        renderer_type[i] = (audio_renderer_t *)  calloc(1,sizeof(audio_renderer_t));
        g_assert(renderer_type[i]);
        GString *launch = g_string_new("appsrc name=audio_source ! ");
        g_string_append(launch, "queue min-threshold-time=5000000000 ! ");  // Introduce 5-second delay using queue element
        g_string_append(launch, "queue ! ");

looks like gstreamer uses nanosecs.

WinkelCode commented 1 year ago

Its easy to add code to apply some fix only when audio-only mode is active. since there is no video to sync with there soul be no problem

I don't think that using the method of adding artificial latency is a good idea to automatically apply, the added latency is annoying when you only care about responsiveness.

If I understand correctly, you want streamed audio to be rendered in sync with video showing on the client?

It's a bit complicated, in most cases, if we need to sync between client and server (iPad and PC), we only care about minimum latency, which is the default, from my understanding.

The problem is Apple Music's lyrics (on the iPad's screen) become automatically delayed when Airplaying, I can even see them sometimes "going backwards" when it connects, This can be annoying when I am using my PC (the connected headphones/speakers) for audio output via UxPlay.

For whatever reason, my iPad thinks it needs those 5 secs of delay to keep up with the Airplay device in audio-only mode. This isn't a problem when also streaming the screen, in that case Apple Music doesn't introduce that delay.

Example: If I run UxPlay with -vs 0 and share my screen, everything is in sync, if I just share the audio, the lyrics are delayed on my iPad. What I see "on my PC's monitor" is the same.

is there a link to this? where is ChatGPT's solution found

I was basically just interrogating ChatGPT if it had any ideas on how to introduce a delay with Gstreamer. I started out by asking it about gst_pipeline_set_latency, and it ended up telling me about max-size-time, which then led me to look up the documentation where I ended up finding min-threshold-time.

The new line is literally just g_string_append(launch, "queue min-threshold-time=5000000000 ! "); (replace 5000000000 with the desired delay, if any.)

WinkelCode commented 1 year ago

I have only tested with Apple Radio. I'll need to sign up for an Apple Music free trial subscription to see your issue, I guess. I meant to do this to see what kind of metadata it sends.

Yes, I think Apple Music Radio (the "actual" streamed stations like Music 1/Hits/Country, not the "genre stations") don't have these synced up lyrics.

It would be interesting to know where Apple Music gets those 5 seconds delay from, I wonder if it might be a fallback?

Edit:

The other thing why I opened this issue was increasing the buffer size in general. Though after some more testing, it doesn't seem like video playback benefits too much from it. It stutters a bit to begin with but gets better after a couple seconds. I'll have to do more testing on this with proper comparison recordings. However, I think having the ability adjust the playback delay of audio and video is generally a good feature to have, even if it's just to work around a weird Apple Music thing.

fduncanh commented 1 year ago

interesting idea to ask ChatGPT. I wasnt aware of this AI bot as a technical resource.....

WinkelCode commented 1 year ago

interesting idea to ask ChatGPT. I wasnt aware of this AI bot as a technical resource.....

I find it useful when I'm in "unknown territory" and need to get a rough idea where to start. Good answers can be hit and miss, but usually I at least come away with a general idea where to look in the documentation or forums.

In this case, I would need to take a deeper look at the documentation, for example if g_string_append(launch, "queue ! "); is made redundant by the aforementioned new line, but for now I was just trying to make it work as someone with no experience with Gstreamer.

fduncanh commented 1 year ago

I didn't make the audio pipeline user-modifiable like the video one. if it was, things like your modification would be easy to test without recompiling.

https://gstreamer.freedesktop.org/documentation/coreelements/queue.html?gi-language=c

WinkelCode commented 1 year ago

I didn't make the audio pipeline user-modifiable like the video one. if it was, things like your modification would be easy to test without recompiling.

On my PC it's rebuilt and started in about 2 seconds, so it's not really a problem. The basic implementation would just be arguments for the video and audio delay that get passed on via queue min-threshold-time= for their respective pipelines, maybe converted from milliseconds for convenience.

That would leave the issue that started this, which, for all I know, could be a bug in Apple Music itself, ultimately this depends on finding "proof" that the artifical delay can be adjusted somehow on the iPad's side by a differently behaving AirPlay server.

Edit: To avoid any confusion, the reason why Shairplay4w has synced lyrics is because it adds so much delay, and I configured it to match the artificial delay on the lyrics.

Additionally I have:

An Android box with some Airplay Server app
A 2015 MacBook Pro running Ventura, OpenCore Legacy Patcher should unlock all features (including the Airplay Receiver) but it might be broken on Ventura right now, I'll have to check it.

I'll try to get around to testing Apple Music on these devices and see how it behaves.

WinkelCode commented 1 year ago

Update: This the last thing I'm checking today (for real this time 😆 ). When Airplaying to my MacBook (which works!), it was initially desynced in the same way (lyrics too slow) but over time it got back in sync, based on the fact that play/pause is instant, it's the delay on the iPad that's being adjusted!

When connecting again it's still in sync, I suspect that it may be saving an offset in a similar manner as the passwords for protected Airplay servers.

fduncanh commented 1 year ago

It would be great If you write a summary of which features you think would be useful to add to uxplay

e.g.options for added some sort(s) of delays, or just make "queue" element in pipeline(s) user-accessible for adding extra options etc.

WinkelCode commented 1 year ago

It would be great If you write a summary of which features you think would be useful to add to uxplay

e.g.options for added some sort(s) of delays, or just make "queue" element in pipeline(s) user-accessible for adding extra options etc.

I think that an option to adjust the delay for the video/audio separately would be a good idea. Maybe -ad <delay in ms> and -vd <delay in ms> . I am not jet sure if there are other options for the Gplayer queue that might be useful.

I've been interested getting AirPlay working without an Apple TV for a while a now and I've also been unhappy about the landscape of commercial software. So while I have to get other stuff done for now, I'll definitely have to get familiar with UxPlay and the AirPlay spec. There's a bunch of stuff that I think could be improved upon:

Better Windows compatibility, run without MSYS2 via static linking or bundling required DLL files? So far I have a working proof-of-concept where I used Procmon and wrote a script to copy all files UXplay opened, but that's really just a workaround and doesn't consider licensing.
Bug fixes:
- If the default audio format isn't set to 44100Hz, audio playback has popping like an old record.
- ~Video performance could be better, weird stutters/bad frame timing (approx. first <30 secs or so?), compression artifacts (might be normal?)~EDIT fixed by default fps = 60(?) *~I wonder if bad frame timing issues could be related to, as previously discovered, the NTP data being apparently ignored? Maybe the latency variation gets naturally better the longer it is connected?~ EDIT fixed by default fps =(60) ?
- ~Occasional, repeated crashing when trying to start screen mirroring (of course I can't intentionally reproduce it right now 🙄, but it's different to #170)~ EDIT: DONE
- ~Weird audio artifacting when changing volume in screen mirror mode~. EDIT DONE
- ~Fix transmission of audio delay like macOS's new built in AirPlay server can~. EDIT: WORKAROUND ADDED
- ~Fix text encoding of metadata output (non-standard characters are garbled)~ EDIT: DONE
New features:
- Password protection and manual connection approval mode (like macOS).
- UI for the program (ideally cross-platform).
- Integrate upscaling/sharpening filters into video output (FSR, etc.).
- Multimedia controls on the server's side and match server's system volume.
- Submit UxPlay to Flathub (#9).

(Note: For now I only thoroughly tried UxPlay on Windows, but I know that the video performance issues are on Arch as well.)

I want to be clear that UxPlay is already awesome right now, I dreamed about a FOSS program like this a few years ago! I wrote up all the things that could be done to make UxPlay better. My intention is to, when I get the time, set up a proper Linux development environment again and see what I can do to contribute to the program.

Edit: Also, -fps 60 should probably be the default option, unless there are some incompatibilities that I'm not aware of.

Edit 2: Also, when building with Bonjour, it would be useful to not have to install the Bonjour SDK (I extract the newest Bonjour version from the iTunes installer .exe which doesn't seem to come with the SDK?). I fixed this by getting the Bonjour SDK files from an alternative source (link in synergy-core's CI workflow) and putting the SDK files in the lib/ directory (rough workaround).

    if ( WIN32 )
      set(BONJOUR_SDK "${PROJECT_SOURCE_DIR}/lib/BonjourSDK" )

Again, I am not trying to just dump all my feature/bug fix requests in here and tell you to fix them, it's also for me as a to-do list, so that when I hopefully get around to getting into developing the project I know what could be worked on.

Edit 3: Also, if you're testing the Apple Music lyrics feature: "CUFF IT" by Beyoncé uses pretty much all the features of the live lyrics that I am aware of. For testing the audio popping, I found the first beats from "Anti-Hero" by Taylor Swift useful.

fduncanh commented 1 year ago

You say If the default audio format isn't set to 44100Hz, audio playback has popping like an old record.

can you explain this more? uxplay does only 44100 Hz, but gstreamer resamples to 48Khz for windows wasapisink

WinkelCode commented 1 year ago

You say If the default audio format isn't set to 44100Hz, audio playback has popping like an old record.

can you explain this more? uxplay does only 44100 Hz, but gstreamer resamples to 48Khz for windows wasapisink

I should do that, however, it appears that for some reason the resampling isn't happening correctly. But that particular issue might very well be a Gstreamer bug, no idea.

fduncanh commented 1 year ago

Fix text encoding of metadata output (non-standard characters are garbled)

what range of characters are at issue?

WinkelCode commented 1 year ago

Fix text encoding of metadata output (non-standard characters are garbled)

what range of characters are at issue?

For example "Beyoncé" will have a garbled "é" (Beyonc├⌐), umlaute (äöü) also don't work properly. I actually looked a little into that issue, it appears that getting C/C++ programs to print out "special" characters on Windows is a huge pain. The least "weirdly broken" way I've found is using a library like the FMT library (https://github.com/fmtlib/fmt), but I only tried that in a test program, and it also has its limits (UTF-16, I believe), so 🚂 won't print properly either. Rust's built-in print function works without any issues out of the box for this, including the train emoji (so it is technically possible to get working).

Two things that sound promising, but I haven't tried them: First one is using the Windows API to print out text to console, the other one is when using the MSVC compiler, to use /utf-8 when compiling.

Anything regarding codepages or just using wide character strings, at least for me, doesn't work.

fduncanh commented 1 year ago

From the way Beyoncé prints, is the metadata using UTF-16? what character set is "choo-choo-train" ?

utf-8 preferred!

fduncanh commented 1 year ago

@WinkelCode The UTF-8 problem (accented characters broken in Windows Terminal) is now fixed (at least for modern Windows 10 and 11)) please test and confirm this works for you.

WinkelCode commented 1 year ago

@WinkelCode The UTF-8 problem (accented characters broken in Windows Terminal) is now fixed (at least for modern Windows 10 and 11)) please test and confirm this works for you.

Yep, seems to work fine now, including the train emoji! During my experiments I had intentionally avoided the Windows APi, but it seems to be a clean enough solution in retrospect.

I also managed to reproduce the crash I mentioned before, it happens sometimes when trying to start a screen mirror:

Client identified as User-Agent: AirPlay/670.6.2
Accepted IPv4 client on socket 9840
Local: 192.168.0.25
Remote: 192.168.0.44
raop_rtp_mirror starting mirroring
Assertion failed: str, file C:/msys64/home/Workstation/UxPlay/lib/utils.c, line 188

Edit: It happens somewhat randomly (had it happen a few times in a row before), that's why I don't have debug logs for this unfortunately, I'll try to get it with the full logs.

Also, I took the opportunity to verify that the audio resampling issue is also present when screen mirroring.

WinkelCode commented 1 year ago

Here are more detailed logs of the crash, I think it's the same one, I don't see the "Assertion failed" string, ~~but that might only show up in non-debug log mode?~~

Crash 1

``` Handling request SETUP with URL rtsp://192.168.0.25/[REDACTED] DACP-ID: [REDACTED] Active-Remote: [REDACTED] Transport: null type = 110 streamConnectionID (needed for AES-CTR video decryption key and iv): [REDACTED] raop_rtp_mirror starting mirroring raop_rtp_mirror local data port socket 9944 port TCP 59296 Mirroring initialized successfully RTSP/1.0 200 OK CSeq: 10 Server: AirTunes/220.68 Content-Type: application/x-apple-binary-plist Content-Length: 85 streams dataPort 59296 type 110 raop_rtp_mirror accepting client httpd receiving on socket 9896 conn_request SET_PARAMETER rtsp://192.168.0.25/[REDACTED] RTSP/1.0 Content-Length: 20 Content-Type: text/parameters CSeq: 11 DACP-ID: DAC2A9147AA772E4 Active-Remote: 69060649 User-Agent: AirPlay/670.6.2 volume: -20.000000 Handling request SET_PARAMETER with URL rtsp://192.168.0.25/[REDACTED] RTSP/1.0 200 OK CSeq: 11 Server: AirTunes/220.68 raop_rtp_mirror: unidentified extra header data 0.000000, 0.000000 begin video stream wxh = 0x0; source 0x0 raop_rtp_mirror width_source = 0.000000 height_source = 0.000000 width = 0.000000 height = 0.000000 raop_rtp_mirror: sps/pps header size = 6 raop_rtp_mirror h264 sps/pps header: 00 00 00 00 00 00 raop_rtp_mirror sps size = 0 raop_rtp_mirror h264 Sequence Parameter Set: Segmentation fault ```

Crash 2

``` Handling request SETUP with URL rtsp://192.168.0.25/[REDACTED] DACP-ID: [REDACTED] Active-Remote: [REDACTED] Transport: null type = 110 streamConnectionID (needed for AES-CTR video decryption key and iv): [REDACTED] raop_rtp_mirror starting mirroring raop_rtp_mirror local data port socket 1320 port TCP 53077 Mirroring initialized successfully RTSP/1.0 200 OK CSeq: 11 Server: AirTunes/220.68 Content-Type: application/x-apple-binary-plist Content-Length: 85 streams dataPort 53077 type 110 httpd receiving on socket 6320 conn_request SET_PARAMETER rtsp://192.168.0.25/[REDACTED] RTSP/1.0 Content-Length: 20 Content-Type: text/parameters CSeq: 12 DACP-ID: 91C5FC75733281D3 Active-Remote: 536446079 User-Agent: AirPlay/670.6.2 volume: -20.000000 raop_rtp_mirror accepting client Handling request SET_PARAMETER with URL rtsp://192.168.0.25/[REDACTED] RTSP/1.0 200 OK CSeq: 12 Server: AirTunes/220.68 raop_rtp_mirror: unidentified extra header data 0.000000, 0.000000 begin video stream wxh = 0x0; source 0x0 raop_rtp_mirror width_source = 0.000000 height_source = 0.000000 width = 0.000000 height = 0.000000 Segmentation fault ```

Note: I redacted some unique-seeming data just in case, though it's probably not actually sensitive info. I can un-redact it if it could help. I can also post more of the log, here it is from Handling request SETUP with URL... to the crash.

I suspect a race condition might be the fault? The order in which log entities around the point of failure appear seems to differ between the runs.

Edit: Although, here it obviously crashes with a segmentation fault... I got this particular crash two times within a short timespan, and without debug mode, I only ever got the "assertion failed" crash. How odd...

fduncanh commented 1 year ago

No obvious bad-values reason for utils,c line 188 assert should fail in : utils_data_to_string(..) so calloc() may have failed because of some other failure???

WinkelCode commented 1 year ago

No obvious bad-values reason for utils,c line 188 assert should fail in : utils_data_to_string(..) so calloc() may have failed because of some other failure???

I wonder if it could be related to this?

raop_rtp_mirror: unidentified extra header data  0.000000, 0.000000
begin video stream wxh = 0x0; source 0x0
raop_rtp_mirror width_source = 0.000000 height_source = 0.000000 width = 0.000000 height = 0.000000

vs. (when not crashing)

raop_rtp_mirror: unidentified extra header data  240.000000, 0.000000
begin video stream wxh = 1440x1080; source 1440x1080
RAOP initialized success
raop_rtp start_time = 1673565683.359545 (raop_rtp audio)
raop_rtp_mirror width_source = 1440.000000 height_source = 1080.000000 width = 1440.000000 height = 1080.000000

Question is why is UxPlay not picking up this information when it crashes.

fduncanh commented 1 year ago

these are very unexpected values (width = 0 height = 0) correct values are shown below

were you using video?

`` raop_rtp_mirror: unidentified extra header data 240.000000, 0.000000 begin video stream wxh = 1440x1080; source 1440x1080 raop_rtp_mirror width_source = 1440.000000 height_source = 1080.000000 width = 1440.000000 height = 1080.000000 raop_rtp_mirror: sps/pps header size = 6 raop_rtp_mirror h264 sps/pps header: 01 64 00 28 ff e1

raop_rtp_mirror: unidentified extra header data 0.000000, 0.000000 begin video stream wxh = 0x0; source 0x0 raop_rtp_mirror width_source = 0.000000 height_source = 0.000000 width = 0.000000 height = 0.000000 Segmentation fault

WinkelCode commented 1 year ago

were you using video?

Yes, I was trying to start a screen mirror.

Audio-only hasn't crashed on me so far.

fduncanh commented 1 year ago

@WinkelCode

I will add some extra code to intercept these bad values and print out the packet data.

I will let you know when the github is updated and you can see if you can get crash data.

fduncanh commented 1 year ago

@WinkelCode

Re: the delay you see in lyrics., initially it seems to be one line behind the song (?), maybe catches up later.

fduncanh commented 1 year ago

@WinkelCode

The crash you reported is fixed in latest github code. The issue was occasional initial video packets without a payload, that should have been dropped (hence strange values like width=0, etc., since no data was there)

to update with git pull you may need to do git pull --rebase because I squashed various commits into a single one.

WinkelCode commented 1 year ago

@WinkelCode

Re: the delay you see in lyrics., initially it seems to be one line behind the song (?), maybe catches up later.

To me it doesn't seem like it's one line specifically, it's a fixed time thing, setting the delay to 5 seconds (for me) makes it in in sync. It doesn't appear to get more accurate with time, though it should? When I streamed to my MacBook Pro (built in AirPlay server) it looked like it was unsynced at first, then got more in sync, it was within the span of a song, so I'd know if it worked like this with UxPlay.

@WinkelCode

The crash you reported is fixed in latest github code. The issue was occasional initial video packets without a payload, that should have been dropped (hence strange values like width=0, etc., since no data was there)

to update with git pull you may need to do git pull --rebase because I squashed various commits into a single one.

So far I haven't seen it to crash with these changes 👍

fduncanh commented 1 year ago

@WinkelCode

I added (so far undocumented) option -ad n with (n >=0) to add a n millisec delay to ALAC audio streams only. (using queue min-threshold=(delay in ns))

you are right, adding -ad 5000 and using audio-only streaming not just gets lyrics showing on ipad right in time with sound on server, it also (approximately) syncs youtube video on iPad of someone speeking with sound playing on server.

Perhaps a value close to 5000 ms should be the default for ALAC streams? (-as 0 is accepted to remove delay)

The -ad argument is fully parsed to reject invalid values (no upper limit, should one be set?) please test/experiment

thiccaxe commented 1 year ago

Where might 5s come from? I am going to test on raspberry pi hardware. Perhaps it has more delay?

fduncanh commented 1 year ago

I just tested with -ad 5000 on R Pi, it seems the same! perhaps this is exactly a 5 sec delay for ALAC?

EDIT It's a limited-use case, watching the iPad screen with audio on the server, of course.

fduncanh commented 1 year ago

@WinkelCode

I dont understand the issue with your proposed queue min-theshold=... for video?

I hesitate to do anything to mess with video/audio syncing for mirror mode. (audio-only ALAC mode is OK for adding audio delay adjustment for better sync with video on the client.)

I havent yet understood queue min-threshold= the GStreamer doc seems to imply it just holds the initial audio in the ring buffer until the delay expires. I did some work in the past on the audio ring buffer to use the original rtp time with conversion to ntp time on exit (RPiPlay author had converted the buffer to hold ntp unix time), and converted rtp time from 32 to 64 bit when it starts to avoid the theoretical issue of glitches at the 32 bit epoch (which is 27 hours at 44100 Hz), given that the start time is random during the epoch, and could be near the epoch end so an epoch passes during actual playing time.

The code from RPiPlay had random latency just after audio starts, since there is no sync between audio 44100Hz rtptime and ntptime (used by video) until a few secs after starting the stream when the first ntp sync signal is received from the client. I use a guessed initial audio latency of 2.0sec for ALAC, 0.5sec for AAC-ELD. This might be a legacy mode issue because apple built mirroring on top of itunes to begin with. The ntp stuff is dropped in the AirPlay2 REMOTE_CONTROL protocol that UxPlay does not support (see issue #153 )

(lib/raop_rtp.c line 630)

                    if (!offset_estimate_initialized) {
                        offset_estimate_initialized = true;
                        switch (raop_rtp->ct) {
                        case 0x02:  
                            delay = DELAY_ALAC;   /* DELAY = 2000000 (2.0 sec) is empirical choice for ALAC */
                            logger_log(raop_rtp->logger, LOGGER_DEBUG, "Audio is ALAC: using initial latency estimate -%8.6f sec",
                                      ((double) delay) / SEC);
                            break;
                        case 0x08:
                            delay = DELAY_AAC;   /* DELAY = 500000 (0.5 sec) is empirical choice for AAC-ELD */
                            logger_log(raop_rtp->logger, LOGGER_DEBUG, "Audio is AAC: using initial latency estimate -%8.6f sec",
                                       ((double) delay ) / SEC);
                            break;
                        default:
                            break;
                        }

The 5 sec thing in ALAC might be something I introduced, since ALAC mode had been stripped out of the original Shairplay code (and replaced by AAC_ELD) by the author of AirServer, which was then converted to RPiPlay. I later reintroduced ALAC mode as an enhancement to UxPlay. (perhaps some archeological digging in shairplay code might be instructive, but there was no video in shairplay so audio delay would not have been anything to care about there I guess.)

WinkelCode commented 1 year ago

I think there might be a bit of confusion about the issues I originally brought up in the thread, they are mostly independent of each other.

1. The artificial delay on lyrics and YouTube videos in audio-only mode.

Note: I can confirm this is an issue with YouTube videos as well, this might be an easier place to test than Apple Music lyrics.

It appears that some part of the AirPlay system is a capability to detect the approximate latency between the client sending the audio and the server playing back that audio.
This functionality works properly when streaming to a macOS device, where newer versions of the operating system come with an AirPlay Server. Presumably it also works with a real Apple TV.
Presumably if the AirPlay client isn't able to determine that latency with a UxPlay server, it will introduce a default delay of 5 seconds to the corresponding video/lyrics output on the client.
~Shairplay4w~ Shairport4w (Windows version of ~Shairplay~ Shairport) also appears to suffer from the problem. Not sure about Linux ~Shairplay~ Shairport.
Introducing a 5 second delay on the server is a bad workaround, since it will make things like play, pause, and switching tracks also delayed by 5 seconds.
2. Optionally introducing a buffer on video output on the server.
I find that screen mirroring performance is lacking due to iffy frame timing/micro-stutters.
I believe that this can be improved by introducing a buffer of a couple frames.
It appears that min-threshold-time just blindly delays any frame rather than using a buffer to bridge missing frames.
There are more options for the queue that I haven't gotten around to trying,

I hesitate to do anything to mess with video/audio syncing for mirror mode.
My proposal was to introduce any additional buffering to video and audio simultaneously, since as far as I can tell, audio/video sync isn't problematic in mirror mode.

I think it may be worthwhile to investigate how the AirPlay server on macOS works, for example it even shows up with a laptop icon (probably different for other Macs) in the AirPlay list. I've been meaning to check the exact differences between what the macOS AirPlay server is sending and what UxPlay is sending, somewhere in there should the information about the proper video (client) and audio (server) delay.

fduncanh commented 1 year ago

ShairPlay and shairport (and shairport4w) are two quite different code bases for iTunes AirPlay. Shairport derives historically from the original "crack" of apple's public key from hardware inside an airport device.

UxPlay inherits from the ShairPlay Code, via AirServer and RPiPlay.

The modern derivative of shairport is shairport-sync. This has made inroads into AirPlay2 protocol, but only for audio, to have multi-room audio systems all properly synced, surround-sound etc.

fduncanh commented 1 year ago

Introducing a 5 second delay on the server is a bad workaround, since it will make things like play, pause, and switching tracks also delayed by 5 seconds.

What is your opinion of the -ad option (left undocumented so far, so I can remove it if that is best). -ad <n> applies @WinkelCode 's delay of n msec to ALAC only. Does if mess up play, pause, track-switching , etc?

I understand from the comment that it would be a bad idea to have a 5s default delay, but leave it at 0s, with an option for the user to impose the delay?

(Too many options may be a confusing aspect of UxPlay)

fduncanh commented 1 year ago

My guess is that an iPad communicating with a modern Macbook pro with airplay support will be using the AirPlay 2 REMOTE_CONTROL protocol (this might be detectable with wireshark) which does not use a dedicated ntp timing port.

EDIT: REMOTE_CONTROL seems to be used by modern Macs in streaming mode, not mirror mode.

In legacy ntp protocol, after the initial handshake with the client (where plist data is exchanged), the server sends a request for an ntp time signal to the client once every 3 secs. All that this contains is the server's ntp time at time of sending. The client replies with both the server's time and its own time when it received the query. The server must receive the reply within 0.3sec of when it sent the request, and records its own time when it received the reply. The server assumes that the travel times to and from the client are equal, to estimate offset between its own ntp time and client's ntp time. and also the time delay for the signal to travel from client to server, for latency issues. These are averaged over the last 8 time signals, so it takes 24 sec after streaming starts for a steady set of data to be gathered, for best accuracy estimating roundtrip travel times.

EDIT: the client sends back both its time when it received the query, and its time when it sent the reply. The client is also using this "heartbeat" signal to know that the server is still listening to it. The time between them is subtracted from the round-trip time.

These signals are used to gradually improve and maintain the clock syncing of the server's clock with the client's clock using ntp protocol . I don't think the client does any adjustments of its own clock to match the servers's clock.

with this (legacy) protocol there is no obvious way for the client to understand about latency, only the server gets to know about round-trip times. The newer protocol REMOTE_CONTROL #153 with no ntp signalling may allow the client to know about latency.


raop_ntp send_len = 32, now = 1673727348714657
raop_ntp receive time type_t=83 packetlen = 32
80 d3 00 07 00 00 00 00 e7 6d 8b f4 b6 f3 c2 da 
83 ab a6 d7 1b 1c 3c 2f 83 ab a6 d7 1b 21 e0 6c 

raop_ntp sync correction = 0

Below is the syncing signal where the client periodically sends both its ntp clock time and its rtp clock time for audio based on audio frames at 44100 HZ. This allows video and audio times to be kept in sync. (video frames are timestamped with client ntp time, audio frames have client rtp time based on 44100 HZ). I don't understand why this isn't sent immediately when audio starts. It only first happens after about 1 sec or so of audio, which is why I needed to use an empirical (guessed) latency before this.

raop_rtp type_c 0x54, packetlen = 20
raop_rtp sync: client ntp=75863.174369, ntp = 1673727348.785052, ntp_start_time 1673727330.792046, sync_rtp=4215348436
80 d4 00 04 fb 41 1c d4 83 ab a6 d7 2c a3 7e f5 fb 42 4a 4b 

raop_rtp sync correction=1, rtp_sync_offset = -1.799773

thiccaxe commented 1 year ago

In audio only mode, even with -ad 5000, there doesn't seem to be much more delay than on an apple tv (3,1) in audio only mode when playing/pausing the audio

thiccaxe commented 1 year ago

I found that -ad 3000 seems to work well with these two sync videos: https://www.youtube.com/watch?v=s_PbyRpKrRk and https://www.youtube.com/watch?v=xKYd4vFwdA8

fduncanh commented 1 year ago

Since UxPlay seems to use a legacy protocol that cannot tell the client to adjust the timing of the audio, I guess that -ad is a useful addition. If there is no universal value that works well in all situations, I guess it should be 0 by default? Lets get a bit more testing.

is the "name" -ad OK or too cryptic?

I haven't yet had time to look into the video issue that @WinkelCode has raised.

fduncanh commented 1 year ago

@WinkelCode

Re: video buffer:

You are right, there does not seem to be a video buffer: there is a file lib/mirror_buffer.c that handles video decryption that seems to have been created as the video analog to audio decryption in lib/raop_buffer.c which does has a ring buffer for audio.

However I cannot see any similar buffer for video in "mirror_buffer.c", just the decryption. So I think the author of AirServer who imitated the audio structures found in ShairPlay to add video, didn't imitate a buffer structure for video, just the decryption was imitated. so "mirror-buffer.c" is a misleading file name!

EDIT: the strategy then would be to use GStreamer's "queue" to buffer the decrypted video before rendering, without messing with RAOP code in /lib.

WinkelCode commented 1 year ago

is the "name" -ad OK or too cryptic?

We have -vd decoder chooses the GStreamer pipeline's h264 decoder element,. -ad(elay) makes sense, but would conflict with -vd in the naming pattern. -al(atency) doesn't seem accurate enough, maybe -ao(ffset)?

FDH2 / UxPlay

(Windows issues, some now fixed): Is it possible to set a buffer/delay? #169

I did a small test where I took this part of `raop_handlers.h`

About Shairport4w

More detailed problem description

1. The artificial delay on lyrics and YouTube videos in audio-only mode.

2. Optionally introducing a buffer on video output on the server.

FDH2 / UxPlay

(Windows issues, some now fixed): Is it possible to set a buffer/delay? #169

I did a small test where I took this part of raop_handlers.h

About Shairport4w

More detailed problem description

1. The artificial delay on lyrics and YouTube videos in audio-only mode.

2. Optionally introducing a buffer on video output on the server.

I did a small test where I took this part of `raop_handlers.h`