Change to youtube-chat NPM module to get live messages

codingMASTER398 commented 1 year ago

I once made a stream read its live chat myself, using an old but extremely handy NPM module, youtube-chat. It reliably reads incoming chat messages in the same way the current system does, but it's much cleaner and was created by a couple of smarter folks that weren't on crack.

The stream I used it on ran for ~5 months without missing a single chat message. It doesn't need any API keys, either. I honestly have no clue how it works, but seeing as it stumbled its way onwards through 5 months and reliably picks up every message from the high-activity streams I've demoed it on, it seems safe to assume that it's a whole lot better than cobbled-together puppeteer code written by Code Bullet at 3AM.

This PR does exactly what it says, and replaces any and all puppeteer bloatware with an easier to understand module that just does everything for us. It's in the same format as the Unity stream expects, too.

Handy dandy demo showing it working on Alan Becker's 24/7 stream
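
For readers who have not used the module, a minimal sketch of the wiring this PR describes might look like the following. It assumes a client object that emits "chat" and "error" events, which is how the youtube-chat README presents its LiveChat class; forwardChat and send are hypothetical names, not code from this PR.

```javascript
// Sketch only: forwardChat and send are illustrative names, not part of this PR.
// liveChat is assumed to behave like youtube-chat's LiveChat, i.e. an object
// with on(eventName, handler) that emits "chat" and "error" events.
function forwardChat(liveChat, send) {
  liveChat.on("chat", (chatItem) => {
    // Hand every incoming chat item to the caller-supplied callback
    send(chatItem);
  });
  liveChat.on("error", (err) => {
    // Log and keep listening; the caller decides whether to restart
    console.error("live chat error:", err);
  });
  return liveChat;
}

// Assumed usage with the real module (needs `npm install youtube-chat`):
//   const { LiveChat } = require("youtube-chat");
//   const chat = forwardChat(new LiveChat({ channelId: "CHANNEL_ID_HERE" }), console.log);
//   chat.start();
```

Passing the client in as a parameter keeps the forwarding logic independent of the module, so it can be exercised with a stand-in object.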

steven4547466 commented 1 year ago

This looks good, and it seems it could also send the chatter's ID in the future for moderation purposes. I was unaware this existed when I made my original change.

However, I will say that this does use YouTube's private API, which can change with no warning and no documentation. The InnerTube API key can and will expire eventually, which means the process will need to be restarted to scrape a new one, but that's not too big of a deal.
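
If key expiry does become a problem, one possible mitigation is to rebuild the client whenever it reports an error instead of restarting the whole process. The sketch below is only an illustration: restartOnError and createChat are hypothetical names, and it assumes that starting a fresh client scrapes a fresh key, which has not been verified against the module.

```javascript
// Sketch only: restartOnError and createChat are hypothetical, not part of the PR.
// createChat must return an object with on(), start() and stop(), like LiveChat.
function restartOnError(createChat, maxRestarts = 5) {
  let restarts = 0;
  let current = null;

  function launch() {
    const chat = createChat();
    current = chat;
    chat.on("error", (err) => {
      // Ignore errors from a client that has already been replaced
      if (chat !== current) return;
      chat.stop();
      if (restarts >= maxRestarts) {
        console.error("giving up after", restarts, "restarts:", err);
        return;
      }
      restarts += 1;
      launch(); // assumption: a new client scrapes a new key when it starts
    });
    // start() may return a promise; log a rejected start instead of crashing
    Promise.resolve(chat.start()).catch((err) => console.error("start failed:", err));
  }

  launch();
  return { restartCount: () => restarts };
}
```

The restart cap is there so that a permanent failure, such as the private API changing shape, does not turn into an endless restart loop.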

Regardless, at this point in time I have tested this and it does work, though I will suggest a minor change, now that it's possible.

Change getAuthorAndContents to

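// Map a youtube-chat item to the { id, author, text } object the Unity side expects
async function getAuthorAndContents(message) {
  if (!message) return null
  let id = message.author.channelId
  let author = message.author.name || "User"
  // Only the first message item is read; ?. avoids a crash when there is none
  let text = message.message[0]?.text || "Not Found"
  return { id, author, text }
}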

and add id to the C# side by editing the ChatMessage class:

public class ChatMessage
{
    public string id { get; set; }
    public string author { get; set; }
    public string text { get; set; }

    public ChatMessage(string id, string author, string text)
    {
        this.author = author;
        this.text = text;
        this.id = id;
    }

    public ChatMessage() { }

    public override string ToString()
    {
        return $"Author: {author} ({id}) | Message: {text}";
    }
}

While not used now, this allows for moderation tools in the future.
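
As a rough illustration of what such a moderation tool could build on, the sketch below filters messages by channel ID before they are forwarded. blockedIds and isAllowed are hypothetical names, and the ID is a placeholder.

```javascript
// Sketch only: a hypothetical blocklist keyed by the id field of the chat message.
const blockedIds = new Set(["UC_placeholder_blocked_id"]);

// Returns true only when the message carries an id that is not on the blocklist.
// Messages without an id are rejected, which is a policy choice of this sketch.
function isAllowed(chatMessage) {
  return Boolean(chatMessage && chatMessage.id) && !blockedIds.has(chatMessage.id);
}
```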

While you're in there, you may as well drop the need for it to be an array. On the JS side that is simply changing the body to JSON.stringify(authorAndContents); the C# side would need to look like this:

while (listener.IsListening)
{
    HttpListenerContext ctx = listener.GetContext();

    HttpListenerRequest req = ctx.Request;
    HttpListenerResponse resp = ctx.Response;

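    // Read the whole request body and parse it as a single ChatMessage object
    ChatMessage chatMessage = JsonConvert.DeserializeObject<ChatMessage>(new StreamReader(req.InputStream).ReadToEnd());

    // chatMessage is null when the body is empty, so guard before reading fields
    string message = chatMessage?.text;
    string author = chatMessage?.author;

    if (string.IsNullOrWhiteSpace(message))
    {
        // Still answer the request so the sender is not left waiting
        resp.OutputStream.Write(responseBuffer, 0, responseBuffer.Length);
        resp.Close();
        continue;
    }

    Debug.Log("received: " + chatMessage);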

    if (message.ToLower().StartsWith("topic:"))
    {

        string topicMessage = message.Substring("topic:".Length).Trim();

        // Check if the topicMessage contains any word from the wordBlacklist
        bool containsBlacklistedWord = wordBlacklist.Any(blackWord => topicMessage.ToLower().Contains(blackWord.ToLower()));
        bool haveAlreadyDoneTopic = alreadyTakenTopics.Contains(topicMessage);
        // Check if the message after "topic:" is not empty or only spaces
        if (!string.IsNullOrWhiteSpace(topicMessage) && !haveAlreadyDoneTopic && !containsBlacklistedWord)
        {

            // Add the new topic, with its author, to the topic suggestions
            topicSuggestions.Add(topicMessage + "\n" + author);

            Debug.Log(topicMessage + "\n" + author);

            // Limit the size of topicSuggestions
            if (topicSuggestions.Count > maxListSize)
            {
                // Remove the oldest suggestion
                topicSuggestions.RemoveAt(0);
            }

        }
    }
    else if (message.ToLower().StartsWith("vote:"))
    {

        string voteMessage = message.Substring("vote:".Length).Trim();

        // Check if the message after "vote:" is not empty or only spaces
        if (!string.IsNullOrWhiteSpace(voteMessage))
        {
            // Add the vote to the vote suggestions
            voteSuggestions.Add(voteMessage);
        }
    }

    resp.OutputStream.Write(responseBuffer, 0, responseBuffer.Length);
    resp.Close();
}
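
On the JavaScript side, the change described above amounts to sending one object per request instead of an array. A minimal sketch, assuming Node 18 or newer for the global fetch and a hypothetical listener address (UNITY_URL); buildBody and sendToUnity are illustrative names, not code from the PR:

```javascript
// Hypothetical address of the C# HttpListener; the real one lives in the project.
const UNITY_URL = "http://localhost:9999/";

// Serialize a single { id, author, text } object, not an array of them
function buildBody(authorAndContents) {
  return JSON.stringify(authorAndContents);
}

// POST one chat message to the listener (global fetch assumes Node 18 or newer)
async function sendToUnity(authorAndContents) {
  const res = await fetch(UNITY_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: buildBody(authorAndContents),
  });
  return res.ok;
}
```

With this shape, the body deserializes directly into a single ChatMessage on the C# side.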

Da532 commented 1 year ago

Valiant effort but unnecessary. Node slop should be eliminated from this project ultimately. It's inefficient for pushing data at scale and bloated. My previous PR solved this.

Using a 3rd party abstracted API also isn't ideal as it may cease operation or dish out rate limits at any time, halting the entire project.

I second that getting the channel ID is useful though. May implement this myself in a similar fashion to YouTube-Livechat-GoToChannel.

codingMASTER398 commented 1 year ago

Wow! That is certainly much better. I don't know how to use C# so I didn't bother, but if you could edit my PR to include those changes it would be wonderful.

I understand the concern about using a separate, private API; however, the same is true of puppeteer. The private API has kept working for ~2 years since the package's last publish, so it's safe to say we won't have to worry about any changes until much later. YouTube, like many other platforms, has legacy APIs that are often kept running for years.

codingMASTER398 commented 1 year ago

Your PR solves this by using the Go language to do the same thing: open a browser and scrape. Imo, a Node package with 2 dependencies that is much less demanding is 10x better than whatever you're doing.

Remember that this is Rick and Morty AI, not a full on application that many people will be running and putting under constant load. This just isn't necessary.

My change replaces the scraper (which could already change at any time) with an easy-to-understand module that uses APIs that have withstood the test of time, at least for > 2 years.

Da532 commented 1 year ago

We discussed this via Discord, but the main issues I hold with this are that:

This approach relies upon yet another API, further adding to the call chain rather than directly scraping YouTube.
My approach used Go as it is able to handle far more operations per second than Node, meaning many more messages can be handled if this scales to something the size of AI Sponge, for example.

Take off the horse blinders for a sec mate. When you've got tunnel vision it looks like your solution is rad, but this really isn't the best call imo. Comparing a third-party API to something like puppeteer is not a fair comparison at all: we will always have working versions of those modules, as all they do is interface with Chrome, and we will always have that version of Chrome too. Something we cannot control, however, is someone else's API.

Nothing will ever be better than scraping the chat itself aside from a 1st party api. This further complicates the chain and creates abstraction.

steven4547466 commented 1 year ago

This uses YouTube's own private API. It's not third party, it's reverse engineered.

This is the URL it makes its request to: https://www.youtube.com/youtubei/v1/live_chat/get_live_chat. This is exactly how YouTube shows its live chat to people.

Just to clear up confusion, there is no third party involved at all. It's the exact same way YouTube itself shows people chat, which you can find out by popping out chat and looking at the network tab in dev tools.

Also, everyone has the same key as far as I can tell. I blurred it out before I figured that out.
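
For anyone who wants to reproduce this outside the browser, here is a rough sketch of how such a request could be assembled. Only the URL comes from the comment above. The key query parameter, the context and continuation body fields, and the buildLiveChatRequest helper are assumptions: this API is undocumented and may change, so check the network tab for the current shape.

```javascript
const LIVE_CHAT_URL = "https://www.youtube.com/youtubei/v1/live_chat/get_live_chat";

// Sketch only: every field below except the URL is an assumption.
function buildLiveChatRequest(apiKey, clientVersion, continuation) {
  const url = new URL(LIVE_CHAT_URL);
  url.searchParams.set("key", apiKey); // assumed: the key scraped from the live page
  return {
    url: url.toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        context: { client: { clientName: "WEB", clientVersion } },
        continuation, // assumed: token marking where the previous poll stopped
      }),
    },
  };
}
```

The returned object could be passed on as fetch(req.url, req.init); nothing here sends a request by itself.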

codingMASTER398 commented 1 year ago

It's 1st party, just undocumented

steven4547466 commented 1 year ago

I would like to eventually port this to C#, as this is just basic web requests. I'll make that my next project if this proves to work well. I read through most of the module when removing axios and I believe it should be incredibly easy to recreate in C#.

codingMASTER398 commented 1 year ago

me when stale PR

Code-Bullet / RickAndMortai

Change to youtube-chat NPM module to get live messages #4