ShreyanJain9 / bskyrb

Ruby Gem for interacting with BlueSky/AT Protocol
MIT License

Mentions in posts #3

Closed ShreyanJain9 closed 1 year ago

ShreyanJain9 commented 1 year ago

Just leaving this up here. I'll probably try to figure this out tomorrow.

arjenbrandenburgh commented 1 year ago

Dumping some stuff here. You probably are already aware of this, but maybe it can help you a bit. To make links and mentions work you need to add facets to the record.

facets = detect_facets(text)
input = Bskyrb::ComAtprotoRepoCreaterecord::CreateRecord::Input.from_hash({
  "collection" => "app.bsky.feed.post",
  "$type" => "app.bsky.feed.post",
  "repo" => session.did,
  "record" => {
    "$type" => "app.bsky.feed.post",
    "createdAt" => DateTime.now.iso8601(3),
    "text" => text,
    "facets" => facets
  }
})

When looking at the official @atproto/api package you can find such a function here: https://github.com/bluesky-social/atproto/blob/e7a0d27f1fef15d68a04be81cec449bfe3b1011f/packages/api/src/rich-text/detection.ts#L7
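
One detail worth carrying over from that function: a facet's byteStart and byteEnd count bytes of the UTF-8 encoded text (with byteEnd exclusive), not characters, so the two diverge as soon as the post contains non-ASCII characters. A minimal sketch of the conversion, where `byte_range` is a hypothetical helper name used only for illustration and not part of bskyrb:

```ruby
# Hypothetical helper (not part of bskyrb): returns [byteStart, byteEnd]
# for the first occurrence of `needle` in `text`, measured in UTF-8 bytes.
# byteEnd is exclusive.
def byte_range(text, needle)
  text = text.encode('UTF-8')
  needle = needle.encode('UTF-8')
  char_index = text.index(needle) # character offset, or nil if absent
  return nil unless char_index

  # Everything before the match, measured in bytes instead of characters.
  byte_start = text[0...char_index].bytesize
  [byte_start, byte_start + needle.bytesize]
end

text = "h\u00e9llo @alice.test" # "é" takes two bytes in UTF-8
p text.index('@alice.test')       # → 6 (characters)
p byte_range(text, '@alice.test') # → [7, 18] (bytes)
```

For pure ASCII text the two offsets coincide, which is why this kind of bug tends to show up only once a post contains an emoji or an accented character.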

Here is a quick and dirty translation from TypeScript to Ruby using ChatGPT. The regular expressions may still need some work. There are probably better solutions, but I currently lack the time to help out with that.

def detect_facets(text)
  facets = []
  text = text.encode('UTF-8') # facet indices are UTF-8 byte offsets

  # mentions: "@handle" at the start of a line, or after whitespace or "("
  re = /(?:^|\s|\()(@([a-zA-Z0-9.-]+))\b/
  text.scan(re) do
    m = Regexp.last_match
    handle = m[2]
    next if !valid_domain?(handle) && !handle.end_with?('.test')

    # convert the character offset of "@handle" into a byte offset
    start = text[0...m.begin(1)].bytesize
    facets.push({
      '$type' => 'app.bsky.richtext.facet',
      'index' => {
        'byteStart' => start,
        'byteEnd' => start + m[1].bytesize, # exclusive
      },
      'features' => [
        {
          '$type' => 'app.bsky.richtext.facet#mention',
          'did' => handle # placeholder: must be resolved to a DID before posting
        },
      ],
    })
  end

  # links: group 1 is the link text, group 2 the bare domain (nil when a scheme is given).
  # Only numbered groups are used: Ruby stops capturing them once a named group is present.
  re = /(?:^|\s|\()(https?:\/\/\S+|([a-z][a-z0-9]*(?:\.[a-z0-9]+)+)\S*)/i
  text.scan(re) do
    m = Regexp.last_match
    uri = m[1]
    if m[2] # no scheme given, so validate the domain and add one
      next unless valid_domain?(m[2])
      uri = "https://#{uri}"
    end
    start = text[0...m.begin(1)].bytesize
    index = { 'start' => start, 'end' => start + m[1].bytesize }
    # strip ending punctuation
    if uri.match?(/[.,;!?]\z/)
      uri = uri[0..-2]
      index['end'] -= 1
    end
    # strip a closing parenthesis that has no opening counterpart in the link
    if uri.end_with?(')') && !uri.include?('(')
      uri = uri[0..-2]
      index['end'] -= 1
    end
    facets.push({
      'index' => {
        'byteStart' => index['start'],
        'byteEnd' => index['end'],
      },
      'features' => [
        {
          '$type' => 'app.bsky.richtext.facet#link',
          'uri' => uri,
        },
      ],
    })
  end

  facets.empty? ? nil : facets
end

def valid_domain?(str)
  tlds = %w[com org net io gov edu] # Define your TLDs here
  # valid when the string ends in ".<tld>"
  tlds.any? { |tld| str.end_with?(".#{tld}") }
end
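
One Ruby-specific detail that can trip up a regex ported from JavaScript: once a pattern contains a named group, Ruby no longer captures the plain parenthesised groups, so positional lookups such as match[2] stop lining up. A small sketch of the effect (the patterns and the sample string are made up for illustration):

```ruby
# With only numbered groups, scan yields every capture.
numbered = /(\w+)@(\w+)/
p 'user@host'.scan(numbered) # → [["user", "host"]]

# Adding a named group turns the plain (...) group into a non-capturing one,
# so only the named capture is reported.
mixed = /(\w+)@(?<host>\w+)/
p 'user@host'.scan(mixed)          # → [["host"]]
p mixed.match('user@host')[:host]  # → "host"
```

Sticking to one style per pattern (all numbered or all named) sidesteps the problem.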

Edit: here is a quick attempt to clean the above up a bit. Disclaimer: I haven't fully tested it. The mention matching seems to work OK. The link matching needs care: scanning with URI.regexp yields the pattern's capture groups rather than the whole URL, so joining them loses the slashes.

require 'uri'

def create_facets(text)
  facets = []

  # Regex patterns
  # group 1 is "@handle", group 2 the bare handle
  mention_pattern = /(?:^|\s|\()(@([a-zA-Z0-9.-]+))\b/
  link_pattern = URI.regexp(%w[http https])

  # Find mentions
  text.scan(mention_pattern) do
    m = Regexp.last_match
    # byteStart/byteEnd are UTF-8 byte offsets; byteEnd is exclusive
    byte_start = text[0...m.begin(1)].bytesize
    facets.push(
      '$type' => 'app.bsky.richtext.facet',
      'index' => {
        'byteStart' => byte_start,
        'byteEnd' => byte_start + m[1].bytesize,
      },
      'features' => [
        {
          '$type' => 'app.bsky.richtext.facet#mention',
          'did' => m[2] # the matched handle; resolve it to a DID before posting
        },
      ],
    )
  end

  # Find links
  text.scan(link_pattern) do
    m = Regexp.last_match
    byte_start = text[0...m.begin(0)].bytesize
    facets.push(
      '$type' => 'app.bsky.richtext.facet',
      'index' => {
        'byteStart' => byte_start,
        'byteEnd' => byte_start + m[0].bytesize,
      },
      'features' => [
        {
          '$type' => 'app.bsky.richtext.facet#link',
          'uri' => m[0] # the whole match; joining the capture groups would drop the "://"
        },
      ],
    )
  end

  facets.empty? ? nil : facets
end
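
One more step is likely needed before posting: a mention feature carries a DID, while the detection only finds the handle. A rough sketch of that post-processing, assuming a caller-supplied `resolver` that maps a handle to a DID (or nil when it cannot). `resolve_mentions` and the stub lookup table are hypothetical names for illustration; in practice the resolver might call the com.atproto.identity.resolveHandle endpoint.

```ruby
# Hypothetical post-processing step (not part of bskyrb): replace the handle
# stored in each mention feature with a DID. `resolver` is any callable that
# takes a handle and returns a DID string, or nil if it cannot be resolved.
# Mentions that fail to resolve are dropped; other features pass through.
def resolve_mentions(facets, resolver)
  return nil if facets.nil?

  resolved = facets.filter_map do |facet|
    features = facet['features'].filter_map do |feature|
      next feature unless feature['$type'] == 'app.bsky.richtext.facet#mention'

      did = resolver.call(feature['did'])
      did && feature.merge('did' => did)
    end
    features.empty? ? nil : facet.merge('features' => features)
  end
  resolved.empty? ? nil : resolved
end

# Stub resolver backed by a made-up lookup table, standing in for a real API call.
known = { 'alice.test' => 'did:plc:example' }
facets = [{
  'index' => { 'byteStart' => 0, 'byteEnd' => 11 },
  'features' => [{ '$type' => 'app.bsky.richtext.facet#mention', 'did' => 'alice.test' }]
}]
p resolve_mentions(facets, ->(handle) { known[handle] })
```

Passing the resolver in as a callable keeps the network call out of the facet logic, which also makes it easy to test with a stub like the one above.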

ShreyanJain9 commented 1 year ago

Nice! Thank you 🙂

Yeah, it's mostly the facet detection that's been giving me issues. I'll adapt what you've written, most likely 🙂

ShreyanJain9 commented 1 year ago

Thanks for this 🙂 Your thing helped me get to a final solution!