LeonardoCardoso / SwiftLinkPreview

It makes a preview from an URL, grabbing all the information such as title, relevant texts and images.
https://leocardz.com/swift-link-preview-5a9860c7756f
MIT License
1.37k stars 198 forks source link

[Proposition] Switch to `HTMLString` to decode HTML character entities #102

Closed chriszielinski closed 4 years ago

chriszielinski commented 5 years ago

This PR is a proposition. May I recommend removing the reliance on NSAttributedString to decode HTML character entities in favor of the HTMLString library.

Why?

  1. Performance: NSAttributedString requires a large overhead (e.g. WebKit) for such a relatively small purpose. Here is a performance comparison of decoding HTMLString's TestData using HTMLString & NSAttributedString, respectively.

    performance

    The performance tests ran are:

    📌 Note: HTMLTestLongUnescapableString is in TestData.swift.

    override class var defaultPerformanceMetrics: [XCTPerformanceMetric] {
        return [.wallClockTime,
                XCTPerformanceMetric(rawValue: "com.apple.XCTPerformanceMetric_TemporaryHeapAllocationsKilobytes")
        ]
    }
    
    func testNSAttributedString() {
        measure {
            let encodedData = HTMLTestLongUnescapableString.data(using: .utf8)!
            let attributedOptions: [NSAttributedString.DocumentReadingOptionKey: Any] = [
                .documentType: NSAttributedString.DocumentType.html,
                .characterEncoding: NSNumber(value: String.Encoding.utf8.rawValue)
            ]
    
            do {
                let attributedString = try NSAttributedString(data: encodedData,
                                                              options: attributedOptions,
                                                              documentAttributes: nil)
                _ = attributedString.string
            } catch {
                XCTFail(error.localizedDescription)
            }
        }
    }
    
    func testHTMLString() {
        measure {
            _ = HTMLTestLongUnescapableString.removingHTMLEntities
        }
    }
  2. Concurrency: Using NSAttributedString requires the main thread. Not a problem for 99.9% of use cases; however, it becomes a problem when synchronous behavior is wanted (e.g. a command line interface). I'm sure there are some obscure ways around this, but given the performance implications themselves, why not kill two birds with one stone.

That being said, this PR isn't ready yet. The changelog hasn't been updated nor has the SPM been tested—I'm sure there's other things as well. Just wanted to test the water first.

ghost commented 5 years ago
1 Error
:no_entry_sign: Please include a CHANGELOG entry.
You can find it at CHANGELOG.md.

Generated by :no_entry_sign: Danger

LeonardoCardoso commented 5 years ago

Thanks, @chriszielinski! I will check asap.

LeonardoCardoso commented 5 years ago

Hi, the beauty of the lib is not using any other external lib. So I will keep this PR open until I have more time to improve the crawling. Thanks.