LeonardoCardoso / SwiftLinkPreview

It makes a preview from a URL, grabbing all the information such as title, relevant texts and images.
https://leocardz.com/swift-link-preview-5a9860c7756f
MIT License
1.37k stars 198 forks

Attempting to preview certain links will leave the previewer Regex running and unable to cancel #113

Open pocketlim opened 4 years ago

pocketlim commented 4 years ago

Hey Leonardo Cardoso! Just wanted to say a big thank you for a cool little kit of code that makes it really easy to get a preview of a URL! ❤️

Certain links, however, will cause the previewer to get down to the Regex.pregMatchAll() method, where it will churn pretty much indefinitely, pegging one of the cores at maximum. Because it's already past the request stage, it can't be cancelled at that point. It will keep taxing the CPU as you navigate around the app, but at least it stops when the app is backgrounded.

Example URL: https://news.ycombinator.com/item?id=22598009

[Screenshot, 2020-03-16: stack trace and time graph]

Would you have any suggestion on how to handle a case like this, or a way/means to cancel the churning after a specific amount of time has passed?

For now, I will try and collect bad URLs and just avoid sending them to the previewer

Here's a sample project to test with:

LinkPreviewHighCPUStuck-CancelAdded.zip

(Edit: Updated with a version that doesn't use a cache, and with cancel button) I know, it's crude, but it tests one working link preview and the bad link.

Xcode: 11.3.1
iOS targeted: 11.x, 12.x, 13.x
Stack trace and time graph shown above

Let me know if you'd like any more help.

inPhilly commented 4 years ago

@pocketlim is this still an issue or have you found a way to fix it?

pocketlim commented 4 years ago

Unfortunately, no, it is still happening as far as I know. I am blacklisting the URLs that I found that aren't working or that cause issues so far.

LeonardoCardoso commented 3 years ago

Thanks for the detailed report. I will add it to investigation needed.

huanghui1998hhh commented 2 years ago

I'm facing this, it's still an issue.

adamwulf commented 2 months ago

I was able to work around this by:

  1. only parsing the first 30k characters, not the entire response
  2. adjusting the regex to not use .*
  3. only parsing text responses and ignoring binary responses

Regex.swift:74

- string.split(by: limit).forEach {
+ string.split(by: limit).first.map {

Regex.swift:108

- return "<" + tag + "(.*?)>(.*?)</" + tag + ">"
+ return "<" + tag + "([^>]*?)>([^<>]*?)</" + tag + ">"
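To see what the tightened pattern changes in practice, here is a minimal sketch that uses Foundation's NSRegularExpression directly. The helper names `tagPattern` and `firstTagContent` are hypothetical and not part of SwiftLinkPreview; only the pattern string is taken from the diff above.

```swift
import Foundation

// Hypothetical helper: builds the revised pattern from the diff above.
func tagPattern(_ tag: String) -> String {
    return "<" + tag + "([^>]*?)>([^<>]*?)</" + tag + ">"
}

// Hypothetical helper: returns the inner text of the first matching tag, or nil.
func firstTagContent(_ tag: String, in html: String) -> String? {
    guard let regex = try? NSRegularExpression(pattern: tagPattern(tag),
                                               options: [.caseInsensitive]) else {
        return nil
    }
    let range = NSRange(html.startIndex..., in: html)
    guard let match = regex.firstMatch(in: html, options: [], range: range),
          let inner = Range(match.range(at: 2), in: html) else {
        return nil
    }
    return String(html[inner])
}

let sampleHTML = "<html><head><title>Hello</title></head><body><p>a<b>x</b></p></body></html>"

// A tag with plain text content still matches.
print(firstTagContent("title", in: sampleHTML) ?? "nil")

// A tag whose content contains nested markup no longer matches,
// because the content group excludes "<" and ">".
print(firstTagContent("p", in: sampleHTML) ?? "nil")
```

The character classes bound how far each group can run, which is what keeps the engine from backtracking across the whole document. The trade-off, as the second call suggests, is that tags containing nested markup are skipped.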

In SwiftLinkPreview.swift, add import MobileCoreServices, then add the following at line 344 to skip parsing binary content:

            // Only parse responses whose Content-Type maps to a text UTI.
            guard
                let httpResponse = urlResponse as? HTTPURLResponse,
                let contentType = httpResponse.value(forHTTPHeaderField: "content-type"),
                // Extract the bare MIME type, dropping parameters such as "; charset=utf-8".
                let parsedType = Regex.pregMatchFirst(contentType, regex: "([^/\\s;]*/[^/\\s;]*)"),
                let type = UTTypeCreatePreferredIdentifierForTag(kUTTagClassMIMEType, parsedType as CFString, nil)?.takeRetainedValue()
            else {
                onError(.cannotBeOpened("Unknown content type"))
                return
            }
            guard
                UTTypeConformsTo(type, kUTTypeText)
            else {
                // Include the resolved type so the error says what was rejected.
                onError(.cannotBeOpened("Invalid content type: \(type as String)"))
                return
            }
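
On the original question of giving up after a set amount of time: one possible approach, sketched below, is to run the match through NSRegularExpression's `enumerateMatches` with the `.reportProgress` option and stop once a deadline has passed. This is not how SwiftLinkPreview's Regex helper is written. `matches(of:in:budget:)` is a hypothetical function, and the sketch assumes the engine invokes the progress callback often enough for the deadline check to be useful.

```swift
import Foundation

// Hypothetical helper: collects all matches of `pattern` in `text`,
// but stops early once `budget` seconds have elapsed.
func matches(of pattern: String, in text: String, budget: TimeInterval) -> (results: [String], timedOut: Bool) {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else {
        return ([], false)
    }
    let deadline = Date().addingTimeInterval(budget)
    var results: [String] = []
    var timedOut = false
    let range = NSRange(text.startIndex..., in: text)

    // With .reportProgress the block is also called periodically during
    // long-running matching, with `match` set to nil.
    regex.enumerateMatches(in: text, options: [.reportProgress], range: range) { match, _, stop in
        if Date() > deadline {
            timedOut = true
            stop.pointee = true   // ask the engine to abandon the enumeration
            return
        }
        if let match = match, let r = Range(match.range, in: text) {
            results.append(String(text[r]))
        }
    }
    return (results, timedOut)
}

let outcome = matches(of: "a+", in: "aa b aaa", budget: 5)
print(outcome.results, outcome.timedOut)
```

A caller could treat `timedOut == true` the same way as a failed request and report an error instead of leaving a core busy.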