deltachat / deltachat-ios

Email-based instant messaging for iOS.

URLs that end on "." aren't detected correctly #2293

Closed cryptosteve2 closed 1 month ago

cryptosteve2 commented 2 months ago
zeitschlag commented 2 months ago

Thank you for that bug report, I can reproduce it and had a quick look at it, but haven't come up with a solution yet.


Note to self

```swift
// Combine the checking types of all enabled detectors, except for .custom ones,
// because those contain their own regex.
let detectorCheckingTypes = enabledDetectors
    .filter { !$0.isCustom }
    .reduce(0) { $0 | $1.textCheckingType.rawValue }
if detectorCheckingTypes > 0, let detector = try? NSDataDetector(types: detectorCheckingTypes) {
    // doesn't match https://www.sportschau.de/fussball/championsleague/bayern-und-dortmund-spielen-gegen-barca,champions-league-auslosung-136.html#:~:text=Jeder%20der%2036%20Klubs%20-%20bislang%20waren%20es%2032%20-%20spielt%20viermal%20im%20eigenen%20Stadion%20und%20viermal%20ausw%C3%A4rts.%20Schwere%20Ausw%C3%A4rtsaufgaben%20erwischten%20unter%20anderem%20die%20Bayern%2C%20die%20in%20Barcelona%20auf%20ihren%20Ex-Trainer%20Flick%20treffen%2C%20sowie%20Leipzig%20und%20Dortmund%2C%20die%20nach%20Madrid%20reisen%20m%C3%BCssen.
    // for whatever reason.
    let detectorMatches = detector.matches(in: text.string, options: [], range: range)
    if !detectorMatches.isEmpty {
        debugPrint(detectorMatches)
    }
    matches.append(contentsOf: detectorMatches)
}
```

Screenshot:

![IMG_D27D01138222-1](https://github.com/user-attachments/assets/222a3f6e-efb6-48ee-8b0d-9a9d3d62ff33)

cryptosteve2 commented 2 months ago

Btw, maybe it's not Delta Chat related? I have the same issue in macOS Apple Mail ... 20240903@134719

r10s commented 1 month ago

we discussed this issue internally and came to the conclusion that there is not much we can reasonably do about it. in theory, we could try to detect URLs on our own, but we would probably open many more issues that way than we would close with this one. parsing text and detecting URLs is a hard job when it comes to corner cases, so it is better to leave that up to Apple.

for the concrete issue: the bug is that, when the URL ends with a dot, only part of the URL ("half" of it) is marked as a link.

for the final dot: whether it belongs to the URL or not is not really detectable. if you have the text `i like http://foo.bar/#hilight=baz.`, the URL may or may not include the final dot. looking at URLs existing in the wild, however, it is reasonable to assume that URLs end with a dot less often than sentences do. this is probably what Apple assumes here, and GitHub as well, btw, judging by the initial post. but as said, we would leave that up to Apple.
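
To make the ambiguity concrete, here is a minimal sketch of the "sentences end with a dot more often than URLs do" heuristic. It is not how `NSDataDetector` or deltachat-ios works; the function name `splitTrailingPunctuation` and the chosen punctuation set are assumptions made up for illustration.

```swift
// Hypothetical helper, not part of deltachat-ios or of Apple's NSDataDetector.
// Splits a URL candidate into the part treated as the link and the trailing
// punctuation treated as ordinary sentence text.
func splitTrailingPunctuation(_ candidate: String) -> (link: String, trailing: String) {
    // Assumption: these characters end sentences more often than they end URLs.
    let sentencePunctuation: Set<Character> = [".", ",", "!", "?", ";", ":"]
    var link = candidate
    var trailing = ""
    // Move punctuation from the end of the link to the front of `trailing`,
    // so that the original order of the characters is preserved.
    while let last = link.last, sentencePunctuation.contains(last) {
        trailing.insert(last, at: trailing.startIndex)
        link.removeLast()
    }
    return (link, trailing)
}

let result = splitTrailingPunctuation("http://foo.bar/#hilight=baz.")
print(result.link)     // http://foo.bar/#hilight=baz
print(result.trailing) // .
```

The trade-off is the one described above: a URL whose fragment really ends with a dot would be cut short by this guess, which is why a home-grown detector would likely trade this issue for new ones.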