alexisakers / HTMLString

Escape and unescape HTML entities in Swift
MIT License
170 stars 69 forks source link

Performance Improvement #29

Closed JCSooHwanCho closed 4 years ago

JCSooHwanCho commented 4 years ago

Hi. @alexaubry

Recently, my company's app using this library got crashed only iOS 13+

0
libswiftCore.dylib
closure #1 in closure #1 in closure #1 in _assertionFailure(_:_:file:line:flags:) + 456
1
libswiftCore.dylib
_assertionFailure(_:_:file:line:flags:) + 472
2
libswiftCore.dylib
_fatalErrorMessage(_:_:file:line:flags:) + 44
3
libswiftCore.dylib
_scalarAlign(_:_:) + 986
4
libswiftCore.dylib
_stringCompareInternal(_:_:expecting:) + 612
5
HTMLString
$sSS10HTMLStringE18removeHTMLEntitiesyyF + 236
6
HTMLString
$sSS10HTMLStringE21addingUnicodeEntitiesSSvgTm + 40

Fatal error: file /Library/Caches/com.apple.xbs/Binaries/swiftlang/install/TempContent/Objects/BNI_iOS/swift-macosx-x86_64/stdlib/public/core/8/UnsafeBufferPointer.swift, line 886

I tried some workarounds, but it doesn't work. so I investigate this library, and found some suspicious. This code use replaceSubranges, but I thought this call could invalidate indices value and make the storage unstable

So my solution is make another buffer and copy characters in it. I can't assure this code can fix the crash Issue, but It will be the more safe way.

I thought you aren't maintaining this repository, but please review it. Thank you.

alexisakers commented 4 years ago

Hi, thanks for your help with this. Is this in any way related to #27? It seems that the underlying issue lies with the NSString implementation and that it might be a Swift bug.

If you could comment in that issue with any steps to reproduce it would be helpful for me to investigate and maybe submit a bug report.

I wanted to point out that the library previously was implemented with an output buffer the way that you are now proposing but I eventually found that this approach has a few bottlenecks and isn't as performant as the current approach.

JCSooHwanCho commented 4 years ago

@alexaubry Thank you for replying!

I tried the way in #27 but It doesn't work.

I haven't found the exact reproduce path, the only thing I know is that crashed strings include & character. If I found it later, I will let you know.

but I have a doubt about bottleneck you said. My code's time performance is better than the original one. here's the test result.

This is the original test result.(testLargeUnescapingPerformance) 스크린샷 2020-09-08 오전 11 32 13

And this is my code's test result. 스크린샷 2020-09-08 오전 11 33 10 (I showed only large test case, but small cases are also showing better result)

the original code already copy the original string, so space complexity is not changed. and the original string doesn't changed both in original code and my code, it looks thread-safe(I don't have exact knowledge about this. If it's wrong, please point it out)

alexisakers commented 4 years ago

My initial thought is that it might be less performant to copy a string this way since the current copy is made with copy on write and reserving the required space while doing so.

If you have a small test case that fails it would be really helpful for me to determine what's causing the issue.

JCSooHwanCho commented 4 years ago

I tried to find some failure cases with crash logs, but Any crashes are never reproduced.

I think this pull request's title should change. It gives some performance gain. but not confidence for solution of original problem.

The answer how it achieve performance improvement is in the implementation of replaceSubrange method. The replaceSubrange method divides string into 3 parts, replaces the middle part with new part provided as arguments, and joined again. But dividing and joining cost could be reduced by using appending parts. That's my solution.

alexisakers commented 4 years ago

Thanks again for your help, your detailed explanations and your patience too, as I'm dealing with a lot of things at the moment. I'm going to merge this now and create a new release after I go through the other PRs and issues.