Closed ilya-g closed 3 years ago
We haven't discussed the proposal in details yet, but here's my feedback.
I'm concerned a bit about use cases. While they all look valid, I cannot estimate how much these cases are in demand.
I've tried to search GitHub for the specific patterns listed in alternatives (Regex.findAll
and indexOf-while
loop), but haven't succeeded to find significant number of cases where the function would fit.
I've also consulted with Sergei Lebedev (@superbobry) who works as a Research Engineer on high-performance biological data analysis projects in JetBrains Biolabs, about the use case with finding DNA subsequences. It occurred that while it is a valid use case, the linear performance achieved by KMP algorithm isn't feasible to perform searches in extremely long sequences. Some indexing approaches like suffix trees are applied to a string being searched in, and not to a pattern string in this case.
Regarding to the alternatives, it's interesting to note that Regex also has a similar optimization for the cases where KMP algorithm shines — it uses Boyer-Moore search algorithm for long string literals. So, I believe it isn't accurate to state "too slow" as a consideration against the alternatives without performing some benchmarks first.
I'm concerned a bit about use cases.
Point taken on the bioinformatics use case, that might not have been the best example. I think there is demand though:
Non-Overlapping
Overlapping
Regex also has a similar optimization
I've run some benchmarks and indeed the performance of the Regex is identical to my proposed method using findAll
for non-overlapping matching. Unfortunately, there's no way to make a Regex match overlapping occurrences natively, which makes this benchmark print occ: 18 ms, regex: 24695 ms, indexOf: 9661 ms
. The longer the pattern becomes, the better performs occurrencesOf
in comparison with indexOf
and the Regex.
Note that the regex use cases you have given most of the time involve searching non-literal patterns, so they cannot be readily replaced with occurrencesOf
Right, then there are these for example:
I've looked through these SO questions, and still they look rather artificial to me, I can't imagine how it can be useful in real-life cases.
We had an initial discussion about this proposal today and found that while it can be potentially useful, the current use-cases aren't convincing enough to include it into the standard library.
Instead we'd like to establish some library with not-so-often-required but still useful text-related functions kotlinx.text
. This proposal is a good candidate for inclusion in that library.
Sounds like a good compromise to me, thanks for the update. Does the KEEP need an update to discuss its inclusion in kotlinx?
ilya-g notifications@github.com schrieb am Mo., 4. Juli 2016, 17:47:
I've looked through these SO questions, and still they look rather artificial to me, I can't imagine how it can be useful in real-life cases.
We had an initial discussion about this proposal today and found that while it can be potentially useful, the current use-cases aren't convincing enough to include it into the standard library. Instead we'd like to establish some library with not-so-often-required but still useful text-related functions kotlinx.text. This proposal is a good candidate for inclusion in that library.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/Kotlin/KEEP/issues/20#issuecomment-230317194, or mute the thread https://github.com/notifications/unsubscribe/AAKtPRfICcZcmGH3Fr2VacfTN6IjBepUks5qSSrAgaJpZM4IryzX .
I would think this what find
or findAll
function does. Unfortunate find
already exists and means first
on iterables.
Instead we'd like to establish some library with not-so-often-required but still useful text-related functions kotlinx.text
I've opened the issue KT-45445 to collect ideas for such a library, and linked this proposal there. I'm going to close this proposal here because it's no longer about the standard library.
Discussions about the Extensions to find all occurrences of a pattern in a CharSequence will be held here.