golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.04k stars 17.54k forks source link

proposal: strings: function to check if a string contains all specified runes #63328

Closed knbr13 closed 8 months ago

knbr13 commented 11 months ago

Use Case

Sometimes I encounter scenarios where I need to check if a given string contains all of a specific set of runes, such as vowels ('a', 'e', 'i', 'o', 'u'). Currently, I have to iterate through the string and runes manually.

Proposal I propose adding a function to the strings package that can efficiently check if a string contains all of the specified runes. For example:

func containsAllRunes(s string, runes []rune) bool {
    for _, r := range runes {
        if ContainsRune(s, r) {
            return false
        }
    }
    return true
}

- Note that I didn't prefixed ContainsRune function with 'strings' which is the name of the package, because this function is intended to be in the strings package

- The implementation provided above is a basic example for illustration purposes.The actual implementation can be optimized for better performance and efficiency.

This function would greatly simplify such checks and improve code readability. usage example:

func main() {
    s := "hello world from Go!"
    b := containsAllRunes(s, []rune("abcde"))
    fmt.Println("the string contains all the provided runes:", b)
}
seankhliao commented 11 months ago

how common is this that it should be in the standard library? can you point to some existing code that would be simpler if this function were available?

odeke-em commented 11 months ago

Thank you for filing this issue @abdullah-alaadine and welcome to the Go project. Thanks @seankhliao for the reply!

@abdullah-alaadine if I may, I believe that for correctness you wanted to use an inversion of the conditions so if !strings.ContainsRune(s, r) and not if strings.ContainsRune(s, r) so as to be

package main

import (
    "strings"
)

func containsAll(s string, wants []rune) bool {
    for _, r := range wants {
        if !strings.ContainsRune(s, r) {
            return false
        }
    }
    return true
}

func main() {
    println(containsAll("hello world from Go!", []rune("eld")))
}

but purely academically, that code above is expensive in the worst case which could become a quadratic time complexity lookup O(l*k) where l = len(s), k = len(wants) though in practice it is very fast and doesn't need a change in the Go standard library as that's most likely what everyone will default to for small strings

Improved implementation

For large strings or larger slices of want, you could use strings.ContainsFunc which got introduced in Go1.20 to iterate exactly once over wants and store the values in a map that'll be used later on to check if a value exists in s so giving a linear time lookup of O(n) where n = max(l, k) per https://go.dev/play/p/ZyN8JEFhLzJ or

package main

import (
    "strings"
)

func containsAll(s string, wants []rune) bool {
    m := make(map[rune]bool, len(wants))
    // iterates over all the runes in wants exactly once: O(l)
    for _, r := range wants {
        m[r] = false
    }

    // Iterates over all the string's runes exactly once: O(k)
    strings.ContainsFunc(s, func(r rune) bool {
        if _, inWants := m[r]; inWants {
            // Constants time lookups & insertions: O(1)
            m[r] = true
        }
        return false // Return false to iterate over all runes in s.
    })

    for _, wasSeen := range m {
        if !wasSeen {
            return false
        }
    }
    return true
}

func main() {
    println(containsAll("hello world from Go!", []rune("elfd")))
    println(containsAll("hello world from Go!", []rune("elfdz")))
}

Academically my above suggestion seems cheaper but in practice for small slices, iterating over 2 slices is very fast and cheaper

Proposal motivations and demand for this code

Now given that a more efficient implementation had to be scrutinized by myself and it requires more thought for larger values, perhaps that's good motivation for the standard library to take the logic away from everyone having to implement it, however we need data how much code out there has to look for all runes, how often is this pattern required, can you please compile usages of the need for this code? @rsc got a Go corpus compiled and also Sourcegraph allows you to search over public Go repositories https://sourcegraph.com/search /cc @sqs @beyang

knbr13 commented 11 months ago

mm, yes I think that the real-world use cases for this function are relatively rare, for me, I needed this helper function in this LeetCode problem "316. Remove Duplicate Letters", where I want to check if a substring of the whole string contains a set of specific characters, but I can solve the problem in a simpler and more efficient way, and I don't have specific public examples to point to, but I've thought of some use cases that this helper function is useful, like if I am building a game, where users need to press specific keys a certain number of times, and the order of the input doesn't matter (which it's not usually case in games). I think that the real-world use cases for such helper are actually rare, but I'll think more about it to see if there's a real need for it.

Anyway, I'll add the updated implementation of this function to make it work as expected. (The provided one above is just to explain the purpose of it, it won't work as expected in some cases).

function version 2:

func containsAll(s string, wants []rune) bool {
    m := map[rune]int{}
    for _, r := range s {
        if _, ok := m[r]; !ok {
            m[r] = 1
        } else {
            m[r]++
        }
    }
    for _, r := range wants {
        if _, ok := m[r]; !ok {
            return false
        } else if m[r] == 0 {
            return false
        }
        m[r]--
    }
    return true
}

The problem with this implementation is that if the developer need to check if x number of a specific rune is in the string, like str: 'hello world from Go!' runes: the string must contain 5 'a', 3 'b' and 1 'c', then the developer have to write them manually: res := containsAll("hello world from Go!", []runes("aaaaabbbc") this will become a problem as the number of occurrences needed increases, this problem will be addressed in function version 3.

function version 3:

func containsAll(s string, wants map[rune]int) bool {
    m := map[rune]int{}
    for _, r := range s {
        if _, ok := m[r]; !ok {
            m[r] = 1
        } else {
            m[r]++
        }
    }
    for k, v := range wants {
        if _, ok := m[k]; !ok && v > 0 {
            return false
        } else if m[k] < v {
            return false
        }
        m[k] -= v
    }
    return true
}

example on using it:

func main() {
    str := "hello world mr Go!"
    res := containsAll(str, map[rune]int{
        'r': 2,
        'o': 3,
        'h': 1,
        '!': 1,
    })
    fmt.Println("contains all runes:", res)
}
knbr13 commented 11 months ago

Thank you, @odeke-em, for your notes. Yes, you are right; I missed the '!' unintentionally. I'll search to see if there are a decent number of use cases for this helper function to determine whether it's worth adding to the standard library or not.

odeke-em commented 11 months ago

Gotcha and thank you for giving me the context for your need @abdullah-alaadine! I just looked at your problem and I believe you can definitely solve it in much simpler, more efficient and faster ways without having to use a map, you can just use a slice of 26 booleans values as an index, given the constraints:

so https://go.dev/play/p/ecMWraalU5J

package main

import "fmt"

func dedupLettersAndLexicographicOrder(s string) string {
    // Constraint, lower case English letters in s.
    // Output: Deduplicated letters in lexicographic order.
    index := make([]bool, 26)
    for _, r := range s {
        if i := r - 'a'; index[i] == false {
            index[i] = true
        }
    }

    // Finally return the result:
    res := make([]rune, 0, len(index))
    for i, ok := range index {
        if ok {
            res = append(res, rune(i+'a'))
        }
    }

    return string(res)
}

func main() {
    rubric := map[string]string{
        "aaaaaaabdeeeecfff": "abcdef",
        "bbbefffghaaaaaaa":  "abefgh",
    }

    for input, want := range rubric {
        got := dedupLettersAndLexicographicOrder(input)
        if got != want {
            fmt.Printf("%q =>\n\tGot:  %q\n\tWant: %q", input, got, want)
        }
    }
}

I think that once you get good data on this then the proposal can proceed.

knbr13 commented 11 months ago

Thanks for sharing your solution. @odeke-em It's indeed simpler and more efficient. But actually your solution will not pass the test cases on LeetCode, because there's one more constraint you missed (the problem description is bad), which is that you are not allowed to rearrange the characters in the string, I'll copy and paste the examples given in the problem descrition:

Example 1:

Input: s = "bcabc"
Output: "abc"

Example 2:

Input: s = "cbacdcbc"
Output: "acdb"

Example 1 is clear, but in example 2, as you can see, the output is acdb not abcd. In ex1 there are 3 possible strings: bca, cab, and abc. but in ex2 there is no abcd string without rearrangement.

knbr13 commented 11 months ago

after spending sometime on searching for real-world use cases for such function, where the developer has to use this function, I didn't find any real-world scenario for it, so I think it's better not adding it to the standard library. I'd like to hear back from you about your take on it, and thanks!

adonovan commented 8 months ago

after spending sometime on searching for real-world use cases for such function, where the developer has to use this function, I didn't find any real-world scenario for it, so I think it's better not adding it to the standard library. I'd like to hear back from you about your take on it, and thanks!

I think you're probably right; I don't think I've ever needed this function except when solving word puzzles.