go-playground / validator

Go Struct and Field validation, including Cross Field, Cross Struct, Map, Slice and Array diving
MIT License

bcp47_language_tag doesn't fail on some non-BCP47 tags #1221

Open bfabio opened 7 months ago

bfabio commented 7 months ago

Package version eg. v9, v10:

v10

Issue, Question or Enhancement:

When using bcp47_language_tag for validation, some non-BCP47 tags such as "eng" or "en_US" are passing as valid.

isBCP47LanguageTag() uses Parse from golang.org/x/text/language, whose documentation says:

[snip] It accepts tags in the BCP 47 format and extensions to this standard defined in https://www.unicode.org/reports/tr35/#Unicode_Language_and_Locale_Identifiers.

Code sample, to showcase or reproduce:

I expect both of these to fail, but they don't:

package main

import (
    "fmt"
    "github.com/go-playground/validator/v10"
)

func main() {
    validate := validator.New()

    err := validate.Var("en_US", "bcp47_language_tag")
    if err != nil {
        fmt.Println(err.Error())
        return
    }

    err = validate.Var("eng", "bcp47_language_tag")
    if err != nil {
        fmt.Println(err.Error())
        return
    }
}

shihanng commented 7 months ago

I think Parse in golang.org/x/text/language follows the Unicode Language and Locale Identifiers defined in the Unicode Locale Data Markup Language (LDML). These are based on BCP 47 but are not strictly the same. For example, Unicode Language and Locale Identifiers allow the underscore _ to be used as a separator:

sep = [-_] ;

BCP 47 does not; its grammar only allows the hyphen:

 langtag       = language
                 ["-" script]
                 ["-" region]
                 *("-" variant)
                 *("-" extension)
                 ["-" privateuse]
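
To make the difference concrete, here is a minimal sketch of a hyphen-only syntax check built from the subtag shapes in RFC 5646. It is an assumption-laden illustration, not what validator or golang.org/x/text does: the names bcp47WellFormed and isWellFormedBCP47 are hypothetical, and grandfathered and private-use-only tags are left out. It only tests well-formedness, so it rejects "en_US" but still accepts "eng", because the grammar allows any two- or three-letter primary language subtag. Rejecting "eng" would additionally need a lookup in the IANA Language Subtag Registry.

```go
package main

import (
	"fmt"
	"regexp"
)

// bcp47WellFormed approximates the RFC 5646 langtag production:
// case-insensitive, "-" as the only separator. Hypothetical helper;
// grandfathered tags and private-use-only tags ("x-...") are not handled.
var bcp47WellFormed = regexp.MustCompile(`(?i)^` +
	`(?:[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4,8})` + // language, optional extlang
	`(?:-[a-z]{4})?` + // script
	`(?:-(?:[a-z]{2}|[0-9]{3}))?` + // region
	`(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*` + // variants
	`(?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*` + // extensions
	`(?:-x(?:-[a-z0-9]{1,8})+)?` + // private use
	`$`)

// isWellFormedBCP47 reports whether tag matches the grammar above.
// It checks syntax only; it does not consult the subtag registry.
func isWellFormedBCP47(tag string) bool {
	return bcp47WellFormed.MatchString(tag)
}

func main() {
	for _, tag := range []string{"en-US", "en_US", "eng"} {
		fmt.Printf("%q well-formed: %v\n", tag, isWellFormedBCP47(tag))
	}
}
```

With this sketch, "en-US" and "eng" are reported as well-formed and "en_US" is not, which shows that the underscore case can be caught by syntax alone while the "eng" case cannot.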

The LDML specification has a section called BCP 47 Conformance, which reads:

It allows certain syntax for backwards compatibility (not BCP 47-compatible):

  • The "_" character for field separator characters, as well as the "-" used in