PrinsFrank / standards

A collection of standards as PHP Enums: ISO3166, ISO4217, ISO639...
MIT License

Fiddle with uniqueness in EnumCase #186

Closed: szepeviktor closed this 8 months ago

szepeviktor commented 8 months ago

From https://github.com/PrinsFrank/standards/pull/183#issuecomment-1918910818

codecov[bot] commented 8 months ago

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (358b2bd) 100.00% compared to head (cf6dbf6) 100.00%.

Additional details and impacted files

```diff
@@            Coverage Diff            @@
##              main      #186   +/-   ##
===========================================
  Coverage   100.00%   100.00%
  Complexity     231       231
===========================================
  Files           33        33
  Lines         3611      3611
===========================================
  Hits          3611      3611
```


szepeviktor commented 8 months ago

@PrinsFrank This seems impossible to solve!

     case eu = 'eu';
+    case eu = 'ею';
+    case eu = 'ευ';

What should we do? Name them eu2, eu3, ...?


PrinsFrank commented 8 months ago

@szepeviktor What about detecting which script the original text was in and appending it to the case name? Something like eu_greek or org_cyrillic, and if the script is Latin, don't append anything. I don't know how to detect scripts in PHP yet, though.
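
A minimal sketch of the naming scheme proposed above. `caseNameWithScript()` is a hypothetical helper, not part of the package, and it assumes the script name has already been detected:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: build an enum case name from a transliterated value
 * and the script the original value was written in. Latin values keep the
 * bare name; every other script gets a lowercase suffix.
 */
function caseNameWithScript(string $transliterated, string $script): string
{
    if ($script === 'Latin') {
        return $transliterated;
    }

    return $transliterated . '_' . strtolower($script);
}

echo caseNameWithScript('eu', 'Latin'), PHP_EOL;     // eu
echo caseNameWithScript('eu', 'Greek'), PHP_EOL;     // eu_greek
echo caseNameWithScript('org', 'Cyrillic'), PHP_EOL; // org_cyrillic
```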

szepeviktor commented 8 months ago

detecting what script the initial text was

The only thing I can think of is looping through all ScriptCode-s and trying to match the text against ^\p{sc=Hira}+$ for each one: https://www.unicode.org/reports/tr18/#Script_Property

PrinsFrank commented 8 months ago

@szepeviktor That's quite smart actually! If that is reliable, it would be cool to add it to the transliteration package!

szepeviktor commented 8 months ago

All non-Latin CountryCodeTLD cases:

alardn seems to be Arabic
albhryn seems to be Arabic
aljzayr seems to be Arabic
almghrb seems to be Arabic
alswdyt seems to be Arabic
amarat seems to be Arabic
ao_men seems to be Han
art seems to be Arabic
ayran seems to be Arabic
banla seems to be Bengali
bart seems to be Arabic
bel seems to be Cyrillic
bharat seems to be Telugu
bharata seems to be Kannada
bharatam seems to be Devanagari
bharota seems to be Devanagari
cinkappur seems to be Tamil
el seems to be Greek
flstyn seems to be Arabic
hangug seems to be Hangul
hay seems to be Armenian
ilankai seems to be Tamil
intiya seems to be Tamil
kaz seems to be Cyrillic
laav seems to be Lao
lanka seems to be Sinhala
man seems to be Arabic
mkd seems to be Cyrillic
mlysya seems to be Arabic
mon seems to be Cyrillic
msr seems to be Arabic
mwrytanya seems to be Arabic
pakstan seems to be Arabic
qtr seems to be Arabic
raq seems to be Arabic
rf seems to be Cyrillic
srb seems to be Cyrillic
swdan seems to be Arabic
swryt seems to be Arabic
tai_wan seems to be Han
thiy seems to be Thai
twns seems to be Arabic
ukr seems to be Cyrillic
xiang_gang seems to be Han
xin_jia_po seems to be Han
ysr_l seems to be Hebrew
zhong_guo seems to be Han

szepeviktor commented 8 months ago

PHP only supports these scripts, not all of our ScriptCode-s: https://www.php.net/manual/en/regexp.reference.unicode.php

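    // $scripts: script names PCRE understands, e.g. ['Latin', 'Greek', 'Cyrillic']
    function detectScript(string $string, array $scripts): string
    {
        foreach ($scripts as $script) {
            if (preg_match('/^\p{' . $script . '}+$/u', $string) === 1) {
                return $script; // every character belongs to this script
            }
        }
        return '?'; // no single supported script covers the whole string
    }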

Please write the actual code. I don't know where it should go or how to wire it up.
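
Because PCRE only recognizes a subset of the ScriptCode-s, one option is to probe each candidate name and keep only the ones that compile as `\p{Name}`. This is a sketch under that assumption: `pcreSupportedScripts()` is hypothetical, and which names are accepted depends on the PCRE version bundled with the PHP build.

```php
<?php
declare(strict_types=1);

/**
 * Keep only the script names PCRE can compile as \p{Name}.
 * preg_match() returns false when the pattern fails to compile (for example,
 * an unknown property name); the @ silences the warning it emits in that case.
 *
 * @param list<string> $candidates
 * @return list<string>
 */
function pcreSupportedScripts(array $candidates): array
{
    $supported = [];
    foreach ($candidates as $name) {
        // Matching against an empty subject is enough to force compilation.
        if (@preg_match('/\p{' . $name . '}/u', '') !== false) {
            $supported[] = $name;
        }
    }

    return $supported;
}

// 'NotAScript' is not a Unicode property name, so it is filtered out.
print_r(pcreSupportedScripts(['Latin', 'Greek', 'NotAScript']));
```

The filtered list can then be passed as `$scripts` to the detection loop, so unsupported ScriptCode-s never reach `preg_match()` inside it.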

szepeviktor commented 8 months ago

Now we have the infrastructure 🏭 for PHP-supported scripts!

szepeviktor commented 8 months ago

I'm looking for where to do this, but EnumCase does not know about the other cases, and EnumFile cannot change the name of an EnumCase because it is readonly.
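
One way around the readonly restriction is to return a renamed copy instead of mutating the case, and to do the renaming in the file-level class, which sees every case. The classes below are hypothetical stand-ins (PHP 8.1+ for readonly promoted properties), not the package's actual EnumCase and EnumFile:

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in: the real EnumCase in PrinsFrank/standards may differ.
final class EnumCase
{
    public function __construct(
        public readonly string $name,
        public readonly string $value,
        public readonly string $script,
    ) {}

    /** Readonly properties can't be reassigned, so build a renamed copy. */
    public function withName(string $name): self
    {
        return new self($name, $this->value, $this->script);
    }
}

// Hypothetical stand-in: collects all cases before any renaming happens.
final class EnumFile
{
    /** @var list<EnumCase> */
    private array $cases = [];

    public function addCase(EnumCase $case): void
    {
        $this->cases[] = $case;
    }

    /**
     * Suffix every non-Latin case whose name collides with another case.
     *
     * @return list<EnumCase>
     */
    public function disambiguatedCases(): array
    {
        $counts = array_count_values(
            array_map(static fn (EnumCase $c): string => $c->name, $this->cases),
        );

        return array_map(
            static fn (EnumCase $c): EnumCase => $counts[$c->name] > 1 && $c->script !== 'Latin'
                ? $c->withName($c->name . '_' . strtolower($c->script))
                : $c,
            $this->cases,
        );
    }
}

$file = new EnumFile();
$file->addCase(new EnumCase('eu', 'eu', 'Latin'));
$file->addCase(new EnumCase('eu', 'ею', 'Cyrillic'));
$file->addCase(new EnumCase('eu', 'ευ', 'Greek'));

foreach ($file->disambiguatedCases() as $case) {
    echo $case->name, ' = ', $case->value, PHP_EOL;
}
// eu = eu
// eu_cyrillic = ею
// eu_greek = ευ
```

Keeping EnumCase immutable and letting the collection produce new instances avoids giving each case a reference to its siblings.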

szepeviktor commented 8 months ago

I think we should never silently drop a duplicate enum case; we should throw an exception instead. I hope no data source contains duplicate rows!

Let's see whether there are TWO eu_greek-s.
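
A sketch of the throw-instead-of-drop rule, run on the final names after any script suffixes are applied, so two cases that both end up as eu_greek are caught. `assertUniqueCaseNames()` is a hypothetical helper:

```php
<?php
declare(strict_types=1);

/**
 * Fail loudly instead of silently dropping a duplicate case.
 *
 * @param list<string> $caseNames final enum case names
 */
function assertUniqueCaseNames(array $caseNames): void
{
    $seen = [];
    foreach ($caseNames as $name) {
        if (isset($seen[$name])) {
            throw new LogicException(sprintf('Duplicate enum case name "%s"', $name));
        }
        $seen[$name] = true;
    }
}

assertUniqueCaseNames(['eu', 'eu_cyrillic', 'eu_greek']); // no exception

try {
    assertUniqueCaseNames(['eu', 'eu_greek', 'eu_greek']);
} catch (LogicException $e) {
    echo $e->getMessage(), PHP_EOL; // Duplicate enum case name "eu_greek"
}
```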

PrinsFrank commented 8 months ago

Fixed in #200. Thanks!