Support for GSM national sets (ES/PT/TR)

dankogai / p5-encode

Encode - character encodings (for Perl 5.8 or better)

https://metacpan.org/release/Encode

Support for GSM national sets (ES/PT/TR) #150

Closed happy-barney closed 2 years ago

happy-barney commented 4 years ago

PR #149 reminded me of work I started a few years ago.

happy-barney commented 4 years ago

This PR doesn't contain documentation (yet). Consider it a POC of how $subj can be implemented.

dankogai commented 4 years ago

+1 for separate CPAN module under Encode:: namespace.

happy-barney commented 4 years ago

The separate-module argument applies to the GSM charset as well.

I do not intend to do anything else apart from sharing the source. It's up to you to decide what to do with it (i.e., give Perl some competitive advantage ...)

happy-barney commented 4 years ago

And about usage of these tables ... it depends on the country. Obviously it is not very likely to receive a Hindi message in Europe ...

For example, one project in Germany I participated in received around 2% of its messages with the Turkish language set.

pali commented 4 years ago

I'm surprised that some devices are still generating SMSs in national sets instead of universal UNICODE/UCS-2.

happy-barney commented 4 years ago

@pali don't be. An SMS can contain 160 characters / 140 bytes. Choosing national sets consumes 1 + (3 for the base table) + (3 for the shift table) bytes, leaving space for 155 (resp. 152) characters, whereas UCS-2 is strictly 16 bits per char = 70 chars.

You can send longer messages but that takes another continuation UDH (4 bytes).

As a result, a UCS-2 message is twice as expensive as a GSM charset message.
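
The arithmetic above can be checked with a small sketch. It assumes a 140-octet payload, that the header is padded up to the next septet boundary before 7-bit text starts, and that header sizes are the ones quoted in this comment; the function names are hypothetical and not part of Encode.

```python
SMS_PAYLOAD_OCTETS = 140  # user data available in a single SMS


def septet_capacity(header_octets: int = 0) -> int:
    """7-bit characters that fit after a header of the given size in octets."""
    total_septets = SMS_PAYLOAD_OCTETS * 8 // 7
    # Assumption: the header is padded to a septet boundary (integer ceiling).
    header_septets = (header_octets * 8 + 6) // 7
    return total_septets - header_septets


def ucs2_capacity(header_octets: int = 0) -> int:
    """16-bit characters that fit after a header of the given size in octets."""
    return (SMS_PAYLOAD_OCTETS - header_octets) // 2


print(septet_capacity())           # 160: plain GSM 7-bit, no header
print(septet_capacity(1 + 3))      # 155: one national table selected
print(septet_capacity(1 + 3 + 3))  # 152: base and shift tables selected
print(ucs2_capacity())             # 70: UCS-2, no header
```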

pali commented 4 years ago

I know, I have read and have implemented TS 123 038 over TS 123 040 over ES 201 912 over V.23 over RFC3261.

I just have not seen mobile devices which generate SMSs in national sets anymore...

Note that national sets also have characters behind an escape sequence, and using them takes 2 bytes (like in UCS-2). But using characters from the primary (non-escape) part really does decrease the size of the SMS.
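
To illustrate the escape cost, here is a rough sketch that counts 7-bit units for a message. It uses the printable characters of the default GSM extension table as the escaped set, which is an assumption for illustration: a national shift table would supply its own set. It also assumes every other character is present in the primary table. The names are hypothetical.

```python
# Printable characters of the default GSM 7-bit extension table.
# Assumption: a national shift table would replace this set with its own.
DEFAULT_EXTENSION = set("^{}\\[~]|€")


def septet_count(text: str, extension=DEFAULT_EXTENSION) -> int:
    """Count 7-bit units: escaped characters cost two, all others one.

    Assumes every character outside `extension` is in the primary table.
    """
    return sum(2 if ch in extension else 1 for ch in text)


print(septet_count("hello"))  # 5: primary table only
print(septet_count("[x]"))    # 5: two escaped brackets plus one plain letter
print(septet_count("€"))      # 2: escape plus the character itself
```

Even with two escaped characters, "[x]" takes 5 septets (35 bits), while the same text in UCS-2 takes 3 x 16 = 48 bits.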

happy-barney commented 4 years ago

Most likely observer bias, due to the fact that you are not living in a country where a supported language is used (e.g. Turkish or Hindi).