athoik closed this issue 6 years ago.
Hi
I am aware of the character issue. The DAB data does indicate which character set is used; however, I am not aware of a decent character-handling library in C++. If you have suggestions, I would really appreciate them.
best jan
2017-12-26 11:32 GMT+01:00 Athanasios Oikonomou notifications@github.com:
Hi,
The following I/Q sample has some special characters in program names.
La 1ère BXL, La 1ère Wallonie, VivaCité
Test Musiq3 + (6358) is part of the ensemble
BRF 2 (6367) is part of the ensemble
BRF 1 (6366) is part of the ensemble
La 1�re Wallonie (6351) is part of the ensemble
TPEG_PACKET (data) (E0606361) is part of the ensemble
Musiq3 (6353) is part of the ensemble
La 1�re BXL (6951) is part of the ensemble
VivaCit� (6052) is part of the ensemble
Test Classic 21+ (6356) is part of the ensemble
Classic 21 (6354) is part of the ensemble
TARMAC (6357) is part of the ensemble
Pure (6355) is part of the ensemble
The é and è are encoded as 8-bit codes above the ASCII range: 130 (0x82) and 131 (0x83).
Is there a way to detect which encoding is used in a program name, for example with a library, or should the program handle it somehow?
Here is a RAW I/Q sample: 20171226_092958_12B.iq 39.1 MB https://mega.nz/#!eY8ykbjY!olCaQY_2x27Bva_8QLUAZODiMP0tNW2YzBtYnLwLMd8
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/JvanKatwijk/dab-cmdline/issues/27, or mute the thread https://github.com/notifications/unsubscribe-auth/AITzwAimCTf-72BU2-ejsxO1K9sT_W_gks5tEMs5gaJpZM4RMrRG .
-- Jan van Katwijk
+31 (0)15 3698980 +31 (0) 628260355
I looked into it. The issue here is that the DAB specification talks about EBU Latin 1 encoding. As you might have guessed, the characters that are not displayed correctly have the 8th bit set (i.e. a variant of ISO 8859). Setting the locale to an 8859 variant does not help. I can map all characters onto their UTF-8 equivalents, but the Linux environment does not seem to process these UTF-8 characters correctly.
So: looking into it, yes; solution found, not yet.
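For reference, mapping characters onto their UTF-8 equivalents needs no table when the input really is ISO 8859-1 (Latin-1), because each Latin-1 byte value equals its Unicode code point. A minimal sketch, assuming Latin-1 input; the function name `latin1ToUtf8` is hypothetical and not part of dab-cmdline:

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: convert ISO 8859-1 (Latin-1) bytes to UTF-8.
// Each Latin-1 byte value equals its Unicode code point (U+0000..U+00FF),
// so no lookup table is needed.
std::string latin1ToUtf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);  // worst case: every byte needs two
    for (char ch : in) {
        uint8_t b = static_cast<uint8_t>(ch);
        if (b < 0x80) {
            out.push_back(ch);                                    // ASCII: copied as-is
        } else {
            out.push_back(static_cast<char>(0xC0 | (b >> 6)));    // lead byte 110000xx
            out.push_back(static_cast<char>(0x80 | (b & 0x3F)));  // continuation 10xxxxxx
        }
    }
    return out;
}
```

The same two-byte arithmetic appears in the original `EbuLatin` branch that athoik's first diff replaces. It is only correct for Latin-1: applied to EBU Latin, byte 0x82 becomes the control character U+0082 instead of é, which is why EBU Latin needs a 256-entry lookup table.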
Hi,
Maybe we can use this code:
https://github.com/Opendigitalradio/ODR-PadEnc/blob/master/src/charset.h https://github.com/Opendigitalradio/ODR-PadEnc/blob/master/src/charset.cpp
Thanks!
Thanks
I'll look into it tomorrow,
jan
These files convert in the wrong direction, i.e. towards the DAB character set rather than from it. The data encoded in DAB uses an 8859-1 encoding. It is fairly easy to translate the characters with the 8th bit set to UTF-8; however, although the box I am using should be able to handle UTF-8, the problem remains.
I'll try this afternoon on an Ubuntu box.
best jan
Hi
I changed the code so that all output strings are encoded in UTF-8. However, I have a problem setting the locale to something other than en_US.UTF-8, so I cannot verify the results.
best j
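One way to check the output without depending on the terminal locale is to dump the bytes in hex and compare them with the expected UTF-8 sequences (for example, è is c3 a8). A minimal sketch; `toHex` is a hypothetical debugging helper, not something in dab-cmdline:

```cpp
#include <string>

// Hypothetical debugging helper: render each byte of a string as two
// lowercase hex digits separated by spaces, so UTF-8 output can be checked
// independently of the terminal's locale settings.
std::string toHex(const std::string& s) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (unsigned char b : s) {
        if (!out.empty()) out.push_back(' ');
        out.push_back(digits[b >> 4]);    // high nibble
        out.push_back(digits[b & 0x0F]);  // low nibble
    }
    return out;
}
```

For example, `std::cout << toHex(name) << '\n';` prints `31 c3 a8` for a correctly encoded "1è". From the shell, piping the program output through `xxd` gives the same information.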
Hi,
I didn't get valid UTF-8 back, but with the following code everything seems OK here!
diff --git a/library/src/backend/charsets.cpp b/library/src/backend/charsets.cpp
index b9bbbb8..3135c46 100644
--- a/library/src/backend/charsets.cpp
+++ b/library/src/backend/charsets.cpp
@@ -69,6 +69,24 @@ static const unsigned short ebuLatinToUcs2[] = {
/* 0xf8 - 0xff */ 0xfe, 0x014b, 0x0155, 0x0107, 0x015b, 0x017a, 0x0167, 0xff
};
+static const char* utf8_encoded_EBU_Latin[] = {
+"\0", "Ę", "Į", "Ų", "Ă", "Ė", "Ď", "Ș", "Ț", "Ċ", "\n","\v","Ġ", "Ĺ", "Ż", "Ń",
+"ą", "ę", "į", "ų", "ă", "ė", "ď", "ș", "ț", "ċ", "Ň", "Ě", "ġ", "ĺ", "ż", "\u0082",
+" ", "!", "\"","#", "ł", "%", "&", "'", "(", ")", "*", "+", ",", "-", ".", "/",
+"0", "1", "2", "3", "4", "5", "6", "7", "8", "9", ":", ";", "<", "=", ">", "?",
+"@", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O",
+"P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "[", "Ů", "]", "Ł", "_",
+"Ą", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o",
+"p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "«", "ů", "»", "Ľ", "Ħ",
+"á", "à", "é", "è", "í", "ì", "ó", "ò", "ú", "ù", "Ñ", "Ç", "Ş", "ß", "¡", "Ÿ",
+"â", "ä", "ê", "ë", "î", "ï", "ô", "ö", "û", "ü", "ñ", "ç", "ş", "ğ", "ı", "ÿ",
+"Ķ", "Ņ", "©", "Ģ", "Ğ", "ě", "ň", "ő", "Ő", "€", "£", "$", "Ā", "Ē", "Ī", "Ū",
+"ķ", "ņ", "Ļ", "ģ", "ļ", "İ", "ń", "ű", "Ű", "¿", "ľ", "°", "ā", "ē", "ī", "ū",
+"Á", "À", "É", "È", "Í", "Ì", "Ó", "Ò", "Ú", "Ù", "Ř", "Č", "Š", "Ž", "Ð", "Ŀ",
+"Â", "Ä", "Ê", "Ë", "Î", "Ï", "Ô", "Ö", "Û", "Ü", "ř", "č", "š", "ž", "đ", "ŀ",
+"Ã", "Å", "Æ", "Œ", "ŷ", "Ý", "Õ", "Ø", "Þ", "Ŋ", "Ŕ", "Ć", "Ś", "Ź", "Ť", "ð",
+"ã", "å", "æ", "œ", "ŵ", "ý", "õ", "ø", "þ", "ŋ", "ŕ", "ć", "ś", "ź", "ť", "ħ"};
+
std::string toStringUsingCharset (const char* buffer,
CharacterSet charset, int size) {
std::string s;
@@ -91,11 +109,8 @@ uint16_t i;
case EbuLatin:
default:
for (i = 0; i < length; i++)
- if (buffer [i] & 0x80) {
- uint8_t c0 = (0xc0 | (((uint8_t)buffer [i]) >> 6));
- uint8_t c1 = ((buffer [i] & 0x3f) | 0x80);
- s. push_back (c0);
- s. push_back (c1);
+ if (buffer [i] & 0xff) {
+ s. append (utf8_encoded_EBU_Latin[buffer[i] & 0xff]);
}
else
s. push_back (buffer [i]);
$ dab-raw-3 -F 20171226_092958_12B.iq
dab_cmdline V 1.0alfa,
Copyright 2017 J van Katwijk, Lazy Chair Computing
opt = F
ofdm word gestart
Period = 8000
End of file, restarting
there might be a DAB signal here
no ensemble data found, fatal
BRF 1 (6366) is part of the ensemble
La 1ère Wallonie (6351) is part of the ensemble
TPEG_PACKET (data) (E0606361) is part of the ensemble
End of file, restarting
Classic 21 (6354) is part of the ensemble
ensemble RTBF DAB is (6005) recognized
Test Musiq3 + (6358) is part of the ensemble
BRF 2 (6367) is part of the ensemble
TARMAC (6357) is part of the ensemble
Pure (6355) is part of the ensemble
Musiq3 (6353) is part of the ensemble
VivaCité (6052) is part of the ensemble
Test Classic 21+ (6356) is part of the ensemble
La 1ère BXL (6951) is part of the ensemble
End of file, restarting
^C
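When testing a conversion like this, a quick well-formedness check helps separate "valid UTF-8 that the terminal renders badly" from "not UTF-8 at all": a raw byte with the high bit set, such as 0x82, is never valid UTF-8 on its own. A simplified sketch; the name `looksLikeValidUtf8` is hypothetical:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: a simplified UTF-8 well-formedness check.
// It verifies lead/continuation byte structure and rejects the lead bytes
// C0, C1 and F5..FF, but (for brevity) does not reject surrogates or every
// overlong 3- and 4-byte form.
bool looksLikeValidUtf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        std::size_t extra;  // number of continuation bytes expected
        if (b < 0x80)
            extra = 0;                     // ASCII
        else if (b >= 0xC2 && b <= 0xDF)
            extra = 1;                     // 2-byte sequence
        else if (b >= 0xE0 && b <= 0xEF)
            extra = 2;                     // 3-byte sequence
        else if (b >= 0xF0 && b <= 0xF4)
            extra = 3;                     // 4-byte sequence
        else
            return false;                  // stray continuation byte or invalid lead
        if (s.size() - i <= extra)
            return false;                  // sequence truncated at end of string
        for (std::size_t k = 1; k <= extra; ++k)
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80)
                return false;              // expected 10xxxxxx
        i += extra + 1;
    }
    return true;
}
```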
Great. Thanks,
Just a note: characters 1 to 127 will be appended directly to the string, although utf8_encoded_EBU_Latin does not match the ASCII/ISO characters at those positions.
For example, "\x01" maps to "Ę" in EBU Latin, but in ASCII it is ^A (SOH).
So the following change is still required, if I am not mistaken.
diff --git a/library/src/backend/charsets.cpp b/library/src/backend/charsets.cpp
index dcf5221..f030357 100644
--- a/library/src/backend/charsets.cpp
+++ b/library/src/backend/charsets.cpp
@@ -110,11 +110,8 @@ uint16_t i;
case EbuLatin:
default:
for (i = 0; i < length; i++)
- if (buffer [i] & 0x80) {
- if (buffer [i] & 0xff) {
+ if (buffer [i] & 0xff)
s. append (utf8_encoded_EBU_Latin [buffer[i] & 0xff]);
- }
- }
else
s. push_back (buffer [i]);
}
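The difference between the two loops (table lookup only for bytes with the high bit set, versus lookup for every byte) can be shown with a four-entry excerpt of the utf8_encoded_EBU_Latin table quoted in this thread. The function names below are illustrative only, not dab-cmdline code, and a real converter needs all 256 entries:

```cpp
#include <cstdint>
#include <string>

// Excerpt only (hypothetical helper): four entries taken from the
// utf8_encoded_EBU_Latin table quoted in this thread, written as UTF-8 bytes.
// nullptr marks "not in this excerpt".
static const char* ebuLatinExcerpt(uint8_t b) {
    switch (b) {
        case 0x01: return "\xC4\x98";  // U+0118 Ę
        case 0x24: return "\xC5\x82";  // U+0142 ł (ASCII would be '$')
        case 0x82: return "\xC3\xA9";  // U+00E9 é
        case 0x83: return "\xC3\xA8";  // U+00E8 è
        default:   return nullptr;
    }
}

// Appends the table entry for b, or copies b unchanged when the excerpt has no entry.
static void appendMapped(std::string& out, uint8_t b) {
    const char* utf8 = ebuLatinExcerpt(b);
    if (utf8) out.append(utf8);
    else      out.push_back(static_cast<char>(b));
}

// Variant A: only bytes with the high bit set go through the table.
std::string convertHighBitOnly(const std::string& in) {
    std::string out;
    for (char ch : in) {
        uint8_t b = static_cast<uint8_t>(ch);
        if (b & 0x80) appendMapped(out, b);
        else          out.push_back(ch);  // 0x00-0x7F copied as ASCII
    }
    return out;
}

// Variant B: every byte goes through the table, so 0x01-0x1F and the few
// printable positions where the table differs from ASCII are translated too.
std::string convertAllBytes(const std::string& in) {
    std::string out;
    for (char ch : in)
        appendMapped(out, static_cast<uint8_t>(ch));
    return out;
}
```

Both variants agree on é and è; they differ only for the low positions that EBU Latin redefines, such as 0x01 and 0x24.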
The test was done twice: the 8-bit test was done first; otherwise the char was added to the buffer directly. I'll change it to if (buffer [i] & 0x8F) .... else s. push_back ...
I am sorry once again, but this will work only for positions >= 128.
What about "Ę" (0x01), "Į" (0x02), etc.?
It seems that the EBU Latin character set defines those positions differently from the normal ASCII control characters.
Hi
As far as I know, EBU Latin is ASCII in its first 127 positions, followed by the special characters? jan
Then this table is wrong? https://github.com/Opendigitalradio/ODR-PadEnc/blob/master/src/charset.cpp#L38
It seems OK according to ETSI TS 101 756 v1.8.1 (page 41):
https://worlddabeureka.org/2015/08/03/issue-26-new-latin-based-character-set-for-dab/
http://www.etsi.org/deliver/etsi_ts/101700_101799/101756/01.08.01_60/ts_101756v010801p.pdf
Well, according to ETSI TS 101 756 the table is correct, apart from the first two rows, which are not specified in 101 756. The characters from 040 to 177 (octal) are the ASCII set.
I think we should handle EBU Latin separately from ISO Latin and add utf16to8 (e.g. from utfcpp).
0000 Complete EBU Latin based repertoire - see annex C
0100 ISO Latin Alphabet No. 1 (see ISO/IEC 8859-1 [8])
0110 ISO/IEC 10646 [26] using UCS-2 transformation format, big endian byte order
1111 ISO/IEC 10646 [26] using UTF-8 transformation format
Most probably today most people still use Latin1 :)
Well, 8859-1 and EBU Latin share the ASCII subset. The charset numbers should indicate the character set used; I have never seen anything other than 0. But do you have a suggestion?
best jan
Hi,
I think the following will be fine, until somebody uses UCS2 encoding.
diff --git a/library/includes/backend/charsets.h b/library/includes/backend/charsets.h
index 4851443..399b481 100644
--- a/library/includes/backend/charsets.h
+++ b/library/includes/backend/charsets.h
@@ -33,8 +33,9 @@
*/
typedef enum {
EbuLatin = 0x00, // Complete EBU Latin based repertoire - see annex C
- UnicodeUcs2 = 0x06,
- UnicodeUtf8 = 0x0F
+ IsoLatin = 0x04, // ISO Latin Alphabet No. 1 (see ISO/IEC 8859-1 [8])
+ UnicodeUcs2 = 0x06, // ISO/IEC 10646 [26] using UCS-2 transformation format, big endian byte order
+ UnicodeUtf8 = 0x0F // ISO/IEC 10646 [26] using UTF-8 transformation format
} CharacterSet;
/**
diff --git a/library/src/backend/charsets.cpp b/library/src/backend/charsets.cpp
index cd8d6db..202421e 100644
--- a/library/src/backend/charsets.cpp
+++ b/library/src/backend/charsets.cpp
@@ -100,21 +100,20 @@ uint16_t i;
length = size;
switch (charset) {
-// case UnicodeUcs2:
-// s = std::string::fromUtf16 ((const ushort*) buffer, length);
-// break;
+ case EbuLatin:
+ for (i = 0; i < length; i++)
+ s. append (utf8_encoded_EBU_Latin [buffer[i] & 0xff]);
+ break;
- case UnicodeUtf8:
- break;
+ case UnicodeUcs2:
+ throw std::logic_error("UnicodeUcs2 to Utf8 not yet implemented");
+ break;
- case EbuLatin:
+ case IsoLatin:
+ case UnicodeUtf8:
default:
- for (i = 0; i < length; i++)
- if (buffer [i] & 0x80) { // extended char
- s. append (utf8_encoded_EBU_Latin [buffer[i] & 0xff]);
- }
- else
- s. push_back (buffer [i]);
+ for (i = 0; i < length; i++)
+ s. push_back (buffer [i]);
}
return s;
Sounds like a pragmatic approach.
Great!
I created a PR to fix a few typos after the latest merge.
Thanks!
I guess we are done here; if a broadcast using UCS-2 appears, we will need a UCS-2 to UTF-8 function ;)
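Should that happen, a UCS-2 (big endian, charset 0x06 in the enum above) to UTF-8 converter is short, because UCS-2 only covers U+0000..U+FFFF and every code unit maps to 1 to 3 UTF-8 bytes. A minimal sketch; `ucs2BeToUtf8` and `appendUtf8` are hypothetical names, and replacing surrogate code units with U+FFFD is a design choice of this sketch, not something the thread specifies. The utfcpp `utf8::utf16to8` function suggested earlier would also work once the big-endian byte pairs are assembled into 16-bit values.

```cpp
#include <cstdint>
#include <string>

// Append one BMP code point (U+0000..U+FFFF) to out as UTF-8.
void appendUtf8(std::string& out, uint16_t cp) {
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));                         // 1 byte
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));           // 2 bytes
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));          // 3 bytes
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Hypothetical converter: UCS-2 big-endian buffer of `size` bytes to UTF-8.
std::string ucs2BeToUtf8(const char* buffer, int size) {
    std::string s;
    for (int i = 0; i + 1 < size; i += 2) {  // an odd trailing byte is ignored
        uint16_t cp = static_cast<uint16_t>(
            (static_cast<uint8_t>(buffer[i]) << 8) |
            static_cast<uint8_t>(buffer[i + 1]));
        if (cp >= 0xD800 && cp <= 0xDFFF)
            cp = 0xFFFD;                      // surrogates are not valid UCS-2 characters
        appendUtf8(s, cp);
    }
    return s;
}
```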