PiRSquared17 / editra

Automatically exported from code.google.com/p/editra

Encoding issues with German code files #735

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Work with code files containing German umlauts (äöüÄÖÜß) in TextMate
2. Open the file with Editra
3. Sometimes - but not always - the file cannot be opened, except when choosing 
the "mac-latin2" character set, though this will garble the umlauts

What is the expected output? What do you see instead?
> I expect Editra to be able to open German text files and to maintain the 
special characters.
> Instead I have to revert to TextMate to edit the problematic files.

IMPORTANT!!! Please answer these questions for any and ALL bug reports

What version of the product are you using? On what operating system?
> Editra 0.7.01 on Mac OS

What method of install was your version installed with (Binary/Source)?
> DMG via auto updater

Please provide any additional information below.
> See screenshot for the dialog. I *think* Editra lacks some required character 
set support. 
> Let me know if you need a copy of the problematic code files (since those are 
proprietary, I am reluctant to upload them publicly)

Original issue reported on code.google.com by johannes...@googlemail.com on 27 Apr 2012 at 1:13

GoogleCodeExporter commented 9 years ago
Hi,

What does the preferred encoding say in the Editra preferences dialog? 
(General->Files)

Do you know what encoding you saved the files with when using TextMate? I would 
venture a guess of either Latin1 or UTF8.

If you can make a sample file that reproduces this issue it would help with 
testing. Shouldn't need to use your 'proprietary' ones. If you look in the 
Editra log while opening the file it should help in identifying what part of 
the file it is having trouble decoding, for making the sample.
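
One way to find the offending part of a file without the Editra log is to let Python report where decoding fails. The following is a minimal Python 3 sketch, not Editra code; `first_undecodable` and its `context` parameter are hypothetical names chosen for illustration, and the sample bytes are the two lines that appear in the log excerpt later in this thread.

```python
def first_undecodable(data, encoding="utf-8", context=8):
    """Return (offset, nearby_bytes) for the first decode failure, or None.

    Hypothetical helper for illustration only; not part of Editra.
    """
    try:
        data.decode(encoding)
    except UnicodeDecodeError as err:
        # err.start / err.end delimit the byte range the codec rejected.
        lo = max(err.start - context, 0)
        return err.start, data[lo:err.end + context]
    return None


# Sample bytes matching the two lines shown in the log in this thread.
SAMPLE = b"<?\n// \xdf (szlig)\n"

if __name__ == "__main__":
    print(first_undecodable(SAMPLE, "utf-8"))
    print(first_undecodable(SAMPLE, "ascii"))
```

The returned offset and the surrounding bytes are usually enough to build a small sample file that reproduces the problem without sharing the original.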

Original comment by CodyPrec...@gmail.com on 27 Apr 2012 at 1:25

GoogleCodeExporter commented 9 years ago
The default encoding is utf8 in both Editra and TextMate.
Where can I find the Editra log on Mac OS, and to whom can I send a file?

Original comment by johannes...@googlemail.com on 27 Apr 2012 at 1:36

GoogleCodeExporter commented 9 years ago
Editra log - Either open the log window and copy the text (View->Shelf->Editra 
Log) or find the log file in your system's temp directory.

Just attach both here. As mentioned, there shouldn't be any need to send any 
sort of private data to reproduce this issue.

Original comment by CodyPrec...@gmail.com on 27 Apr 2012 at 1:42

GoogleCodeExporter commented 9 years ago
Editra log:

[15:51:47][ed_txt][info] CheckBom called
[15:51:47][ed_txt][info] DetectEncoding - Check magic comment
[15:51:47][ed_txt][info] CheckMagicComment: ['<?\n', '// \xdf (szlig)\n']
[15:51:47][ed_txt][info] MagicComment is None
[15:51:47][ed_txt][info] Doing brute force encoding check
[15:51:47][ed_txt][info] DetectEncoding - Set Encoding to us-ascii
[15:51:47][ed_txt][info] Resetting buffer
[15:51:47][ed_txt][info] Read - Start reading
[15:51:47][ed_txt][info] Read - End reading
[15:51:47][ed_txt][info] Attempting to decode with: us-ascii
[15:51:47][ed_txt][err] Error while reading with us-ascii
[15:51:47][ed_txt][err] 'ascii' codec can't decode byte 0xdf in position 6: 
ordinal not in range(128)
[15:51:47][ed_txt][info] HandleRawBytes called
[15:51:47][ed_txt][info] DecodeText - raw - set encoding to binary
[15:51:47][ed_txt][info] Resetting buffer

Also see the attached file. 
Actually this file *does* contain broken characters, though they are added 
inside a /* comment */ to show some issues with the (stripped) code contained. 
Let me know if you need another file to check.
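
The log starts with a `CheckBom` step before the magic comment and brute-force checks. As a rough illustration of what a byte order mark check generally involves, here is a Python 3 sketch under stated assumptions. It is not the `ed_txt` implementation, and `check_bom` and `BOMS` are hypothetical names.

```python
import codecs

# UTF-32 is listed before UTF-16 because the UTF-32 LE mark begins with
# the same two bytes as the UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def check_bom(data):
    """Return a codec name if data starts with a known BOM, else None.

    Hypothetical helper for illustration only; not part of Editra.
    """
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


if __name__ == "__main__":
    print(check_bom(b"<?\n// \xdf (szlig)\n"))
    print(check_bom(codecs.BOM_UTF8 + b"<?\n"))
```

A file like the sample in this comment has no byte order mark, so a check of this kind finds nothing and detection has to continue with other heuristics.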

Original comment by johannes...@googlemail.com on 27 Apr 2012 at 1:54

GoogleCodeExporter commented 9 years ago
Odd, for some reason your system is falling back to the us-ascii encoding, which 
will fail to open this file since it is only a 7-bit encoding.

The attached sample file opens fine on my system without any issue (defaulting 
to cp1252). Inspecting the raw bytes in the file, it does not appear to have 
been saved with UTF-8 encoding, as the '\xdf' byte for the 'ß' character cannot 
be decoded with utf-8, while Latin encodings such as cp1252 and latin1 are able 
to decode it.
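
The difference can be checked directly in a Python 3 interpreter (the thread itself uses the Python 2 PyShell). This is only a small sketch to illustrate the point; the variable names are arbitrary.

```python
raw = b"\xdf"  # the single byte found in the sample file

results = {}
for name in ("utf-8", "cp1252", "latin-1"):
    try:
        results[name] = raw.decode(name)
    except UnicodeDecodeError:
        # The byte sequence is not valid in this encoding.
        results[name] = None

# For comparison: how 'ß' is stored when a file really is UTF-8.
utf8_bytes = "ß".encode("utf-8")

if __name__ == "__main__":
    print(results)
    print(utf8_bytes)
```

In UTF-8, 'ß' is stored as two bytes, so a lone 0xdf byte in a file points to a single-byte encoding.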

As a test, could you do the following:
Test 1:
1) Enable the PyShell plugin in the Plugin Manager (Tools->Plugin Manager)
2) Restart Editra
3) Open PyShell (View->Shelf->PyShell)
4) Type the following into the PyShell window (return after each line)
import ed_txt
print ed_txt.GetEncodings()
5) Paste the output of the print statement here

Test 2:
1) Open your file that has a problem in TextMate (or anywhere else it opens 
correctly)
2) Add a comment line like this to the top of the file
<!-- encoding: cp1252 -->
3) Save the file
4) Now try to open it in Editra
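
Test 2 relies on Editra reading an encoding declaration from the first lines of the file. A minimal Python 3 sketch of such a check might look like the following. The regular expression, the two-line limit, and the name `check_magic_comment` are assumptions made for illustration; the actual `ed_txt` logic may differ.

```python
import codecs
import re

# Assumed pattern: "coding" or "encoding", then ':' or '=', then a codec name.
MAGIC_RE = re.compile(rb"(?:en)?coding[:=]\s*([-\w.]+)")


def check_magic_comment(lines):
    """Return the declared encoding from the first two lines, or None.

    Hypothetical helper for illustration only; not part of Editra.
    """
    for line in lines[:2]:
        match = MAGIC_RE.search(line)
        if not match:
            continue
        name = match.group(1).decode("ascii")
        try:
            codecs.lookup(name)  # ignore names Python does not know
        except LookupError:
            continue
        return name
    return None


if __name__ == "__main__":
    print(check_magic_comment([b"<!-- encoding: cp1252 -->\n", b"\n"]))
    print(check_magic_comment([b"<?\n", b"// \xdf (szlig)\n"]))
```

Working on raw bytes here avoids having to decode the file before its encoding is known.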

Original comment by CodyPrec...@gmail.com on 27 Apr 2012 at 2:33

GoogleCodeExporter commented 9 years ago
TEST1
>>> import ed_txt
>>> print ed_txt.GetEncodings()
[u'utf_8', 'us-ascii', 'utf-8', 'iso8859-1', 'utf8', 'utf-16', 'latin-1']
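
Several entries in this list are aliases for the same codec, and cp1252 is not among them. The following is a minimal Python 3 sketch, not Editra's code, with `unique_codecs` and `decode_first_match` as hypothetical helpers. It collapses the aliases with `codecs.lookup` and then tries each remaining candidate in order on the sample bytes from the earlier log.

```python
import codecs


def unique_codecs(names):
    """Collapse aliases to canonical codec names, keeping first-seen order."""
    seen, ordered = set(), []
    for name in names:
        canonical = codecs.lookup(name).name
        if canonical not in seen:
            seen.add(canonical)
            ordered.append(canonical)
    return ordered


def decode_first_match(data, names):
    """Try each candidate in order; return (codec_name, text) or (None, None)."""
    for name in unique_codecs(names):
        try:
            return name, data.decode(name)
        except UnicodeDecodeError:
            continue
    return None, None


# The list reported above and the sample bytes from the earlier log.
REPORTED = ["utf_8", "us-ascii", "utf-8", "iso8859-1", "utf8", "utf-16", "latin-1"]
SAMPLE = b"<?\n// \xdf (szlig)\n"

if __name__ == "__main__":
    print(unique_codecs(REPORTED))
    print(decode_first_match(SAMPLE, REPORTED))
```

Order matters in a loop like this: latin-1 maps every byte to a character, so it never fails and anything listed after it is never reached.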

TEST2
I cannot open the file in Editra:

[10:34:18][ed_txt][info] Attempting to decode with: utf_8
[10:34:18][ed_txt][err] Error while reading with utf_8
[10:34:18][ed_txt][err] 'utf8' codec can't decode byte 0xa7 in position 5191: 
invalid start byte

I then opened it in TextMate and added your comment. Reopening it in Editra 
works:
[10:34:57][ed_txt][info] CheckBom called
[10:34:57][ed_txt][info] DetectEncoding - Check magic comment
[10:34:57][ed_txt][info] CheckMagicComment: ['<!-- encoding: cp1252 -->\n', 
'\n']
[10:34:57][ed_txt][info] MagicComment is cp1252
[10:34:57][ed_txt][info] DetectEncoding - Set Encoding to cp1252

Does this help?

Original comment by johannes...@googlemail.com on 30 Apr 2012 at 8:36

GoogleCodeExporter commented 9 years ago
Yes, thanks. I think I can see what is happening now.

Will update when a fix has been committed.

Original comment by CodyPrec...@gmail.com on 30 Apr 2012 at 1:35

GoogleCodeExporter commented 9 years ago
Tentative fix committed for next release

Original comment by CodyPrec...@gmail.com on 2 May 2012 at 1:46

GoogleCodeExporter commented 9 years ago
Issue 736 has been merged into this issue.

Original comment by CodyPrec...@gmail.com on 2 May 2012 at 1:46

GoogleCodeExporter commented 9 years ago
Closing as fixed - new release will be made by this weekend

Original comment by CodyPrec...@gmail.com on 5 Jul 2012 at 2:21

GoogleCodeExporter commented 9 years ago
Issue 750 has been merged into this issue.

Original comment by CodyPrec...@gmail.com on 6 Jul 2012 at 1:57