TeXworks / texworks

Main codebase for TeXworks, a simple interface for working with TeX documents
https://tug.org/texworks/
GNU General Public License v2.0

UTF-16 encoding not detected [was: Showing Chinese Characters instead of Telugu characters] #130

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. I opened my .tex file containing Telugu characters (previously created using Notepad).
2. Except for some commands like \documentclass etc., everything is converted to Chinese characters.
3. Even the \end{document}.

What is the expected output? What do you see instead?

What version of the product are you using? On what operating system?

MS Windows Vista and TeXworks r.237

Please provide any additional information below.

Original issue reported on code.google.com by dvg...@gmail.com on 12 May 2009 at 5:24

GoogleCodeExporter commented 9 years ago
What encoding was the .tex file saved with? TeXworks should interpret the text as
UTF-8 by default, unless you change its preferences.

Could you attach such a Telugu .tex file here, so we can see what's in it?

Original comment by jfkth...@gmail.com on 12 May 2009 at 12:07

GoogleCodeExporter commented 9 years ago
Please find the attachments.

Original comment by dvg...@gmail.com on 13 May 2009 at 4:01

GoogleCodeExporter commented 9 years ago
The file test-amma.tex uses UTF-16LE encoding; TeXworks does not automatically
recognize this, but tries to interpret the data as UTF-8 instead.

If you add a comment (using Notepad) of the form

% !TEX encoding = UTF-16LE

at the start of the file, then it should load correctly in TeXworks. Or you can use
the Preferences dialog to make that the default encoding.

I guess TeXworks should probably try to auto-detect UTF-16 encoding forms, to avoid
this particular surprise.
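
As a rough illustration of what such auto-detection could look like, here is a minimal Python sketch that checks for a byte order mark (BOM) at the start of the data. This is not TeXworks code; the function names `sniff_bom` and `decode_text` are made up for this example, and it assumes the file starts with a BOM (which Notepad writes when saving as "Unicode"). Files without a BOM simply fall back to the default.

```python
import codecs

# Known BOM signatures. The UTF-32 entries come first because the
# UTF-32LE BOM (FF FE 00 00) starts with the UTF-16LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes, default: str = "utf-8"):
    """Return (encoding_name, bom_length) for the given raw bytes.

    Hypothetical helper: only a leading BOM is inspected. Data without
    a BOM is reported as `default` with a BOM length of 0.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return default, 0


def decode_text(data: bytes) -> str:
    """Decode raw bytes using the sniffed encoding, dropping the BOM."""
    name, skip = sniff_bom(data)
    return data[skip:].decode(name)
```

A BOM check alone cannot recognize UTF-16 files saved without a BOM; those would need a statistical heuristic or an explicit encoding comment as described above.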

However, I would *strongly* recommend using UTF-8 instead for XeTeX input files, as
many tools (especially in the TeX world) will not handle UTF-16 correctly.
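
If you prefer to convert the file once rather than change settings, something along these lines should work. It is only a sketch in Python: the helper names and the output file name `test-amma-utf8.tex` are made up for illustration, and it assumes the input starts with a BOM so that the `utf-16` codec can pick the right byte order (without a BOM, Python falls back to the platform's native byte order).

```python
from pathlib import Path


def utf16_bytes_to_utf8(raw: bytes) -> bytes:
    """Re-encode UTF-16 data as UTF-8 (no BOM in the output).

    The generic "utf-16" codec reads the BOM to choose the byte order
    and removes it from the decoded text.
    """
    return raw.decode("utf-16").encode("utf-8")


def convert_file(src: str, dst: str) -> None:
    """Read `src` as UTF-16 and write the same text to `dst` as UTF-8.

    Working on bytes keeps the original line endings untouched.
    """
    Path(dst).write_bytes(utf16_bytes_to_utf8(Path(src).read_bytes()))


if __name__ == "__main__":
    # Hypothetical file names; the output is written next to the input.
    convert_file("test-amma.tex", "test-amma-utf8.tex")
```

Saving from Notepad with the encoding set to UTF-8 in the "Save As" dialog achieves the same result without any scripting.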

Original comment by jfkth...@gmail.com on 13 May 2009 at 4:17

GoogleCodeExporter commented 9 years ago
r797 adds the possibility to reload the current document with a different
encoding than the one detected automatically. (Just right-click on the encoding
label in the status bar, select the encoding you want to use from the menu,
then right-click again and select "Reload using selected encoding".)
That way, it is possible to work with files with non-UTF8 encoding without
having to resort to other programs. (It is strongly recommended to add an
encoding modline immediately so the proper encoding is picked up when the
same file is opened in the future.)

This still doesn't auto-detect non-UTF8 encodings, though; you either have to 
know/guess the proper encoding or try several items from the menu.

Original comment by st.loeffler on 14 Apr 2011 at 6:34

GoogleCodeExporter commented 9 years ago
Maybe it would be possible to reuse encoding detection code from 
(GPLv2-licensed) Notepad++ (http://notepad-plus-plus.org)?

Original comment by robert.p...@mykolab.com on 7 Oct 2014 at 7:47

GoogleCodeExporter commented 9 years ago
@st.loeffler How should the encoding modline look?
Setting \usepackage[latin1]{inputenc} and reloading does not work.
Also, changing the encoding via the bottom-right status bar menu does not 
change anything for me.
The only thing that works is changing the general encoding setting to
ISO-8859-1. However, I do not want to use this because I need to edit files with
diverse encodings.

I am using TeXworks 0.4.4 r.1003, installed via basic-miktex-2.9.4757.exe.

Original comment by robert.p...@mykolab.com on 7 Oct 2014 at 8:03

GoogleCodeExporter commented 9 years ago
Regarding reusing encoding detection code: yes, this is definitely on the 
roadmap.

Regarding the encoding: as stated in comment #4, after selecting an encoding
via the bottom-right status bar menu, you also need to click on "Reload using
selected encoding" in the same menu. Just selecting an encoding does not reload
the document automatically; it only sets the encoding for _future_ operations
(e.g., to convert to a new encoding while saving).

Regarding the modline: it should look like the following (described on
https://code.google.com/p/texworks/wiki/TipsAndTricks#Setting_the_file_encoding_per_file):

% !TEX encoding = latin1

The \usepackage directive is only interpreted by LaTeX when typesetting; the
editor does not look at it. Encoding detection has to happen at a very early
stage of loading a file, long before any TeX-specific things are considered.
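
To illustrate why the modline has to be found before the text is decoded, here is a small Python sketch that looks for it in the raw bytes at the start of a file. This is not the actual TeXworks implementation; the function name `modline_encoding`, the 1024-byte peek size, and the trick of dropping NUL bytes (so that a UTF-16 modline is still recognized) are assumptions made for this example.

```python
import re

# Matches a comment such as "% !TEX encoding = latin1" in raw bytes.
_MODLINE = re.compile(rb"%\s*!TEX\s+encoding\s*=\s*([^\r\n]+)", re.IGNORECASE)


def modline_encoding(data: bytes, peek: int = 1024):
    """Return the encoding named in a "% !TEX encoding" comment, or None.

    Only the first `peek` bytes are searched, and the search runs on
    undecoded bytes because the encoding is not known yet. NUL bytes are
    removed first (an assumption of this sketch) so that ASCII text
    stored as UTF-16 still matches.
    """
    head = data[:peek].replace(b"\x00", b"")
    match = _MODLINE.search(head)
    if match is None:
        return None
    return match.group(1).decode("ascii", errors="replace").strip()
```

With this sketch, a file starting with `% !TEX encoding = latin1` yields `"latin1"`, while a file that only contains `\usepackage[latin1]{inputenc}` yields `None`, which matches the behavior described above.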

Original comment by st.loeffler on 9 Oct 2014 at 5:11

GoogleCodeExporter commented 9 years ago
Thank you very much for this thorough explanation of workarounds!

Original comment by robert.p...@mykolab.com on 10 Oct 2014 at 6:35