CadQuery / CQ-editor

CadQuery GUI editor based on PyQT
Apache License 2.0

cq-editor is not using UTF-8 as a default on File | Save #276

Closed carribeiro closed 2 years ago

carribeiro commented 3 years ago

I'm still trying to figure out exactly what is happening, but here's a start. I live in Brazil and we use non-ASCII characters in the source files. One such example is the word "Atenção" which means "Attention".

# Test 2 - with unicode chars
print("Test 2 - Atenção")

When I create the file above in cq-editor, it is saved as an ISO-8859 text file. If I try to open it in Visual Studio Code, the characters are garbled:

image

If I create the same file directly on VSC, it shows up fine:

image

Checking the files on the filesystem, the encoding is different:

image

The file size is also different:

image

Now, if I open the file that was created in VSC in cq-editor, it opens... but as soon as I save it in cq-editor, the Unicode characters are garbled again.

image

Now, if I set cq-editor to autoreload, and edit the file on VSC, it shows up fine on cq-editor (including the non-ASCII characters). But as soon as I save it on cq-editor, it is mangled on VSC; and if I save the file with the garbled characters on VSC, then it breaks the characters on cq-editor.
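
The symptoms above (different file sizes, garbled characters in whichever editor did not write the file) are consistent with one program writing a single-byte Windows encoding and the other expecting UTF-8. The sketch below reproduces that with plain byte strings. It assumes the single-byte encoding is cp1252, which agrees with ISO-8859-1 for the characters used here; the exact decoding behavior of either editor is an assumption.

```python
# The test line from the report, as a Python string.
text = 'print("Test 2 - Atenção")\n'

utf8_bytes = text.encode("utf-8")
cp1252_bytes = text.encode("cp1252")

# "ç" and "ã" each take two bytes in UTF-8 but one byte in cp1252,
# so the UTF-8 version of the file is two bytes larger.
size_difference = len(utf8_bytes) - len(cp1252_bytes)
print(size_difference)

# UTF-8 bytes read as cp1252: each accented letter turns into two
# unrelated characters, so the word becomes "AtenÃ§Ã£o".
garbled = utf8_bytes.decode("cp1252")

# cp1252 bytes read as UTF-8 are not valid UTF-8 at all.
try:
    cp1252_bytes.decode("utf-8")
    strict_decode_failed = False
except UnicodeDecodeError:
    strict_decode_failed = True

# A tolerant reader (an assumption about how an editor might behave)
# substitutes U+FFFD replacement characters instead of failing.
lossy = cp1252_bytes.decode("utf-8", errors="replace")
```

Neither file is corrupt on its own; the text only breaks when the reader and the writer disagree about the encoding.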

carribeiro commented 3 years ago

Updated the title. It seems to me that cq-editor is not saving files as UTF-8 (which I believe it should be doing).

carribeiro commented 3 years ago

Trying to figure out what's going on. The save command is implemented in editor.py (https://github.com/CadQuery/CQ-editor/blob/master/cq_editor/widgets/editor.py), line 162:

            with open(self._filename,'w') as f:
                f.write(self.toPlainText())

Seems that self.toPlainText() doesn't generate the correct Unicode representation for file save. The code comes from Spyder, and I'm still trying to figure out which would be the correct way of saving files using Spyder's API (the code base is a lot bigger and I'm still trying to understand how the editor is structured).

adam-urbanczyk commented 3 years ago

I cannot reproduce this on linux. Could you report what you get in the console when executing this:

self.components['editor'].toPlainText()

adam-urbanczyk commented 3 years ago

@carribeiro what does import locale; locale.getpreferredencoding() produce? This should be the encoding used by open.

carribeiro commented 3 years ago

I cannot reproduce it either on OSX. File is saved as UTF-8:

(cq) MacBook-Pro:cadquery cribeiro$ file *
test3.py: UTF-8 Unicode text

The test suggested above on OSX returns the following:

In[1]: self.components['editor'].toPlainText()
Out[1]: 's = "atenção"\n'

The test also returns a similar result on Windows:

In[5]: self.components['editor'].toPlainText()
Out[5]: 's = "atenção"\n'

The problem may be related to the way the file is opened for saving; perhaps it's not being opened as UTF-8 as it should be on Windows (something like open(filename, 'w', encoding='utf8') - but obviously I'm oversimplifying).
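
A minimal sketch of that idea, assuming the fix is simply to pass an explicit encoding to open. The helper names save_text and load_text are hypothetical and are not part of CQ-editor or Spyder; the string stands in for what toPlainText() returned in the console output above.

```python
import os
import tempfile


def save_text(path, text):
    """Write text as UTF-8 regardless of the platform's locale."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def load_text(path):
    """Read text back, again forcing UTF-8."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


source = 's = "atenção"\n'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test3.py")
    save_text(path, source)

    # Inspect the raw bytes: "ç" must be stored as the UTF-8 pair C3 A7.
    with open(path, "rb") as f:
        raw = f.read()

    round_tripped = load_text(path)
```

With the encoding fixed on both the write and the read side, the result no longer depends on what locale.getpreferredencoding() reports on a given machine.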

carribeiro commented 3 years ago

> @carribeiro what does import locale; locale.getpreferredencoding() produce? This should be the encoding used by open.

I think you nailed it.

import locale; locale.getpreferredencoding()
Out[6]: 'cp1252'

However, neither IDLE nor VSC uses cp1252; both seem to default to UTF-8 anyway. I guess that's exactly to guarantee interoperability.
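
A quick way to see which encoding open picks on a given machine is to ask the file object itself. This is only a diagnostic sketch: the probe file name is made up, and the default value it reports varies by platform, locale settings, and Python version.

```python
import locale
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "probe.txt")

    # No encoding argument: Python falls back to its platform default.
    with open(path, "w") as f:
        default_encoding = f.encoding

    # Explicit argument: the locale is ignored.
    with open(path, "w", encoding="utf-8") as f:
        explicit_encoding = f.encoding

print("locale preferred:", locale.getpreferredencoding(False))
print("open() default:  ", default_encoding)
print("open() explicit: ", explicit_encoding)
```

On the Windows machine from this report, the first two lines would presumably show cp1252, while the third is utf-8 everywhere.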

carribeiro commented 3 years ago

Did a quick search and it seems that it's recommended to always save files using UTF-8. Not a definitive source though, just some discussions on developer forums. Maybe someone on the main Python groups could answer that authoritatively.

carribeiro commented 3 years ago

One discussion that I've found: http://python-notes.curiousefficiency.org/en/latest/python3/text_file_processing.html (it's pretty old but it goes over a lot of the issues on the transition of Unicode encoding in Python and strategies for compatibility)

adam-urbanczyk commented 3 years ago

Well, I'm using the current Python default. Can't you configure Windows to use utf8 in the locale?
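
One option that does not require changing the Windows locale is Python's UTF-8 mode (available since Python 3.7), enabled with the -X utf8 command-line option or the PYTHONUTF8=1 environment variable. The sketch below starts a child interpreter in that mode and reports what it sees. Whether a particular CQ-editor installation is launched in a way that honors these settings is an assumption that would need checking.

```python
import subprocess
import sys

# One-liner run in the child interpreter: report the UTF-8 mode flag
# and the encoding that open() would use by default.
probe = (
    "import locale, sys; "
    "print(sys.flags.utf8_mode, locale.getpreferredencoding(False))"
)

result = subprocess.run(
    [sys.executable, "-X", "utf8", "-c", probe],
    capture_output=True,
    text=True,
    check=True,
)

utf8_mode_flag, child_encoding = result.stdout.split()
print(utf8_mode_flag, child_encoding)
```

In UTF-8 mode the child reports a flag of 1 and a UTF-8 preferred encoding, so an open call without an encoding argument writes UTF-8 even on a cp1252 system.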