JelteF / PyLaTeX

A Python library for creating LaTeX files
https://jeltef.github.io/PyLaTeX/
MIT License

Unicode support #55

Closed winogradoff closed 9 years ago

winogradoff commented 9 years ago

I can't generate a PDF file that contains Cyrillic text.

JelteF commented 9 years ago

You should probably use lualatex or xetex as the compiler then. See #47

winogradoff commented 9 years ago

Thanks, I did not know about that.

But I still get the error: "! String contains an invalid utf-8 sequence."

I think this happens because the generated .tex file is not UTF-8 encoded. When I create a .tex file myself, pdflatex processes it normally.
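To illustrate the hypothesis above, here is a minimal sketch. It assumes the file was written in a legacy single-byte encoding such as cp1251; the thread does not say which encoding was actually in effect, so that choice is only an example. Bytes produced that way are usually not valid UTF-8, which is the kind of input that would trigger an "invalid utf-8 sequence" complaint.

```python
text = "Привет"  # sample Cyrillic text, purely illustrative

# Assumption: the platform default is a legacy code page such as cp1251.
legacy_bytes = text.encode("cp1251")
utf8_bytes = text.encode("utf-8")


def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print("cp1251 bytes valid as UTF-8:", is_valid_utf8(legacy_bytes))
print("utf-8 bytes valid as UTF-8: ", is_valid_utf8(utf8_bytes))
```

The same text yields different byte sequences depending on the encoding, so a tool that expects UTF-8 can reject a file that looks fine in an editor using the legacy code page.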

JelteF commented 9 years ago

Hmm, this sounds strange. Are you using Python 3 or Python 2? And could you show a simple example that doesn't work?

winogradoff commented 9 years ago

I'm using Python 3.

Code example: https://gist.github.com/winogradoff/2544b3d3b40a98c44e53

And yes, I'm using packages argument from the pull request: https://github.com/JelteF/PyLaTeX/pull/58

JelteF commented 9 years ago

I tested some things with your snippet after downloading the Cyrillic packages for LaTeX. The key fix seems to be adding the babel package.

You can do this by adding the following line after initializing the Document object:

doc.packages.add(Package('babel', options='russian'))

Could you confirm this? (this should work without your new commits)

JelteF commented 9 years ago

Something else that might help, if you are still having trouble, is setting fontenc to T2A, T2B or T2C. See page 5 of this document for the differences: http://latex-project.org/guides/cyrguide.pdf. For me, however, the babel package selects one that works automatically.
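As a rough sketch of what such a preamble could look like, the snippet below builds a minimal pdflatex document by hand, without PyLaTeX, and writes it out as UTF-8. The helper name build_tex, the file name and the sample text are hypothetical, and T2A is just one of the three encodings mentioned above. In PyLaTeX itself the equivalent would presumably follow the same Package(...) pattern used elsewhere in this thread.

```python
import os
import tempfile


def build_tex(body, fontenc="T2A", language="russian"):
    """Return a minimal LaTeX source with an explicit Cyrillic font encoding.

    Hypothetical helper for illustration; not part of PyLaTeX.
    """
    lines = [
        r"\documentclass{article}",
        r"\usepackage[%s]{fontenc}" % fontenc,   # font encoding (T2A, T2B or T2C)
        r"\usepackage[utf8]{inputenc}",          # the source file is UTF-8
        r"\usepackage[%s]{babel}" % language,    # language-specific setup
        r"\begin{document}",
        body,
        r"\end{document}",
    ]
    return "\n".join(lines) + "\n"


source = build_tex("Привет, мир!")

# Write the file with an explicit encoding so it matches the inputenc option.
path = os.path.join(tempfile.mkdtemp(), "cyrillic.tex")
with open(path, "w", encoding="utf-8") as f:
    f.write(source)

print(source)
```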

winogradoff commented 9 years ago

I have managed to create a PDF file with Cyrillic text. I just removed this package from my code: Package('inputenc', options=['utf8']) https://gist.github.com/winogradoff/2544b3d3b40a98c44e53

Your library creates the .tex file in a non-Unicode encoding.

It would be better to use utf-8 support like this (for Python 2 and Python 3): http://stackoverflow.com/questions/10971033/backporting-python-3-openencoding-utf-8-to-python-2
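A minimal sketch of that idea, assuming the standard library's io.open is used (it accepts an encoding argument on both Python 2.6+ and Python 3). The function name write_utf8 and the file name are hypothetical and not part of PyLaTeX.

```python
import io
import os
import tempfile


def write_utf8(path, text):
    """Write text to path as UTF-8, regardless of the platform default."""
    # io.open behaves like Python 3's built-in open on both major versions.
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)


path = os.path.join(tempfile.mkdtemp(), "example.tex")
write_utf8(path, u"Кириллица")

# Read the raw bytes back to check what actually landed on disk.
with io.open(path, "rb") as f:
    raw = f.read()

print(len(raw), "bytes written")
```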

JelteF commented 9 years ago

Sorry, I might have been unclear. What I meant was code like this:

doc = Document(
    author=doc_author,
    date=current_date,
    title=doc_title,
    maketitle=True,
)
doc.packages.add(Package('babel', options=['russian']))

Since Package('inputenc', options='utf8') is added by default, the encoding should be interpreted correctly.

winogradoff commented 9 years ago

It doesn't work that way, because the generated .tex file is in a non-Unicode encoding. It works fine as long as the text contains no non-ASCII characters. You should write files as UTF-8 for wider support of character sets: http://stackoverflow.com/questions/10971033/backporting-python-3-openencoding-utf-8-to-python-2

JelteF commented 9 years ago

I really don't think that is the case, at least not in Python 3. It worked on my machine, so could you try my code above with some Cyrillic characters added, then show me the code and the error you get?

winogradoff commented 9 years ago

Try with this file: https://drive.google.com/file/d/0B3qRZwrY7kcgYm5FT0hyS0gtRWs/view?usp=sharing

Then try re-encoding the .tex file as UTF-8 and running pdflatex from the console. pdflatex should then produce the PDF file.
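The re-encoding step could be sketched like this. The source encoding is an assumption (cp1251 is used only as an example, since the thread does not name it), and reencode_to_utf8 and the file names are hypothetical.

```python
import os
import tempfile


def reencode_to_utf8(src_path, dst_path, src_encoding):
    """Read a text file in src_encoding and write it back out as UTF-8."""
    # newline="" keeps line endings exactly as they are in the source file.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


# Demo: create a file in the assumed legacy encoding, then convert it.
workdir = tempfile.mkdtemp()
legacy = os.path.join(workdir, "legacy.tex")
fixed = os.path.join(workdir, "fixed.tex")

with open(legacy, "wb") as f:
    f.write("Привет\n".encode("cp1251"))

reencode_to_utf8(legacy, fixed, "cp1251")
```

After the conversion, running pdflatex on the new file from the console is the manual check described above.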

winogradoff commented 9 years ago

This works fine for Python 3:

def generate_tex(self, filename=''):
    """Generates a .tex file.

    :param filename: the name of the file

    :type filename: str
    """

    filename = self.select_filename(filename)

    with open(filename + '.tex', 'w', encoding='utf-8') as newf:
        self.dump(newf)

JelteF commented 9 years ago

Your code worked fine for me. I think I know where the bug is coming from. This is what the python documentation says:

In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding. (For reading and writing raw bytes use binary mode and leave encoding unspecified.)

It probably works on my machine because my default locale uses UTF-8. If you send a pull request with the change, I will accept it once it passes Travis. Please also add it to the changelog as an item under Fixed.
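A quick way to check this on a given machine is sketched below. It prints the encoding that the quoted documentation says is used when none is specified, then writes a file with an explicit encoding so the result no longer depends on the locale. The file name and sample text are illustrative.

```python
import locale
import os
import tempfile

# The encoding named in the documentation quoted above as the default
# for text-mode open() when no encoding argument is given.
default_encoding = locale.getpreferredencoding(False)
print("Default text encoding on this machine:", default_encoding)

# Passing encoding= explicitly makes the output independent of the locale.
path = os.path.join(tempfile.mkdtemp(), "doc.tex")
with open(path, "w", encoding="utf-8") as f:
    f.write("Пример\n")

with open(path, "rb") as f:
    data = f.read()

print("Bytes on disk:", data)
```

If the printed default is something other than UTF-8, that would be consistent with the difference in behaviour between the two machines in this thread.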

winogradoff commented 9 years ago

Wow. Thanks for the info. I will try this week to make a pull request.

JelteF commented 9 years ago

I fixed it myself since it was such a small change.

winogradoff commented 9 years ago

Thank you.