Open thinkdoggie opened 4 years ago
The output from python-minifier is always UTF-8, so the coding comment is not needed. An option to preserve the shebang would be sensible.
Hi @dflook, thanks for your reviewing. Even the encoding for the output has been UTF-8, Python 2 interpreter still need this magic comment to load code correctly if non-latin characters used in the content. Because,
Python 2 uses ASCII as its default encoding for its strings which cannot store Chinese characters. On the other hand, Python 3 uses Unicode encoding for its strings by default which can store Chinese characters.
Here're some reference links about this topic:
Hi @dflook, do you think shebang or encoding hint should be preserved by default?
If not, the minified program will not be fully equivalent to the original source. For example,
On the other hand, I also suggest to provide the following options in the command-line to suppress this behavior: --remove-shebang
and --remove-encoding-comment
. Thanks!
Hi, thanks for continuing to work on this!
The encoding comment is an instruction for the parser. Since we work on the AST produced by the parser, the coding comment in the source is no longer relevant. In Python 3 we can safely output UTF-8 and rely on the parser to produce the same string.
The sequence of bytes might be different between input and output if the encoding is different, but the parser should produce the same string. In this case copying the coding comment from the input to output is wrong, as it will probably not match the output encoding. If using the pyminify
command the output is UTF-8. When using the minify()
function you get a python str that you can encode how you like, but you will need to add a coding comment yourself to ensure it parses correctly.
I'm not too sure of how this works in Python 2 to be honest, but I'm reluctant to make any changes without a test case. Do you have an example where the current behaviour is wrong?
For the shebang I can't quite decide if it should be removed or preserved by default, but --remove-shebang
is along the right lines.
Hi @dflook , I totally agree that test cases are always necessary for any code changes. Nevertheless, it looks no existing test scaffold for the minify()
function. If I understand correctly, I am willing to add more test code for this new enhancement.
The magic comment for encoding is just a hit for Python 2 interpreter rather than the file encoding itself. There is a very old spec PEP263 to address this topic. I will show more examples and negative cases to explain how it works and why it is necessary for Python 2.
How is it going with this Pull Request? I ran into the issue that my shebang gets removed from the output file (https://github.com/dflook/python-minifier/issues/34). This pull request has been stale for a year and a half. What is the status?
Shebang and encoding declare are two types of magic comments in the beginning lines of Python source code. In general, the minifier had better keep these lines as the origin source.
For example,