dflook / python-minifier

Transform Python source code into its most compact representation
MIT License
558 stars 41 forks source link

`__doc__` attribute references disabling all removal of literals #11

Open jplloyd opened 4 years ago

jplloyd commented 4 years ago

When running pyminify --remove-literal-statements file.py, any reference to __doc__ anywhere in the processed module will disable removal of any literal.

The documentation states (emphasis mine):

"If the module uses the __doc__ name the module docstring will be retained"

Should the documentation be changed to clarify that no literals will be removed, or is the current behavior not intended?

Another thing to note is that only references to attributes named __doc__ will disable the removal, meaning that e.g.

"""module docstring"""

print(__doc__ + " surely no sane person does this?")

will strip the docstring and make the resulting code fail. This is a very contrived example, of course, and I hope it doesn't actually exist anywhere (but it probably does).

It may be that this issue should be split up into multiple issues, depending on the intended behavior, apologies in advance.


Side note: does it make sense to disable literal removal due to references to any attribute named __doc__? For all we know the reference may be to the docstring of something that is defined in another module, which itself had its docstrings stripped.

It seems to me that since docstring removal in the general case comes with some risks, and should be done with care (even if most of the time it won't have any practical impact), there may be some value in providing an additional flag that disables the __doc__ check entirely (with appropriately severe warnings to the user).

dflook commented 4 years ago

Thanks for creating an issue! The documentation is correct that __doc__ should only retain the module docstring - this isn't working and is a bug.

In addition to that, any usage of a __doc__ attribute disables all literal statement removal, which is intended but not documented. This is a documentation defect. The reason for this is to minimise the possibility of broken output. This is something I actually ran into a lot when testing python minifier.

I'm thinking about replacing --remove-literal-statements with separate options for:

jplloyd commented 4 years ago

I'm thinking about replacing --remove-literal-statements with separate options for:

* removing non-docstring literal statements

* removing module docstring (always or suppressed by `__doc__` name)

* removing other docstrings (always or suppressed by `__doc__` attribute.

Sounds sensible. Ultimately docstring removal doesn't really fall inside the scope of minification (in python), since it can change execution semantics, so making users have to explicitly enable the potential self-foot-shooting should be the way to go.