Handle PEP 263 Source Encoding Lines

corranwebster commented 4 years ago

PEP 263 defines a way for Python source files to declare their encoding: a special comment, which must appear on the first or second line of the file.

For example, this file header:

# -*- coding: latin-1 -*-

# (C) Copyright 2005-2020 Enthought, Inc., Austin, TX
# All rights reserved.
#
# This software is provided without warranty under the terms of the BSD
# license included in LICENSE.txt and may be redistributed only under
# the conditions described in the aforementioned license. The license
# is also available online at http://www.enthought.com/licenses/BSD.txt
#
# Thanks for using Enthought open source!

gives an error "H103 Wrong copyright header found".

A quick search of the codebase shows that we do have files with these encoding lines. Many of them are autogenerated Sphinx conf.py files, but there are some legitimate matches. All of them declare UTF-8, so the encoding lines can be removed now that UTF-8 is the default source encoding for Python 3 and the codebase is Python 3 only.

From PEP 263, the regex to match the line is: ^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)
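As a rough illustration of how that regex could be applied, here is a minimal sketch that looks for a declaration in the first two lines of a source string. The function name `find_encoding_declaration` is hypothetical and not part of the checker, and the sketch deliberately ignores the finer rule that a second-line declaration only counts when the first line is itself a comment or blank.

```python
import re

# Regex quoted from PEP 263; group 1 captures the declared encoding name.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def find_encoding_declaration(source):
    """Return (line_index, encoding) for a PEP 263 declaration, or None.

    Hypothetical helper: only the first two lines are inspected, since
    PEP 263 does not recognise declarations anywhere else. Simplification:
    the content of the first line is not checked when the declaration is
    found on the second line.
    """
    for index, line in enumerate(source.splitlines()[:2]):
        match = CODING_RE.match(line)
        if match:
            return index, match.group(1)
    return None
```

A checker that wanted to support these lines could skip the matched line before comparing the copyright header; one that rejects them, as proposed here, could use the same match to report the line for removal. The standard library's `tokenize.detect_encoding` implements the full rules if an exact answer is needed.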

corranwebster commented 4 years ago

I'm adding this issue mainly so we can reject it: I think the correct solution is that we always use UTF-8 and until such time as we need to have a source file with a different encoding, I think we can ignore this possibility.

mdickinson commented 4 years ago

> I think the correct solution is that we always use UTF-8 and until such time as we need to have a source file with a different encoding, I think we can ignore this possibility.

Agreed.

enthought / ets-copyright-checker, issue #18