cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0
1.07k stars 4.28k forks source link

Discussion: Aligning Python Code Formatting with PEP8 vs. Readability Preferences #45706

Open Benedikttk opened 1 month ago

Benedikttk commented 1 month ago

As previously discussed in Core Software meeting on 30/Jul/2024 (indico link), we would like to address the issue of the Python formatting style. The tool we are using for formatting is Ruff and it enforces PEP8 formatting.

We have seen some pieces of code formatted by the authors for readability in a specific way. For example, in Validation/SiTrackerPhase2V/python/Phase2TrackerValidateDigi_cff.py:

    TOFRMapH = cms.PSet(
        Nxbins = cms.int32(1200),
        xmin   = cms.double(0.),
        xmax   = cms.double(120.),
        Nybins = cms.int32(100),
        ymin   = cms.double(0.),
        ymax   = cms.double(50.),
        switch = cms.bool(False)
    )

In this case, Ruff reports that all those white spaces do not comply with PEP8 and they will be automatically changed to:

    TOFRMapH = cms.PSet(
        Nxbins=cms.int32(1200),
        xmin=cms.double(0.),
        xmax=cms.double(120.),
        Nybins=cms.int32(100),
        ymin=cms.double(0.),
        ymax=cms.double(50.),
        switch=cms.bool(False)
    )

We can find another example of readability vs formatting in GeometryReaders/XMLIdealGeometryESSource/test/testReadXMLFromDB.py, where the author aligned the input parameters as follows:

process.MessageLogger = cms.Service("MessageLogger",
cerr = cms.untracked.PSet(
    enable = cms.untracked.bool(False)
),
debugModules = cms.untracked.vstring('*'),
files = cms.untracked.PSet(
    readDBdebug = cms.untracked.PSet(
        DEBUG = cms.untracked.PSet(
            limit = cms.untracked.int32(-1)
        ),
        INFO = cms.untracked.PSet(
            limit = cms.untracked.int32(-1)
        ),
        extension = cms.untracked.string('.out'),
        noLineBreaks = cms.untracked.bool(True),
        threshold = cms.untracked.string('DEBUG')
    ),
    readDBerrors = cms.untracked.PSet(
        extension = cms.untracked.string('.out'),
        threshold = cms.untracked.string('ERROR')
    )
)

This will be reformated to:

process.MessageLogger = cms.Service(
    "MessageLogger",
    cerr=cms.untracked.PSet(enable=cms.untracked.bool(False)),
    debugModules=cms.untracked.vstring("*"),
    files=cms.untracked.PSet(
        readDBdebug=cms.untracked.PSet(
            DEBUG=cms.untracked.PSet(limit=cms.untracked.int32(-1)),
            INFO=cms.untracked.PSet(limit=cms.untracked.int32(-1)),
            extension=cms.untracked.string(".out"),
            noLineBreaks=cms.untracked.bool(True),
            threshold=cms.untracked.string("DEBUG"),
        ),
        readDBerrors=cms.untracked.PSet(
            extension=cms.untracked.string(".out"),
            threshold=cms.untracked.string("ERROR"),
        ),
     ),

Some people prefer to maintain readability in their code, which may not always align with PEP8. Currently, it is not possible to selectively retain or discard specific formatting elements to accommodate individual coding styles. So we can either format all white spaces according to PEP8 or none.

We are opening this issue seek your feedback on this. I am open to suggestions to progressively format the Python code in CMSSW without disturbing the contributors.

Thank you

cmsbuild commented 1 month ago

cms-bot internal usage

cmsbuild commented 1 month ago

A new Issue was created by @Benedikttk.

@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

Dr15Jones commented 1 month ago

So as food for thought, the very beginning of PEP 8 says when NOT to follow its convention

Some other good reasons to ignore a particular guideline:

1. When applying the guideline would make the code less readable, even for someone who is used to reading code that follows this PEP.
2. To be consistent with surrounding code that also breaks it (maybe for historic reasons) – although this is also an opportunity to clean up someone else’s mess (in true XP style).
3. Because the code in question predates the introduction of the guideline and there is no other reason to be modifying that code.
4. When the code needs to remain compatible with older versions of Python that don’t support the feature recommended by the style guide.
Dr15Jones commented 1 month ago

So looking at examples in the PEP, it looks like the formatter is not actually obeying the rules. In the examples in the PEP it has

# Correct:
x = 1
y = 2
long_variable = 3

and explicitly says

Always surround these binary operators with a single space on either side: assignment (=), augmented assignment (+=, -= etc.), comparisons (==, <, >, !=, <>, <=, >=, in, not in, is, is not), Booleans (and, or, not).

which the examples you gave from this formatter does not do.

makortel commented 1 month ago

assign core

cmsbuild commented 1 month ago

New categories assigned: core

@Dr15Jones,@makortel,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks

makortel commented 1 month ago

explicitly says

Always surround these binary operators with a single space on either side: assignment (=), augmented assignment (+=, -= etc.), comparisons (==, <, >, !=, <>, <=, >=, in, not in, is, is not), Booleans (and, or, not).

which the examples you gave from this formatter does not do.

PEP8 lists an exception to that later (or, they don't count function arguments as "assignment", or something)

Don’t use spaces around the = sign when used to indicate a keyword argument, or when used to indicate a default value for an unannotated function parameter:

https://peps.python.org/pep-0008/#other-recommendations

Personally I would prefer to have spaces around = in our configuration files, at least in most places.

makortel commented 1 month ago

I suppose we need to tag @cms-sw/all-l2 to gather feedback.

smuzaffar commented 1 month ago

Personally I would prefer to have spaces around = in our configuration files, at least in most places.

+1

fwyzard commented 1 month ago

Both examples of PEP8 formatting are hideous.

fwyzard commented 1 month ago

this:

process.MessageLogger = cms.Service("MessageLogger",
cerr = cms.untracked.PSet(
    enable = cms.untracked.bool(False)
),
debugModules = cms.untracked.vstring('*'),
files = cms.untracked.PSet(
    readDBdebug = cms.untracked.PSet(
        DEBUG = cms.untracked.PSet(
            limit = cms.untracked.int32(-1)
        ),
        INFO = cms.untracked.PSet(
            limit = cms.untracked.int32(-1)
        ),
        extension = cms.untracked.string('.out'),
        noLineBreaks = cms.untracked.bool(True),
        threshold = cms.untracked.string('DEBUG')
    ),
    readDBerrors = cms.untracked.PSet(
        extension = cms.untracked.string('.out'),
        threshold = cms.untracked.string('ERROR')
    )
)

is actually missing the closing ), and should be formatted as

process.MessageLogger = cms.Service("MessageLogger",
    cerr = cms.untracked.PSet(
        enable = cms.untracked.bool(False)
    ),
    debugModules = cms.untracked.vstring('*'),
    files = cms.untracked.PSet(
        readDBdebug = cms.untracked.PSet(
            DEBUG = cms.untracked.PSet(
                limit = cms.untracked.int32(-1)
            ),
            INFO = cms.untracked.PSet(
                limit = cms.untracked.int32(-1)
            ),
            extension = cms.untracked.string('.out'),
            noLineBreaks = cms.untracked.bool(True),
            threshold = cms.untracked.string('DEBUG')
        ),
        readDBerrors = cms.untracked.PSet(
            extension = cms.untracked.string('.out'),
            threshold = cms.untracked.string('ERROR')
        )
    )
)
fwyzard commented 1 month ago

This:

    TOFRMapH = cms.PSet(
        Nxbins = cms.int32(1200),
        xmin   = cms.double(0.),
        xmax   = cms.double(120.),
        Nybins = cms.int32(100),
        ymin   = cms.double(0.),
        ymax   = cms.double(50.),
        switch = cms.bool(False)
    )

I understand the rationale, however most cases will have parameters of very different lengths, making it difficult to apply.

I guess if we want to have consistent rules, we would go with

    TOFRMapH = cms.PSet(
        Nxbins = cms.int32(1200),
        xmin = cms.double(0.),
        xmax = cms.double(120.),
        Nybins = cms.int32(100),
        ymin = cms.double(0.),
        ymax = cms.double(50.),
        switch = cms.bool(False)
    )
fwyzard commented 1 month ago

Some more comments...

fwyzard commented 1 month ago

An empty PSet feels right on a single line:

    parameter = cms.PSet()

A PSet with more than one parameter feels right on multiple lines:

    parameter = cms.PSet(
        flag = cms.bool(False),
        answer = cms.int32(42)
    )

What about a PSet with a single parameter ? I would prefer it on multiple lines, like

    parameter = cms.PSet(
        nice = cms.bool(True)
    )

but I've seen in many place all on a single line:

    parameter = cms.PSet( ugly = cms.bool(True) ) 

though I'm not sure about the whitespaces.

fwyzard commented 1 month ago
smuzaffar commented 1 month ago

If I am not wrong , there are few thing which we can configure and also we can enable/disble some of the pycodestyle https://pycodestyle.pycqa.org/en/latest/intro.html#configuration

makortel commented 1 month ago
  • In general I find it more convenient to keep a trailing , after the last argument: This makes it easier to rearrange lines, and add a line after the end without introducing the extra , which makes the diffs look nicer.

At least here PEP8 seems to agree to have the trailing comma, except when placed on a single line with the closing delimeter

When trailing commas are redundant, they are often helpful when a version control system is used, when a list of values, arguments or imported items is expected to be extended over time. The pattern is to put each value (etc.) on a line by itself, always adding a trailing comma, and add the close parenthesis/bracket/brace on the next line. However it does not make sense to have a trailing comma on the same line as the closing delimiter (except in the above case of singleton tuples):

https://peps.python.org/pep-0008/#when-to-use-trailing-commas

davidlange6 commented 1 month ago

another (currently more popular) formatter option is https://github.com/psf/black

Benedikttk commented 1 month ago
  • Is there a maximum line length ?

Yes hello, Ruff enforces a limit of 88 characters, this is configurable via the line-length setting in the ruff.toml file.

Benedikttk commented 1 month ago

another (currently more popular) formatter option is https://github.com/psf/black

Yes, black is also very good, especially for solo purposes. We did look into black, although Ruff is supposedly faster in large repositories than its counterparts Black, so due to that and other things we went with Ruff.

Ruff: Linting and Formatting Black: Formatting

davidlange6 commented 1 month ago

(I missed Ruff at the top - i mean to refer to pycodestyle..)