HERA-Team / hera_pspec

HERA power spectrum estimation code and data formats
http://hera-pspec.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
5 stars 3 forks source link

Pre commit #359

Open steven-murray opened 2 years ago

steven-murray commented 2 years ago

This adds pre-commit and many style checks, in particular flake8 and black.

Looking through all the changes will be a bit laborious... you might want to have a skim through to make sure the overall feel is what you want. If there's any changes that seem like style you don't want, I can change it back.

I have left a few style checks off for now (there were just too many errors and I didn't want to deal with them all at once). They can be turned on one at a time in other PRs.

I also found a few things that seemed like actual errors (small things like missing assertions).

nkern commented 2 years ago

thanks SM for taking this up. it is a lot of changes.

I may be in the minority opinion here, but I am not very fond of this kind of style

        ps = ds.pspec(
            [
                bls,
            ],
            [
                bls,
            ],

that is prevalent in the black reformatting. I actually find it incredibly more hard to read, debug, and edit than

ds.pspec([bls], [bls])

but maybe that's just something I need to learn to get over. regardless, my vote would be to either remove black or turn off this kind of style, but of course, if I'm in the minority of heavy-user hera_pspec devs, I'm fine w/ it being added. lmk if anyone has thoughts

jsdillon commented 2 years ago

FWIW, I also find that kind of choice by Black to be totally baffling.

steven-murray commented 2 years ago

@nkern I agree that that is awful formatting. There is an easy way to turn that particular thing off, and I actually tried to find instances of it in this code, but obviously missed a few.

Briefly, black will format any list with a trailing comma after the last element on separate lines. Otherwise, if it doesn't have a trailing comma it will try to fit it on one line, and only break if necessary. These apparently had pre-existing trailing commas which signaled black to format it this way. I can fix these particular instances. Usually, as one writes code and black formats the little bits you write, you notice these things and can fix them easily as you go (as you get used to it, you just know ahead of time not to put the comma there for a short list/tuple etc).

steven-murray commented 2 years ago

OK @nkern I fixed up some of the most obvious examples of that kind of thing.

nkern commented 2 years ago

@steven-murray got it, thanks for the explanation!

not to nit pick here, but here is another example of a formatting issue that I (personally) think reduces clarity before

                uvp1.wgt_array[i][blp1_inds, :, :, j] \
                    = np.sqrt(1. / (1./uvp1.get_wgts(key1)**2
                            + 1. / uvp2.get_wgts(key2)**2))
                uvp1.wgt_array[i][blp1_inds, :, :, j] /= \
                    uvp1.wgt_array[i][blp1_inds, :, :, j].max()

after

                uvp1.wgt_array[i][blp1_inds, :, :, j] = np.sqrt(
                    1.0
                    / (1.0 / uvp1.get_wgts(key1) ** 2 + 1.0 / uvp2.get_wgts(key2) ** 2)
                )
                uvp1.wgt_array[i][blp1_inds, :, :, j] /= uvp1.wgt_array[i][
                    blp1_inds, :, :, j
                ].max()

maybe this is another example of an edgecase, but these are kind of scattered all over the place. it also could just be that I don't like putting brackets on their own line, which is very common for black it seems.

I guess the question for us to ask (which maybe we did and I missed it since I haven't showed up at pspec telecons lately, sorry), is do most ppl who actively work on hera_pspec (myself, phil, adrian, aew, jianrong, saurabh, adelie, hugh, and others) find that its non-black-conforming code style is a hindrance? let me know if so. or maybe this is a hera-wide software effort? I think I may just have a personal distaste for black in general, but again, if most hera_pspec devs are on board I'm cool with it. and regardless, thanks steven for all of the effort you've put into getting this where it is! maybe others can weigh in on their thoughts if they have them.

steven-murray commented 2 years ago

@nkern Thanks for thinking this through thoroughly.

With the example you list here, which changes something like this:

long_lhs_expression = \
    (also_really_long_rhs_expression_that_is 
     * multiplied_by_another)

to this:

long_lhs_expression = (
    also_really_long_rhs_expression_that_is 
     * multiplied_by_another
)

The explicit rationale is 1. always remove all backslashes because they're brittle (see here for reasons), 2. don't go over 88 characters per line (because that is harder to read and present and look at on github etc.) and 3. putting the final bracket on its own line makes it much easier to see the end of the clause.

Number (3) is especially obvious in things like this:

def function_with_many_params(
    first_param,
    second_param = "my default value",
    third_param: tuple = ('xx', 'yy'),
):
    # code starts!
    first_param * second_param
    ...

Note that if the last parenthesis of the function call was right at the end of the third_param: tuple ... line, there is no obvious place that the code starts versus listing the input parameters.

I advocate for using black based on essentially three arguments:

  1. In general, the black authors have done a LOT of work thinking about the best possible ways to represent python code to enhance readability, stay true to PEP8, and minimize the code in diffs. These are all good things and happen automatically when using black.
  2. While the style might not be exactly how you (or I) like it, it is much better to have consistency over personal code preference (in almost all cases). In effect, you just get used to how "blacked code" looks, and you end up not thinking about the formatting any more. Everything just reads easily, on any project you look at that uses it.
  3. This consistency also means it can run automatically. If you run flake8, which really is something you should be doing, because it uncovers small syntax errors and performance issues along with pure styling things, then you'll get a whole heap of style errors, which take a long time to fix up by hand. Running black gets rid of the majority of them in a single call, and does it consistently for everyone in the collaboration.

</persuasive essay>

Having said all that, I realize it's up to the people who actually develop pspec to decide whether they want to do this or not. Either way, if you'd prefer to turn black off, I'd still recommend keeping the pre-commit file with at least the flake8 checks, because I fixed some things that were almost certainly errors by doing that.

nkern commented 2 years ago

awesome, thanks for your insight steven. I think I would generally agree with all of those things in theory. I think the thing that just gets me in practice, is that I think I just personally disagree with what black sees as "more readable". For example, in your example (as with many examples I see on black docs) of something like this

def function_with_many_params(
    first_param,
    second_param = "my default value",
    third_param: tuple = ('xx', 'yy'),
):
    # code starts!
    first_param * second_param
    ...

relative to this

def function_with_many_params(first_param, second_param='my_default_value',
                              third_param:...):
    # code starts

it seems like the above is just taken for granted as "more readable" but, for whatever reason, I just don't see it that way. However, I think this is just the way I "grew-up" reading and writing python code--I probably could get used to the black style, and in this case I certainly don't think its "unreadable" per se, just harder for me. And I do agree with consistency as being a key thing here, so, I think I'm probably fine with turning a minimal flake8+black pre-commit here if everyone else is on board. but perhaps we can just get a weigh-in from some other hera_pspec devs if they have any opinions.

nkern commented 2 years ago

@steven-murray while we wait on weigh-in from others, an unrelated question:

in some areas there is the addition of a # noqa comment, which apparently is a signal to the codecoverage to not check that line. In the case of a function, does it apply to the whole function? and also did you add that or is that something that black added (and if so, why?). thanks!

Here is an example in pspecdata.py

    def I(self, key):  # noqa
steven-murray commented 2 years ago

Hey @nkern , sorry for the delay, was on vacation. So the # noqa doesn't stop coverage checks but rather flake8 checks. In the above I'm fairly sure it only applies to the function line itself. In that case it's because the function name is a single upper case letter, which is generally disapproved of. Here it's appropriate so I added the comment.

steven-murray commented 2 years ago

@acliu / @philbull do you have further comments on this PR?

steven-murray commented 1 year ago

@acliu / @philbull just pinging you to see if you have further opinions on this PR? I'll have to sort out conflicts (which are numerous), and when I do, I'd like to be able to merge right away after it, so I don't have to do it again :-)

philbull commented 1 year ago

Sorry for the long delay, I studiously ignore my GitHub notifications :/ Thanks very much for handling this Steven. Only two points of feedback:

  1. I think switching on black is contentious enough that we should switch it off for now, and perhaps only introduce it the next time we do a major release. I agree that it results in iffy code style in some instances, but am somewhat happy to let commit hooks auto-fix things for me. flake is fine though as far as I'm concerned; we can switch off annoying rules if there are issues.
  2. There are some other open PRs, so I'm worried that switching on the pre-commit stuff will make those hard to merge of we aren't careful about the order of things.