apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0
36.23k stars 14.08k forks source link

Enable More PyDocStyle Checks #10742

Closed kaxil closed 2 months ago

kaxil commented 4 years ago

We use PyDocStyle in pre-commit to enforce docstring style

We use Ruff to enforce docsting style https://github.com/apache/airflow/blob/23b8e839a35e84d57c5cf38b0b21171ac3bd1ecc/pyproject.toml#L47-L74

We follow pep257 style (http://www.pydocstyle.org/en/stable/error_codes.html) for checks.

Currently, we ignore the following rules:

The task is to complete the following missing rules:

Missing Docstrings

Whitespace Issues

Docstring Content Issues

It would be good if we can enable them one by one -- separate PRs are ok

Let me know if you need any help

pcandoalmeida commented 3 years ago

I've taken:

Might be an idea @kaxil to add a checklist for all rules needing updating in the initial issue comment. This way potential contributors can know which ones are done and which are still open to do? Just a quality of life idea so we don't have to look at the PRs individually.

kaxil commented 3 years ago

@pcandoalmeida That's a good idea, I updated the description and added a check-box.

pcandoalmeida commented 3 years ago

Hi @kaxil I'm working on D202 now.

pcandoalmeida commented 3 years ago

Morning @kaxil what do you think about adding the Hacktoberfest label to this issue?

kaxil commented 3 years ago

Morning @kaxil what do you think about adding the Hacktoberfest label to this issue?

Good idea again, just added :)

ktrueda commented 3 years ago

Hi I wanna try D205. Can I try this ?

ktrueda commented 3 years ago

I started to fix D205 (1 blank line required between summary line and description). It's difficult because a lot of docs has not summary line.

For example, https://github.com/apache/airflow/blob/d86cf37a35b69fea806be7610deb332468dfba4b/airflow/configuration.py#L481-L486

This docs has no summary line and there are a lot of docs like this. I cannot make summary line for all these docs. If I deal all docs as summary line, the doc obey D205. But it is not easy to read.

How can I do that? If someone has any idea, please help me.

kaxil commented 3 years ago

@ktrueda You can do 2 things:

1) Add a summary line:

Remove an option if it exists in config from a file or 
default config. 

If both of config have the same option, this removes 
the option in both configs unless remove_default=False.

2) Only update blank line for which docstrings exists. In this case, we won't be able to enable D205 for all of them but when in future we have summary lines, it would make the word easier.

I would prefer (1) though

ktrueda commented 3 years ago

@kaxil Thank you for your help. I would prefer (1) too. So I'll try that way.

Since I found thousands of docs to fix, I need much times. Leave it to me.

potix2 commented 3 years ago

@kaxil Can I take D200?

pcandoalmeida commented 3 years ago

Hiya 👋🏼 @potix2 I believe D200 is currently being worked on 😃

potix2 commented 3 years ago

@pcandoalmeida Yeah, I'm working on it. I'll send a pull request soon.

potix2 commented 3 years ago

I'm working on D400.

peter0083 commented 2 years ago

I would like to tackle this issue if it hasn't been assigned recently. Thanks! @uranusjr :)

uranusjr commented 2 years ago

Go ahead 👍 Note that the ignore list above is out of date; please find the correct list in the pre-commit config file on main.

peter0083 commented 2 years ago

status update:

many thanks!

uranusjr commented 2 years ago

Maybe try docformatter? You’d probably want to do this in small batches, maybe even one file at a time, to see how well it works.

peter0083 commented 2 years ago

Maybe try docformatter? You’d probably want to do this in small batches, maybe even one file at a time, to see how well it works.

sounds good. I will give docformatter a shot. thank you for the tips. @uranusjr. 🙏

eladkal commented 2 years ago

I'm working on D400.

hi @potix2 Do still wish to work on it?

potix2 commented 2 years ago

@eladkal I gave up D400, because I got too many conflicts. I'm sorry for late reply.

peter0083 commented 2 years ago

@eladkal sorry I give up on D205. I will let other developers tackle this one 😞

edithturn commented 2 years ago

@eladkal, is somebody working here: D400(First line should end with a period)? If not, I'd like to give it a try.

eladkal commented 2 years ago

feel free!

edithturn commented 2 years ago

@eladkal @kaxil I took D400, there are a lot of files there 😄 . My question is in this part where we have a long first line

Screen Shot 2022-06-30 at 18 33 52

Pre commits (D400) is asking to add a period at the end of the line, It is not the case, I think. Should I just remove the new line and convert it in a single line to avoid have that issue?

uranusjr commented 2 years ago

Simply removing the newline would trigger another check complaining the line is too long. It would be best to rewrite it and split the sentence into two shorter ones. This particular exampel can be rewritten to

Check metric values are within a certain tolerance.

The metrics are given as SQL expressions, and the tolerance with parameter *days_back*.
potiuk commented 2 years ago

Simply removing the newline would trigger another check complaining the line is too long. It would be best to rewrite it and split the sentence into two shorter ones. This particular exampel can be rewritten to

Check metric values are within a certain tolerance.

The metrics are given as SQL expressions, and the tolerance with parameter *days_back*.

Hard to automate I am a afraid :(

edithturn commented 2 years ago

@potiuk @uranusjr I counted all the changes that are needed for D400, they are 3150 changes. Finish this in a single PR could be hard. I got an idea. How about to keep ignoring D400 in pre-commits - --add-ignore=D100,D102,D103,D104,D105,D107,D202,D400,D401,D205 and still working in minitaks (100 or 200 files) and send PR fixing what @uranusjr is saying? I could make progress in this way. What do you think?

mik-laj commented 2 years ago

@edithturn SGTM. The smaller the changes are, the easier it will be to review and merge.

potiuk commented 2 years ago

Yep

edithturn commented 2 years ago

Thanks, it is better!

bdsoha commented 1 year ago

@edithturn Still working on D400?

eladkal commented 1 year ago

@bdsoha D400 is huge it involved many files. help is appreciated.

bdsoha commented 1 year ago

@eladkal I'm always glad to help, just want to make sure there is no overlap. Is there a way to concretely subdivide the work?

edithturn commented 1 year ago

@bdsoha thank you for asking. I found 3150 changes to make and I tried to do it by parts. If you find another way to make this easier let me know so I can also jump in. Appreciate it 🚀

bdsoha commented 1 year ago

@bdsoha thank you for asking. I found 3150 changes to make and I tried to do it by parts. If you find another way to make this easier let me know so I can also jump in. Appreciate it 🚀

I want to know more or less what you are currently working on so that we don't both attempt work on the same resources.

edithturn commented 1 year ago

yay! My last change was pushed :) So I am not working on this right now.
The approach that was following was: Add this validation D400 to the pre-commits, evaluate all the files, and start fixing them by parts. Before sending the PR, again remove D400 from pre-commits so that it lets us push the changes :)

Let me know if that helps!

bdsoha commented 1 year ago

@edithturn Just needed to know that you are not working on it simultaneously. Thanks!

bdsoha commented 1 year ago

@kaxil @potiuk @eladkal Once the above related PR are merged, it might be worth it to add the following to guarantee conforming to styles on already handled directories:

# airflow/.pre-commit-config.yaml
  - repo: https://github.com/pycqa/pydocstyle
    rev: 6.1.1
    hooks:
      - id: pydocstyle
        name: Run pydocstyle
        args:
          - --convention=pep257
          - --add-ignore=D100,D102,D103,D104,D105,D107,D205,D400,D401
        include: |
          ^airflow/api.*\.py$|
          ^airflow/api_connexion.*\.py$|
          ^airflow/callbacks.*\.py$|
          ^airflow/cli.*\.py$|
          ^airflow/datasets.*\.py$|
          ^airflow/decorators.*\.py$|
          ^airflow/exceptions.py$|
          ^airflow/hooks.*\.py$|
          ^airflow/lineage.*\.py$|
          ^airflow/listeners.*\.py$|
          ^airflow/macros.*\.py$|
          ^airflow/security.*\.py$|
          ^airflow/sensors.*\.py$|
          ^airflow/tasks.*\.py$|
          ^airflow/timetables.*\.py$|
          ^airflow/triggers.*\.py$
kaxil commented 1 year ago

@bdsoha Can you create a PR with your suggestion

vincbeck commented 1 year ago

You mark D400 as done

eladkal commented 1 year ago

Given that https://github.com/apache/airflow/pull/28893 removed pydocstyle in favor of ruff Do we have a way to enforce styles? D400 is completed but we have no way to make sure new code is comply with D400 convention.

vincbeck commented 1 year ago

D400 is completed but we have no way to make sure new code is comply with D400 convention.

We do. It is part of static checks, if your changes do not comply with D400 convention, static check will fail

ferruzzi commented 1 year ago

There are almost a thousand D205 violations, I've taken a bite out of that. 59 down, 878 to go! :stuck_out_tongue:

potiuk commented 1 year ago

Let's hope they go down faster than new are coming :D

ferruzzi commented 1 year ago

Most are easy. Every docstring must start with a one-line summary which ends in a period. If there is more than one sentence then there must be a blank line after the first. It gets a little tricky for docstrings like Does the thing. If it fails then it does this. because you have to merge that into one sentence which fits on one line or rephrase it and add a gap. Also, sometimes the docstring IS one sentence but its a long sentence so you either have to rephrase to shorten it or come up with a way that it has a summary and a details block without looking too redundant and silly. :shrug:

It just takes some time.

ferruzzi commented 1 year ago

After the above batch of five commits get merged we can throw the switch on D205! Locally with all of those applied, I show zero D205 violations in the codebase.

Starting on D401. There are over 2000 violations on that one so it'll be the same many-small-bites process.

ferruzzi commented 1 year ago

D205 is done :banandance:

D401 is about half done, hoping to finish it this week if Life cooperates.

eladkal commented 7 months ago

@ferruzzi was D401 completed?

Taragolis commented 7 months ago

I guess not yet, at least if I uncomment this line https://github.com/apache/airflow/blob/f24a03709eecbda87ed794cee567806e51c3a21f/pyproject.toml#L1276

And run check manually I've got an error:

Run 'ruff' for extremely fast Python linting.............................Failed
- hook id: ruff
- exit code: 1

airflow/auth/managers/utils/fab.py:43:5: D401 First line of docstring should be in imperative mood: "Returns the map associating a method to a FAB action."
airflow/auth/managers/utils/fab.py:48:5: D401 First line of docstring should be in imperative mood: "Returns the map associating a FAB action to a method."

... (truncated about 1000 lines)

airflow/www/auth.py:90:5: D401 First line of docstring should be in imperative mood: "This decorator is used to check permissions on views."
airflow/www/auth.py:348:5: D401 First line of docstring should be in imperative mood: "Decorator that checks current user's permissions to access the website."
airflow/www/blueprints.py:27:5: D401 First line of docstring should be in imperative mood: "Main Airflow page."
airflow/www/views.py:4067:9: D401 First line of docstring should be in imperative mood: "Action method to handle multiple records selected from a list view."
Found 1084 errors.

I assume it is pretty hard to implement it until https://github.com/astral-sh/ruff/issues/3172 not implemented.

  1. Someone fix doc strings into the entire module/components
  2. Couple days after someone add new docstring which not follow D401
  3. 😢

As solution for that we might enable D401 by default in ruff and add current files/module which not yet follow D401 into the tool.ruff.per-file-ignores. WDYT?

eladkal commented 7 months ago

As solution for that we might enable D401 by default in ruff and add current files/module which not yet follow D401 into the tool.ruff.per-file-ignores. WDYT?

Probably this is the best way to avoid pile up more work.