apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Add Sphinx Spell-Checker #10264

Closed kaxil closed 4 years ago

kaxil commented 4 years ago

We have been fixing various typos in the project, but it would be good to enable a spell checker for our docs site so that the docs stay typo-free.

We can use https://pypi.org/project/sphinxcontrib-spelling/ to do this. Docs: https://sphinxcontrib-spelling.readthedocs.io/en/latest/
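
For reference, here is a minimal sketch of what enabling the extension in the Sphinx `conf.py` might look like. The option names come from the sphinxcontrib-spelling docs; the values, and the assumption that the wordlist sits next to `conf.py` under `docs/`, are illustrative and not the project's actual configuration:

```python
# Hypothetical excerpt from docs/conf.py (values are assumptions, not Airflow's real config).

# Register the spelling builder provided by sphinxcontrib-spelling.
extensions = [
    "sphinxcontrib.spelling",
]

# Dictionary language used by the underlying enchant backend.
spelling_lang = "en_US"

# Project-specific words that should not be reported, one word per line.
# The path is resolved relative to the Sphinx source directory.
spelling_word_list_filename = "spelling_wordlist.txt"

# Set to True to print suggested corrections next to each reported word.
spelling_show_suggestions = False
```

With something like this in place, the check would run through the `spelling` builder rather than the regular HTML builder.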

I gave it a shot but unfortunately, there are too many words that we need to add in docs/spelling_wordlist.txt.

Here is the list I had but there are still ~6k more words that need to be added, if someone wants to give it a shot:

Acyclic
Airbnb
Async
Avro
Bas
BaseView
Cassanda
DagRun
Dask
Dataproc
Datastore
Gantt
Gunicorn
Harenslak
Hashicorp
Jarek
Jinja
Jira
Kamil
Kerberos
Kibana
Kubernetes
Oozie
Opsgenie
Parameterizing
Potiuk
Py
Qubole
Sqoop
Standarization
Systemd
Templating
XCom
XComs
Zsh
adls
airflow
airflowignore
ansible
apikey
argcomplete
args
async
auth
autocommit
autodetect
automl
autoscale
aws
backend
backfill
backfilled
bashcompinit
batcher
bigquery
bigtable
bitshift
boto
botocore
catchup
cfg
chown
classmethod
cloudant
cloudsql
cncf
config
configMapRef
configmap
configuing
cronjob
crypto
cyexamplekey
dag
dagbag
dagruns
databricks
datadog
dataset
datasets
datetime
dbs
dejson
deserializing
dest
dev
devel
dingding
distros
dockerenv
docstring
docstrings
elasticsearch
envFrom
eventlet
exampleinclude
exasol
facebook
failover
fernet
fluentd
fs
gRPC
gcp
gcpcloudsql
gevent
github
greenlets
grpc
gssapi
hadoop
hashicorp
hdfs
hiveserver
howto
httpbin
imap
initdb
integration
integrations
jalr
jdbc
jinja
keytab
krb
kubernetes
kwargs
kylin
licence
literalinclude
logins
loglevel
logstash
lshift
macOS
mdeng
memorystore
mesos
metadatabase
metarouter
metastore
mongo
msg
mssql
noqa
odbc
papermill
param
paramiko
petabyte
pgdatabase
pghost
pgpassfile
pgpassword
pgport
pguser
pidfile
pinot
postgre
postgres
postgresql
precheck
proc
programmatically
psql
py
pylint
pythonpath
rankdir
rbac
readthedocs
resetdb
rshift
rst
salesforce
saml
sanitization
searchpath
secretRef
secretsmanager
seealso
serverless
sftp
smtps
spegno
sqla
stackdriver
statsd
stdout
subcommand
subdag
subgraph
subpackage
subpackages
subprocesses
sudo
tablename
templated
templating
teradata
timedelta
umask
unpause
upgradedb
upsert
uptime
utcnow
versionable
vertica
wasb
webhdfs
webserver
xcom
xxxxxxxx
yandex
yandexcloud
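
Since several thousand more words would need to go into the wordlist, a small helper could keep the file deduplicated and sorted while new entries are merged in. This is only a sketch: `merge_wordlist` and `update_wordlist_file` are hypothetical helpers, not part of Airflow or sphinxcontrib-spelling, and the sort is plain code-point order (capitalized words first), which appears to match the list above:

```python
from pathlib import Path
from typing import Iterable, List


def merge_wordlist(existing: Iterable[str], new_words: Iterable[str]) -> List[str]:
    """Merge two word collections into a sorted, duplicate-free list.

    Hypothetical helper: strips whitespace, drops blank entries, and sorts
    by code point, so capitalized words come before lowercase ones.
    """
    words = set()
    for word in list(existing) + list(new_words):
        cleaned = word.strip()
        if cleaned:
            words.add(cleaned)
    return sorted(words)


def update_wordlist_file(path: Path, new_words: Iterable[str]) -> None:
    """Rewrite the wordlist file at `path` with `new_words` merged in."""
    existing = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    merged = merge_wordlist(existing, new_words)
    path.write_text("\n".join(merged) + "\n", encoding="utf-8")
```

For example, `update_wordlist_file(Path("docs/spelling_wordlist.txt"), ["XCom", "xcom"])` would add both spellings if they are missing and leave the file sorted. Note that it merges whatever it is given, so genuine misspellings still have to be filtered out by hand.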

potiuk commented 4 years ago

Love it. I make a lot of spelling mistakes (usually because I want to go fast). So having an automated check would be awesome!

potiuk commented 4 years ago

BTW, I got a lot better at it with recent changes in IntelliJ, where spell-checking and grammar checking are built in :)

caddac commented 4 years ago

I'd like to work on this issue. Hoping to have a PR in the next day or so.

caddac commented 4 years ago

I've created an initial PR for this and have a few questions about next steps. The spell check runs during the docs build, but there is no direct command to run it on its own. Also, all currently flagged words, including genuine misspellings, have been added to the docs/spelling_wordlist.txt file, so some effort is needed to identify the misspellings, remove them from the wordlist, and update the docs with the correct spellings.

My questions:

  1. Should we have a specific command to run the spell check other than building the docs? In breeze too?
  2. Should I update my PR to fix as many of the misspellings as I can? It will increase the number of files I change, obviously.
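
To make question 1 concrete, a standalone spell-check command could be as small as invoking Sphinx with the `spelling` builder. The sketch below assumes the docs sources live in `docs/` and that the extension is already enabled in `conf.py`; the function names and the output directory are made up for illustration and are not an existing breeze command:

```python
import subprocess
import sys
from typing import List


def spelling_command(source_dir: str = "docs",
                     output_dir: str = "docs/_build/spelling") -> List[str]:
    """Return the argv for a spelling-only Sphinx run.

    The directory defaults are assumptions about the repository layout.
    """
    # "-b spelling" selects the builder registered by sphinxcontrib-spelling.
    return [sys.executable, "-m", "sphinx", "-b", "spelling", source_dir, output_dir]


def run_spell_check() -> int:
    """Run the spell check and return the process exit code.

    Requires sphinx and sphinxcontrib-spelling to be installed.
    """
    return subprocess.run(spelling_command(), check=False).returncode
```

Calling `run_spell_check()` from a script would then run only the spelling builder; the reported words would still need to be read from the builder's output.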

kaxil commented 4 years ago

  1. Should we have a specific command to run spell check other than building the docs? in breeze too?

I took a look at your PR, and I think running the spell check only as part of the docs build makes sense for now. We don't need a separate command for it. Even if we want one later, it can happen in a separate PR :)

  2. Should I update my PR to fix as many of the misspellings as I can? It will increase the number of files I change, obviously.

Yes, otherwise the PR would fail, or at least I would expect it to fail.