allenai / allennlp

An open-source NLP research library, built on PyTorch.
http://www.allennlp.org
Apache License 2.0
11.75k stars 2.25k forks source link

write a CI check to make sure docstrings are in sync with constructors #1410

Open joelgrus opened 6 years ago

joelgrus commented 6 years ago

here's the raw materials, if I have time I'll put all the pieces together. you'd need to collect all the classes in the library, find the parameters both way, and make sure they're equal. there's probably lots of edge cases.

In [15]: from allennlp.models.crf_tagger import CrfTagger

In [16]: import inspect

In [17]: from numpydoc.docscrape import NumpyDocString

In [18]: docstring_params = NumpyDocString(CrfTagger.__doc__)['Parameters']

In [19]: inspect_params = inspect.getfullargspec(CrfTagger)

In [20]: docstring_params
Out[20]:
[('vocab',
  '``Vocabulary``, required',
  ['A Vocabulary, required in order to compute sizes for input/output projections.']),
 ('text_field_embedder',
  '``TextFieldEmbedder``, required',
  ['Used to embed the tokens ``TextField`` we get as input to the model.']),
 ('encoder',
  '``Seq2SeqEncoder``',
  ['The encoder that we will use in between embedding tokens and predicting output tags.']),
 ('label_namespace',
  '``str``, optional (default=``labels``)',
  ['This is needed to compute the SpanBasedF1Measure metric.',
   'Unless you did something unusual, the default value should be what you want.']),
 ('dropout:  ``float``, optional (detault=``None``)', '', []),
 ('constraint_type',
  '``str``, optional (default=``None``)',
  ['If provided, the CRF will be constrained at decoding time',
   'to produce valid labels based on the specified type (e.g. "BIO", or "BIOUL").']),
 ('include_start_end_transitions',
  '``bool``, optional (default=``True``)',
  ['Whether to include start and end transition parameters in the CRF.']),
 ('initializer',
  '``InitializerApplicator``, optional (default=``InitializerApplicator()``)',
  ['Used to initialize the model parameters.']),
 ('regularizer',
  '``RegularizerApplicator``, optional (default=``None``)',
  ['If provided, will be used to calculate the regularization penalty during training.'])]

In [21]: inspect_params
Out[21]: FullArgSpec(args=['self', 'vocab', 'text_field_embedder', 'encoder', 'label_namespace', 'constraint_type', 'include_start_end_transitions', 'dropout', 'initializer', 'regularizer'], varargs=None, varkw=None, defaults=('labels', None, True, None, <allennlp.nn.initializers.InitializerApplicator object at 0x10b6946d8>, None), kwonlyargs=[], kwonlydefaults=None, annotations={'return': None, 'vocab': <class 'allennlp.data.vocabulary.Vocabulary'>, 'text_field_embedder': <class 'allennlp.modules.text_field_embedders.text_field_embedder.TextFieldEmbedder'>, 'encoder': <class 'allennlp.modules.seq2seq_encoders.seq2seq_encoder.Seq2SeqEncoder'>, 'label_namespace': <class 'str'>, 'constraint_type': <class 'str'>, 'include_start_end_transitions': <class 'bool'>, 'dropout': <class 'float'>, 'initializer': <class 'allennlp.nn.initializers.InitializerApplicator'>, 'regularizer': typing.Union[allennlp.nn.regularizers.regularizer_applicator.RegularizerApplicator, NoneType]})
epwalsh commented 6 years ago

Related to this, is there any reason why a docstring linter such as pydocstyle shouldn't be added to the tests? I don't think pydocstyle would catch these types of errors, in particular, but it would catch a lot of other stuff.

matt-gardner commented 6 years ago

There's a pylint extension that will do this check. We investigated it a while ago, and we ended up dropping it, because it was a bit too aggressive in declaring failure (without obvious error messages), and because it didn't solve the problem of mismatch between the constructor and from_params (which isn't a problem anymore). I think the consensus is that we'd really like a check that, when we have parameters listed in a docstring, they need to match the arguments to the function, we just haven't found one that works for us. If you know of one, suggestions are welcome.

matt-gardner commented 5 years ago

There are some more ideas here, if someone wants to try to make this actually work.

DeNeutoy commented 4 years ago

Closing, as we no longer use numpy style docstrings

matt-gardner commented 4 years ago

Re-opening, as this would be good to have, and there might be a linter that makes sense now that we've switched our docstring style.