NNPDF / reportengine

A framework for declarative data analysis
https://data.nnpdf.science/validphys-docs/guide.html
GNU General Public License v2.0
1 stars 2 forks source link

numpydoc style docstrings get reformatted when using --help #34

Open wilsonmr opened 4 years ago

wilsonmr commented 4 years ago

AFAICT this is happening in

https://github.com/NNPDF/reportengine/blob/491c92e0d601e7f28b216160700a26dfc654d8cb/src/reportengine/helputils.py#L28

Is there a reason we don't want to respect new lines? Surely we should have the freedom to add a single new line if we want?

wilsonmr commented 4 years ago

example:

shifted_kl: Sample mean of the shifted Kullbach-Leibler divergence
  between target and model distribution.

  Parameters ---------- model_log_density: torch.Tensor
      column of log (\tilde p) for a sample of states, which is
  returned by
      forward pass of `RealNVP` model target_log_density:
  torch.Tensor
      column of log(p) for a sample of states, which includes the
  negative
      action -S(\phi) and a possible contribution to the volume
  element due
      to a non-trivial parameterisation.

  Returns ------- out: torch.Tensor
      torch tensor with single element, corresponding to the
  estimation of the
      shifted K-L for set of sample states.
wilsonmr commented 4 years ago

a minimal solution could be to at least let the formatting options be controllable when subclassing App or something. I might try pushing something which does this.

Zaharid commented 4 years ago

This is going to be annoying: AFIACT numpydoc requires exactly one line break between the section tiltle and the first item. It doesn't work with two (one of these little sphinx annoyances). On the other hand, I don't want a line break to be significative when printing --help, because then it does weird things to the text flow.

wilsonmr commented 4 years ago

hmm if people write code and docstrings that display nicely in terminals (a lot of the code reviews in NNPDF/nnpdf ask for this) and if the --help <provider module> added a new line between the provider name and its docstring then any further formatting seems kind of unneccesary

Also I think it's preferable to have a few paragraphs that look like this

Sample mean of the shifted Kullbach-Leibler divergence
  between target
  and model distribution.

than having this:

Parameters ---------- dataset : DataSetSpec
    object parsed from the `dataset_input` runcard key fitthcovmat:
None or ThCovMatSpec
    None if either `use_thcovmat_if_present` is False or if no
theory
    covariance matrix was used in the corresponding fit t0set: None
or PDF
    None if `use_t0` is False or a PDF parsed from `t0pdfset` runcard
key perform_covmat_reg: bool
    whether or not to regularize the covariance matrix
norm_threshold: number
    threshold used to regularize covariance matrix
Zaharid commented 4 years ago
Sample mean of the shifted Kullbach-Leibler divergence
  between target
  and model distribution

This does look awful though.

Maybe one could try to use sphinx itself to format the terminal help. Maybe you can tell it to use the man page format or whatever. That has the big disadvantage of increasing a lot the tooling functionality.

Another possibility is to try to make the formatting function in reportengine smarter, but it would need some sort of not completely trivial lookahead parsing.

wilsonmr commented 4 years ago

It looks bad but is still readable the parameters example just doesn't make sense

Also I will point out that particular example happens because there is no new line between provider name and the docstring. The first line of the docstring is less than the width of a terminal window.

I think my suggestion of at least being able to control the new line stripping (maybe other kwargs of the sane_wrap) at the level of app is not completely crazy. In addition I think that when you do --help <provider module>

it printing

<action>:
docstring

would be better, then as long as there is discipline with docstrings they will look fine anyway..

Zaharid commented 4 years ago

How would the argument work? Would all of vp use the same thing? In which case it should probably be the default.

Zaharid commented 4 years ago

FWIW the thinking at the time was that we shouldn't care about single line breaks or indentation that happens because that depends on how you format docstrings docstrings. So we strip that away and reflow the string properly. Under that thinking you would break paragraphs with two line breaks.

wilsonmr commented 4 years ago

Yeah I mean the logic doesn't seem unreasonable. I just realised that most of the help with https://github.com/wilsonmr/anvil is unreadable because I made an effort to format docstrings in the same style as numpy docs because I like the look, I can change that but then I guess we have a few cases of the same thing in validphys

if I come up with something sensible I'll make a PR but I tried to change it earlier and it quickly got quite messy

Zaharid commented 4 years ago

I suppose a bit of regex magic can do a reasonable job at separating the various bits. E.g. something of the form r'\n\n.\-+\n' is a title.

Zaharid commented 4 years ago

In practice this line txt = re.sub(r'(.)\n([^\s])',r'\1 \2',txt) currently doesn't reflow lines that start with spaces. It could be expanded to leave things like colon separated lists. Probably the sections would have to be detected somewhere else.