GenericMappingTools / pygmt

A Python interface for the Generic Mapping Tools.
https://www.pygmt.org
BSD 3-Clause "New" or "Revised" License

How to automate copying of GMT documentation to PyGMT #895

Open weiji14 opened 3 years ago

weiji14 commented 3 years ago

The time we spend on writing and reviewing documentation for PyGMT is getting insane. Specifically, I'm talking about the docstrings in modules (not the tutorials/gallery examples). These are some of the formatting fixes we apply:

  1. Turn short aliases to long aliases (#474)
  2. Check that italics/bold/code-block formatting is applied correctly (#631)
  3. Wrap to 79 characters (#384)

And there are perhaps more 'standards' to be applied soon (#884, #886, etc).

There should be an automated or semi-automated script to just copy things from the canonical GMT function docstring, apply the above formatting standards, and paste it to the PyGMT function. Say when we want to:

  1. wrap a new module
  2. keep up to date with a new GMT release
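A minimal sketch of what such a semi-automated script could look like, assuming GMT marks options in bold as `**-X**` and that PyGMT wants long aliases formatted as code. The `ALIASES` table and the `convert_docstring` name are hypothetical, not part of PyGMT:

```python
import re
import textwrap

# Hypothetical alias table for illustration only; the real aliases are
# declared per module in PyGMT.
ALIASES = {"R": "region", "J": "projection", "B": "frame"}


def convert_docstring(gmt_text, aliases=ALIASES, width=79):
    """Swap bold short options for long aliases, then wrap the text."""

    def to_alias(match):
        letter = match.group(1)
        # Leave unknown options untouched so a reviewer can spot them.
        if letter not in aliases:
            return match.group(0)
        return f"``{aliases[letter]}``"

    # Assumption: options appear in GMT's reST as bold, e.g. **-R**.
    text = re.sub(r"\*\*-([A-Za-z])\*\*", to_alias, gmt_text)
    # Wrap without splitting markup in the middle of a word.
    return textwrap.fill(
        text, width=width, break_long_words=False, break_on_hyphens=False
    )


print(convert_docstring("Use **-R** and **-J** to set up the map."))
```

This covers only the alias replacement and the 79-character wrapping from the list of formatting fixes; checking italics/bold/code formatting would still need a separate step or manual review.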

This would likely require some coordination with upstream GMT. I think there are smart people who might have some ideas around this.

Are you willing to help implement and maintain this feature? Yes, but need teamwork

liamtoney commented 3 years ago

We should be aware of where it doesn't quite make sense to take the GMT docs w/o appending extra info. For example, currently the docs for e.g. the region arg on master show:

[Screenshot: docs for the region parameter on master]

This is confusing because for almost all use cases we really recommend the list form [xmin, xmax, ymin, ymax], not the slash-separated string syntax from GMT. But the script could account for these types of args!
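One way a script could account for such arguments is a small table of hand-written descriptions that take precedence over the copied GMT text. This is only a sketch; `OVERRIDES`, `describe`, and the wording of the override are hypothetical:

```python
# Hypothetical hand-written descriptions that should win over copied GMT text.
OVERRIDES = {
    "region": (
        "*[xmin, xmax, ymin, ymax]* (list) or *xmin/xmax/ymin/ymax* (str). "
        "Specify the region of interest."
    ),
}


def describe(parameter, gmt_description, overrides=OVERRIDES):
    """Return the PyGMT-specific text if one exists, else the GMT text."""
    return overrides.get(parameter, gmt_description)


# The list form is documented first for region; other parameters fall
# through to whatever was copied from GMT.
print(describe("region", "text copied from GMT"))
print(describe("frame", "text copied from GMT"))
```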

maxrjones commented 3 years ago

Related to both this and #1042, I am interested in whether there is a way to use some NLP tricks to identify shared non-common options between different GMT modules. This would be helpful for ensuring that we aren't redundantly creating PyGMT documentation when helpers could be used and would generally make it easier to develop aliases in a consistent manner.
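As a very rough starting point (plain string similarity rather than real NLP), options with near-identical descriptions in different modules could be flagged with the standard library. The function name, the data layout, and the descriptions below are made up for illustration:

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_options(docs, threshold=0.8):
    """Find options in different modules whose descriptions look alike.

    ``docs`` maps (module, option) to the option's description text.
    """
    pairs = []
    for (key_a, text_a), (key_b, text_b) in combinations(docs.items(), 2):
        if key_a[0] == key_b[0]:
            continue  # only compare across modules
        ratio = SequenceMatcher(None, text_a.lower(), text_b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((key_a, key_b, round(ratio, 2)))
    return pairs


# Toy descriptions, not the real GMT documentation text.
docs = {
    ("basemap", "F"): "Draw a box behind the map feature.",
    ("legend", "F"): "Draw a box behind the map feature.",
    ("basemap", "B"): "Set the frame and axes attributes.",
}
print(similar_options(docs))
```

Pairs above the threshold would be candidates for a shared helper or a shared docstring snippet; the threshold itself would need tuning against real documentation.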

maxrjones commented 3 years ago

Even though it's a lot of work to write/review the docstrings, I prefer that for now because there are a lot of pending improvements to the GMT documentation (see https://github.com/GenericMappingTools/gmt/discussions/4678 for some examples). So, copying the GMT documentation as is may create more work later on by propagating some of the current limitations.

I think a compromise would be to automatically create a link to the relevant option in GMT for each PyGMT option. The easiest place to do this would be in fmt_docstring(). I think this would be more useful than the current format of linking to the GMT documentation using "Full option list at \", since the use of single-character options is undocumented (https://github.com/GenericMappingTools/pygmt/issues/1203#issuecomment-817419506) and will be eventually deprecated (#262). Each of the single letters in the alias list (example shown below) could link to the related option in the GMT documentation (e.g., https://docs.generic-mapping-tools.org/latest/basemap.html#l). Even after the use of single-character options is deprecated, I would prefer that a table relating PyGMT options to their GMT counterparts remains in the docstrings for users with a background in using GMT.
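A sketch of how such links could be generated, assuming the anchor for every option follows the pattern of the basemap example above (module page plus the lowercased option letter). That pattern is a generalization from one URL and would break for modules that have both an upper- and lowercase version of the same letter. `option_url` and `alias_links` are hypothetical helpers, not existing PyGMT functions:

```python
GMT_DOCS = "https://docs.generic-mapping-tools.org/latest"


def option_url(module, short_option):
    """Build the assumed documentation URL for one GMT option."""
    return f"{GMT_DOCS}/{module}.html#{short_option.lower()}"


def alias_links(module, aliases):
    """Return reST list items linking each short option to the GMT docs."""
    lines = []
    for short, long_name in sorted(aliases.items()):
        url = option_url(module, short)
        lines.append(f"- `{short} <{url}>`__ = {long_name}")
    return "\n".join(lines)


# Illustrative alias table; in PyGMT the aliases come from the module itself.
print(alias_links("basemap", {"L": "map_scale", "B": "frame"}))
```

Something like `alias_links` could be called from fmt_docstring() when it builds the alias list, so the table of GMT counterparts stays in the docstring and each entry points at the upstream description.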

[Image: alias list from a PyGMT docstring]

In addition, we could create more automated checks rather than automated copying. For example, checking that parameters are formatted as code which I think is one of the easier things to miss when writing the docstrings.
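A minimal sketch of one such check, assuming numpydoc-style docstrings where a parameter is defined on a `name : type` line and should appear as ``` ``name`` ``` everywhere else. The function name is hypothetical, and the check will produce false positives for parameter names that are also ordinary words:

```python
import inspect
import re


def unformatted_parameters(func):
    """List parameters mentioned in the docstring without code markup."""
    doc = inspect.getdoc(func) or ""
    flagged = set()
    for name in inspect.signature(func).parameters:
        # A bare mention: the whole word, not touching backticks.
        bare = re.compile(rf"(?<![`\w]){re.escape(name)}(?![`\w])")
        # The numpydoc definition line itself is allowed to be bare.
        definition = re.compile(rf"^\s*{re.escape(name)}\s*:")
        for line in doc.splitlines():
            if definition.match(line):
                continue
            if bare.search(line):
                flagged.add(name)
    return sorted(flagged)


def demo(region, frame):
    """
    Plot something.

    Parameters
    ----------
    region : list
        Required if frame is set.
    frame : bool
        Works together with ``region``.
    """


print(unformatted_parameters(demo))  # → ['frame']
```

A check like this could run in CI and only warn, leaving the final call to the reviewer.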

seisman commented 3 years ago

> I think a compromise would be to automatically create a link to the relevant option in GMT for each PyGMT option. The easiest place to do this would be in fmt_docstring().

Sounds like a good idea.

BTW, I feel that the long list of aliases wastes a lot of space on the right side. It may be better to use the .. hlist:: directive instead (see https://sphinx-rtd-theme.readthedocs.io/en/stable/demo/lists_tables.html#hlists for what it looks like).
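For illustration, the alias list could be emitted as an hlist block along these lines. The helper name, the column count, and the alias table are assumptions; only the `.. hlist::` directive and its `:columns:` option come from Sphinx:

```python
def alias_hlist(aliases, columns=3):
    """Render an alias table as a Sphinx hlist directive."""
    lines = [".. hlist::", f"    :columns: {columns}", ""]
    for short, long_name in sorted(aliases.items()):
        lines.append(f"    - **{short}** = **{long_name}**")
    return "\n".join(lines)


# Illustrative aliases only.
print(alias_hlist({"R": "region", "J": "projection", "B": "frame"}))
```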

weiji14 commented 3 years ago

> Related to both this and #1042, I am interested in whether there is a way to use some NLP tricks to identify shared non-common options between different GMT modules. This would be helpful for ensuring that we aren't redundantly creating PyGMT documentation when helpers could be used and would generally make it easier to develop aliases in a consistent manner.

One of the ideas I had when I opened this issue (that requires a lot of work) was to have upstream GMT use standardized placeholders for parameters/arguments, e.g. using < -X > or { -X } (as with Jinja). My thinking was that this problem isn't restricted to just PyGMT, but also GMT.jl and potential wrappers in the future (e.g. GMT for R?). I do realize this is a near impossible task, but just would like to put this idea out here so that people don't keep reinventing the wheel for each GMT wrapper.

maxrjones commented 3 years ago

> > Related to both this and #1042, I am interested in whether there is a way to use some NLP tricks to identify shared non-common options between different GMT modules. This would be helpful for ensuring that we aren't redundantly creating PyGMT documentation when helpers could be used and would generally make it easier to develop aliases in a consistent manner.
>
> One of the ideas I had when I opened this issue (that requires a lot of work) was to have upstream GMT use standardized placeholders for parameters/arguments, e.g. using < -X > or { -X } (as with Jinja). My thinking was that this problem isn't restricted to just PyGMT, but also GMT.jl and potential wrappers in the future (e.g. GMT for R?). I do realize this is a near impossible task, but just would like to put this idea out here so that people don't keep reinventing the wheel for each GMT wrapper.

I'm not sure that I completely understand how you're envisioning the standardized placeholders, if you don't mind explaining a bit more. In https://github.com/GenericMappingTools/gmt/pull/4915 I tried to make it so that all modules that use -F to put a box behind something include the same source file for the documentation (and am now noticing that I could have done this better). Is this similar to what you think would be useful for the wrappers?

weiji14 commented 3 years ago

> I'm not sure that I completely understand how you're envisioning the standardized placeholders, if you don't mind explaining a bit more. In GenericMappingTools/gmt#4915 I tried to make it so that all modules that use -F to put a box behind something include the same source file for the documentation (and am now noticing that I could have done this better). Is this similar to what you think would be useful for the wrappers?

Sure. The idea is to make it easy to search and replace parameters (that should be bold) and arguments (that should be italics). To be honest, I don't have this well thought out, but am taking inspiration from templating engines like Jinja and Liquid.

Original text

**-F**\ [**l**\|\ **t**][**+c**\ *clearances*][**+g**\ *fill*][**+i**\ [[*gap*/]\ *pen*]][**+p**\ [*pen*]]\
[**+r**\ [*radius*]][**+s**\ [[*dx*/*dy*/][*shade*]]]

Template text

{{ -F | param }}\ [{{ l }}\|\ {{ t }}][{{ +c }}\ { clearances }][{{ +g }}\ { fill }][{{ +i }}\ [[{ gap }/]\ { pen }]][{{ +p }}\ [{ pen }]]\
[{{ +r }}\ [{ radius }]][{{ +s }}\ [[{ dx }/{ dy }/][{ shade }]]]

At its simplest, we could replace curly brackets '{' and '}' with '*' (double means bold '**', single means italics '*'). The param filter in {{ -F | param }} would indicate that -F is a parameter that needs to be replaced with a long alias (box in this case).
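A minimal sketch of that replacement using plain regular expressions rather than a real templating engine. The `render` function is hypothetical, and the only alias used is the `box` example mentioned above:

```python
import re


def render(template, aliases=None):
    """Turn the placeholder template into reST bold/italic markup."""
    aliases = aliases or {}

    def param(match):
        letter = match.group(1)
        # Use the long alias when known, else keep the GMT short option.
        return f"**{aliases.get(letter, '-' + letter)}**"

    # 1. {{ -F | param }} -> bold long alias (or bold -F without aliases).
    text = re.sub(r"\{\{\s*-(\w)\s*\|\s*param\s*\}\}", param, template)
    # 2. Remaining double braces -> bold.
    text = re.sub(r"\{\{\s*(.+?)\s*\}\}", r"**\1**", text)
    # 3. Single braces -> italics.
    text = re.sub(r"\{\s*(.+?)\s*\}", r"*\1*", text)
    return text


template = r"{{ -F | param }}\ [{{ l }}\|\ {{ t }}][{{ +c }}\ { clearances }]"
print(render(template))                # GMT flavour, keeps -F
print(render(template, {"F": "box"}))  # PyGMT flavour, uses box
```

Rendering without aliases reproduces the original GMT markup, so the same template could in principle serve GMT and each wrapper.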

seisman commented 3 years ago

I'm afraid any automatic documentation-copying mechanism means that we have to use the GMT syntax in PyGMT, e.g., +gred+p1p,blue. However, we may want more Pythonic ways (#1082). Anyway, I think this issue has the lowest priority due to its high difficulty.

maxrjones commented 3 years ago

Just a cross-reference that we are considering ways to automate some of the GMT documentation during the addition of long-options at https://github.com/GenericMappingTools/gmt/issues/5561, which could be useful for PyGMT in the future. At the least, it will probably be helpful to add somewhere in PyGMT's contributing/maintenance information where to find the unstable long-option names for core GMT (e.g. https://github.com/GenericMappingTools/gmt/blob/86b39ff1dcee9ffc1b232b8c21a21965dd38dca5/src/blockmean.c#L43-L53).