citation-style-language / documentation

Citation Style Language documentation
http://citationstyles.org/
Creative Commons Attribution Share Alike 4.0 International

Add spec document(s) to main repo #80

Closed bdarcus closed 4 years ago

bdarcus commented 4 years ago

Having schemas and the spec in different repos makes it tough to coordinate development and ensure consistency.

Ideally, for example, if someone submits a PR that involves some changes to the schema, it also includes changes to the spec, which can be reviewed together.

Here is how two well-known projects handle this:

  1. DocBook
  2. TEI

Both have a "spec" or "doc" subdir on a main repo, and DocBook also has a "schemas" subdir.

Perhaps we could simply merge the two repos, delete "csl-evolution", and call the result "csl"? All issues and PRs would then be managed in that one place.

Only thing I'm unsure of is the publishing angle. Would there be some problem publishing from the main repo? If yes, maybe just move the spec doc(s) over, and include the main repo as a submodule??

I'd be happy if we started with the above, but there's also value in merging the test-suite, for the same reason. As I've been working on cleaning up the test-suite repo, for example, a number of the issues and PRs there are partially or fully about the spec.

Thoughts?

bwiernik commented 4 years ago

FYI, GitHub automatically redirects for renamed repos, so if there are downstream sites relying on /documentation, /schema, or /test-suite, that would probably be the one to retain.

bdarcus commented 4 years ago

Which is "that"?

bwiernik commented 4 years ago

so if there are downstream sites relying on /documentation, /schema, or /test-suite

If one of the three repos has more downstream dependencies, that repo might want to be retained/renamed as the main repository. No idea if that is the case/which.

bdarcus commented 4 years ago

Ah, OK; got it.

My suggestion would involve restructuring the merged repo anyway, so maybe it doesn't matter too much?

My impulse is to merge to "schema."

bwiernik commented 4 years ago

agreed

bdarcus commented 4 years ago

~And we could do all this as part of transition to v1.1, so this sort of change comes in one clean break?~

Maybe not; see comment just below. Could be staggered, though.

bdarcus commented 4 years ago

One problem with the idea is history; git, issues, etc.

Well, could transfer issues.

Git history mostly may not matter, but on the spec and tests, we could move before we tag 1.0.2, so we'd have integrated history from 1.0.2 forward.

denismaier commented 4 years ago

A good idea, yes. I don't know how this works in practical terms and so on, but the current state of having issues scattered over different repos is not particularly helpful.

bdarcus commented 4 years ago

I'm thinking of a structure something like this for the directory:

├── schemas
│   ├── data
│   └── style
├── spec
├── tests
│   ├── processor
│   └── schemas
└── tools

denismaier commented 4 years ago

Another advantage of having everything in one repo is that we can make better use of milestones and the like for release planning.

denismaier commented 4 years ago

By the way, what do you think about literate programming ideas? Seems possible with Relax NG and we could maintain specs and schema in the same document.

Addendum: Here you suggest we remove those "a:defaultValue" patterns from csl.rnc, which have no purpose other than documentation. I guess literate programming could be used to remove these comments and still have everything properly documented.

There seem to be existing ways and workflows to do this: http://books.xmlschemata.org/relaxng/relax-CHP-14-SECT-2.html

bdarcus commented 4 years ago

By the way, what do you think about literate programming ideas? Seems possible with Relax NG and we could maintain specs and schema in the same document.

I'm a fan in general. Whether to do that would depend on how practical it was in terms of tools, etc.

This is simultaneously a tool that does literate programming with RNC, and a demonstration of it.

denismaier commented 4 years ago

Oh that looks interesting. The link to the tutorial is broken though...

bdarcus commented 4 years ago

The problem is the details.

Editing RNG XML is a hassle compared to the Compact syntax.

Editing XML or HTML for the documentation is also kind of a hassle, compared to plain text alternatives.

There are plain text solutions with support for literate programming, like org-mode; but it's pretty much specific to emacs, and I don't think babel has support for rnc.

XDF is a hybrid XML format with embedded RNC. But I'm not sure how complete or widely used it is.

denismaier commented 4 years ago

True.

Perhaps pandoc with a filter could be an option. Need to look into that.

bdarcus commented 4 years ago

True.

Perhaps pandoc with a filter could be an option. Need to look into that.

That might indeed be a solution.

Pandoc does have support for rnc syntax highlighting on code blocks.

If you do play with this, maybe you could put together a simple demo repo, with two or three simple files, with embedded RNC?

One key question would be how to publish easily. And, of course, schema validation gets more complicated.

If we could figure this out, you could imagine spec and schema merged something like so, where the csl.md file would be the html root index page:

├── csl-bibliography.md
├── csl-choose.md
├── csl-citation.md
├── csl-dates.md
├── csl-locale.md
├── csl.md
├── csl-metadata.md
├── csl-names.md
├── csl-style.md
├── csl-terms.md
└── csl-variables.md

denismaier commented 4 years ago

Ok, now a very simple first example: https://github.com/denismaier/pandocliterateschema

In the md include

::: rnc
RNC code
:::

make specs removes those sections, keeping only the text; make schema extracts the divs tagged with rnc.

As you say, validation is more difficult. (You could of course include a trang run in the make rules.) Also, schema and specs must be in the same order. Ordering things in different ways might be possible somehow, but that's beyond me at the moment.

At least, including multiple files is not a problem. You can just pass a list of files to pandoc.
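
The split described above can be approximated without pandoc. Here is a minimal Python sketch. It is not what the demo repo's make rules actually do. It assumes every schema fragment sits in an un-nested fenced div opened by a line reading exactly "::: rnc" and closed by a line reading ":::". The function name split_literate and the two sample sources are hypothetical.

```python
from typing import List, Tuple


def split_literate(markdown: str) -> Tuple[str, str]:
    """Split literate Markdown into (spec_text, rnc_schema).

    Assumption: schema fragments live in un-nested fenced divs that open
    with a line '::: rnc' and close with a line ':::'.
    """
    spec_lines: List[str] = []
    schema_lines: List[str] = []
    in_rnc = False
    for line in markdown.splitlines():
        marker = line.strip()
        if not in_rnc and marker == "::: rnc":
            in_rnc = True               # opening fence: start a schema fragment
        elif in_rnc and marker == ":::":
            in_rnc = False              # closing fence: end the fragment
            schema_lines.append("")     # blank line between fragments
        elif in_rnc:
            schema_lines.append(line)   # RNC code goes to the schema
        else:
            spec_lines.append(line)     # everything else is spec prose
    if in_rnc:
        raise ValueError("unclosed '::: rnc' div")
    spec = "\n".join(spec_lines).strip() + "\n"
    schema = "\n".join(schema_lines).strip() + "\n"
    return spec, schema


# Hypothetical two-file source, concatenated in order, much as pandoc
# does when it is given a list of input files.
SOURCES = [
    "# Style\n\nProse about the root element.\n\n::: rnc\nstart = style\n:::\n",
    "# Locale\n\nProse about locales.\n\n::: rnc\nstyle = element style { text }\n:::\n",
]
spec, schema = split_literate("\n".join(SOURCES))
print(schema)
```

Because the fragments are emitted in document order, this sketch has the same limitation mentioned above: schema and spec must be written in the same order. The extracted schema text could then be handed to trang as a separate check.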

denismaier commented 4 years ago

Concerning publishing: What about a structure like this:

├── source
├── spec
└── schema

All the literate sources are in the source folder. The processed schema and specs documents get moved (automatically) to the appropriate folder. Commits to these folders can trigger some action. ???

bdarcus commented 4 years ago

Note the current make schema output is not valid.

Question: why remove the schema fragments from the spec output?

I guess the more I think about this, the spec should be aimed at developers, and the primer can be aimed at style authors and such.

In any case, minor issues; thanks for putting this together.

I wonder if others have thoughts?

bdarcus commented 4 years ago

My question on publishing is related to the point made by @rmzelle on readthedocs.

If I understand now, the latter just points to the documentation repo, but I don't really understand how updating of the published docs works.

Whatever the case, we want that to be as automatic as possible, so we only really have to focus on editing the content, and tagging releases.

denismaier commented 4 years ago

Note the current make schema output is not valid.

Of course not. I just wanted to test if it's possible to extract certain divs.

Question: why remove the schema fragments from the spec output?

Not necessary. I just wanted to replicate the current style where the spec is a descriptive text. You can of course keep both things together.

bdarcus commented 4 years ago

To update where I think we arrived:

  1. We agree that at least the spec should be merged into the schema repo; TBD is when to do this, and how to update the readthedocs integration so it continues to work seamlessly. This would maintain the status quo, but would allow a single issue or PR to cover related changes to both schemas and spec.
  2. @denismaier has suggested going a step further; fully integrating spec and schema in a literate programming environment based on pandoc (see comment). In this model, the rnc schema is maintained within the markdown documentation files, and the schema extracted via a make schema command. I think if we did do this the obvious time to do it is with the move to v1.1. Still TBD on this approach is how to do publishing (including how we'd deal with CSS styling, etc).

Either way, like I say just above:

Whatever the case, we want that to be as automatic as possible, so we only really have to focus on editing the content, and tagging releases.

As in, if we tag a v1.0.2 release, I think ideally the docs would get automatically published with the correct version tag, without us having to modify code.
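
As a rough sketch of that idea, a publishing script could derive the docs version label directly from the release tag, so nothing has to be edited by hand per release. The function name docs_version_from_tag is hypothetical, and the tag format (an optional "v" followed by two or three numeric parts) is an assumption rather than an established project convention.

```python
import re
from typing import Optional

# Assumed tag shape: optional leading "v", then MAJOR.MINOR or MAJOR.MINOR.PATCH.
_TAG_RE = re.compile(r"^v?(\d+)\.(\d+)(?:\.(\d+))?$")


def docs_version_from_tag(tag: str) -> Optional[str]:
    """Return the docs version label for a release tag, or None if the
    tag does not look like a release (e.g. a work-in-progress tag)."""
    match = _TAG_RE.match(tag.strip())
    if match is None:
        return None
    # Drop the patch component when the tag does not have one.
    return ".".join(part for part in match.groups() if part is not None)


for tag in ["v1.0.2", "1.1", "literate"]:
    print(tag, "->", docs_version_from_tag(tag))
```

Returning None for anything that is not a release tag lets the publishing step skip non-release tags instead of failing on them.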

I don't have a strong opinion one way or the other ATM. But please weigh in with your thoughts.

bdarcus commented 4 years ago

BTW, I've been active with the new-ish emacs package org-roam.

https://github.com/org-roam/org-roam

Initially they used readthedocs, but have recently switched to an integrated solution, using github pages.

The doc content itself is maintained in a single org file:

https://github.com/org-roam/org-roam/blob/master/doc/org-roam.org

Here's the workflow file they use for publishing.

https://github.com/org-roam/org-roam/blob/master/.github/workflows/docs.yml

... and they also have:

https://github.com/org-roam/org-roam/blob/master/.github/workflows/docs.yml

bdarcus commented 4 years ago

Also, I experimented with splitting the spec into separate markdown files. WIP results (still needs work; some of it is confusing, and some files are still too long) are here:

https://github.com/bdarcus/documentation/tree/literate/split-md

bdarcus commented 4 years ago

I also started to take some notes on the test repo issue tracker.

As I concluded from that exercise, there are three distinct issues here:

  1. split schema and spec (status quo) vs merged (literate)
  2. spec format (rst vs md)
  3. publishing platform and workflow (readthedocs vs gh-pages)

I note the CommonMark project, which is pretty close to our case, takes a pretty interesting approach, where they prerender the content using pandoc in a separate publishing repo, which just has a single branch: gh-pages.

So they publish the spec on github.

Also interesting, they embed the test suite within the spec.txt document.

Finally, presumably to deal with whitespace and other issues we've dealt with, the main spec repo has an editor config file.

rmzelle commented 4 years ago

My question on publishing is related to the point made by @rmzelle on readthedocs. If I understand now, the latter just points to the documentation repo, but I don't really understand how updating of the published docs works.

Read the Docs auto-builds the contents of https://docs.citationstyles.org/ (custom domain for https://citation-style-language.readthedocs.io/) upon every commit to our "documentation" repo.

(Also, and this is a bit off-topic, but I was never very satisfied with how we kept track of the CSL release notes, with a restart with each new version; e.g. the CSL 1.0 documentation has release notes for 1.0 at https://docs.citationstyles.org/en/1.0/release-notes.html, and the 1.0.1 documentation only has release notes for 1.0.1 at https://docs.citationstyles.org/en/1.0.1/release-notes.html. It would probably be clearer if we just kept adding to a changelog with each release.)

bdarcus commented 4 years ago

Read the Docs auto-builds the contents of https://docs.citationstyles.org/ (custom domain for https://citation-style-language.readthedocs.io/) upon every commit to our "documentation" repo.

How do you deal with versioning of the spec, and how would you deal with it when we do 1.1?

(Also, and this is a bit off-topic, but I was never very satisfied with how we kept track of the CSL release notes, with a restart with each new version; e.g. the CSL 1.0 documentation has release notes for 1.0 at https://docs.citationstyles.org/en/1.0/release-notes.html, and the 1.0.1 documentation only has release notes for 1.0.1 at https://docs.citationstyles.org/en/1.0.1/release-notes.html. It would probably be clearer if we just kept adding to a changelog with each release.)

So we really need a single changelog file in the repo?
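
To make the idea concrete, here is a minimal sketch of a cumulative changelog, where each release adds a section on top instead of starting over. The Markdown layout, the "# Changelog" header, and the function name add_release are all assumptions for illustration, and the release notes in the example are placeholders, not the real CSL notes.

```python
from typing import List

HEADER = "# Changelog\n"  # assumed top-level heading of the changelog file


def add_release(changelog: str, version: str, notes: List[str]) -> str:
    """Return the changelog with a new release section inserted directly
    below the header, so the newest release comes first and all older
    entries are preserved."""
    bullets = "\n".join("- " + note for note in notes)
    entry = "\n## {}\n\n{}\n".format(version, bullets)
    if changelog.startswith(HEADER):
        return HEADER + entry + changelog[len(HEADER):]
    # No header yet: start a new file, keeping any existing text below.
    return HEADER + entry + ("\n" + changelog if changelog else "")


# Placeholder notes, for illustration only.
log = add_release("", "1.0", ["Placeholder note for 1.0"])
log = add_release(log, "1.0.1", ["Placeholder note for 1.0.1"])
print(log)
```

With a single file like this, the documentation for any version would carry the full history up to that release.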

bdarcus commented 4 years ago

Initially they used readthedocs, but have recently switched to an integrated solution, using github pages.

I checked with them, and the reason they switched was less about issues with readthedocs, and more about wanting to integrate the documentation in emacs too, and use org.

Maybe we should keep things as is on the publishing front, since it does work nicely.

I think the open question remains whether we want to switch to markdown, and whether it would help to split the file.

On markdown, the docs say there are limitations: "Markdown doesn’t support a lot of the features of Sphinx, like inline markup and directives."

rmzelle commented 4 years ago

On markdown, the docs say there are limitations: "Markdown doesn’t support a lot of the features of Sphinx, like inline markup and directives."

Yes, the specification definitely uses some things that aren't part of Markdown (like a TOC).

How do you deal with versioning of the spec, and how would you deal with it when we do 1.1?

See https://docs.readthedocs.io/en/stable/versions.html. We'd just need a 1.1 branch in the repo.

rmzelle commented 4 years ago

Git history mostly may not matter

Per https://github.com/citation-style-language/documentation/issues/80#issuecomment-636533041 above, the "documentation" repo has a bunch of branches for past CSL specification releases and release notes. Similarly, the CSL validator relies on the tags in https://github.com/citation-style-language/schema/ to allow users to select different versions of the CSL schema.

I understand there can be benefits of e.g. combining the specification, schema, and test suite, but I'd like to keep the old releases of schema and documentation available.

bdarcus commented 4 years ago

There's maybe no ideal solution.

The other option is to keep the status quo, but ask people to submit spec language where appropriate on schema?

bdarcus commented 4 years ago

I'm going to close this for now. There's enough work to do without introducing this change, and the costs/benefits equation is unclear to me ATM.