NCEAS / eml

Ecological Metadata Language (EML)
https://eml.ecoinformatics.org/
GNU General Public License v2.0

add new images and schema-level docs for all eml modules #316

Closed: mbjones closed this issue 5 years ago

mobb commented 5 years ago

I did a bit of testing with the Oxygen editor's documentation generation tool, which exports images and the embedded documentation for browsing.

Limitations: Oxygen only exports the content of the xs:documentation element, whereas EML uses xs:appinfo with DocBook XML inside. So the appinfo content has to be moved into a documentation element, where it will only show as plain text. However, the HTML rendering respects newlines, etc., so we can use this to do a bit of formatting. I checked in a stylesheet that:
a) copies an XSD file,
b) moves the xs:appinfo content into an xs:documentation node, separated into functional paragraphs, and
c) uses each element's local-name as the label for that section.

Process: first run xsltproc eml_appinfo2documentation.xsl ../xsd/eml.xsd > ../tmp/eml.xsd, then, in the OxygenXML editor, run Tools > Generate Documentation > XML Schema Documentation on the transformed schema.
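For illustration, here is a minimal sketch of the kind of stylesheet described above. It is not the checked-in eml_appinfo2documentation.xsl, just an assumption about how the idea can be done in XSLT 1.0: an identity transform copies the schema, and each xs:appinfo is replaced by an xs:documentation node holding one plain-text paragraph per child element, labeled with that child's local-name. It is wrapped in a shell function so the stylesheet can be written to disk and passed to xsltproc; the helper name write_appinfo_xsl and the default output file name are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch, NOT the checked-in eml_appinfo2documentation.xsl.
# write_appinfo_xsl is a hypothetical helper that writes an XSLT 1.0
# stylesheet turning xs:appinfo into plain-text xs:documentation.
write_appinfo_xsl() {
  local out="${1:-appinfo2documentation-sketch.xsl}"
  # Quoted heredoc delimiter: no shell expansion inside the XSLT.
  cat > "$out" <<'XSL'
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Identity template: copy every node and attribute unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  <!-- Replace each xs:appinfo with xs:documentation holding plain text:
       one paragraph per child element, labeled with its local-name(). -->
  <xsl:template match="xs:appinfo">
    <xs:documentation>
      <xsl:for-each select="*">
        <xsl:value-of select="local-name()"/>
        <xsl:text>: </xsl:text>
        <xsl:value-of select="normalize-space(.)"/>
        <xsl:text>&#10;&#10;</xsl:text>
      </xsl:for-each>
    </xs:documentation>
  </xsl:template>
</xsl:stylesheet>
XSL
}
```

Usage, assuming xsltproc is installed: write_appinfo_xsl sketch.xsl && xsltproc sketch.xsl ../xsd/eml.xsd > ../tmp/eml.xsd. Because normalize-space() collapses whitespace, any line breaks inside the original DocBook markup are lost; only the blank lines between the labeled paragraphs survive into the HTML.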

See attached screenshot: eml_docs_o2_test

mobb commented 5 years ago

Things to resolve and finish (if we decide to use Oxygen-generated documentation):

a) We might want to copy the content over to an <xs:documentation> element (and preserve xs:appinfo) rather than move it. Pro: you would see the original xs:appinfo as nice XML in the block titled "Source"; as it is now, the text content of xs:documentation is just dumped to the screen. Con: you'd see the xs:appinfo content three times (converted at the top, plus both the raw copy and the original in Source).

b) Confirm this works for CDATA sections. (You can't identify a CDATA section with XPath, but I think the copy translates the examples within them.)

c) Write a script that runs through all XSD files and dumps the output in a temp folder (or some other location).
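Item (c) could be handled with a simple loop. The sketch below assumes the stylesheet name and the ../xsd to ../tmp layout from the xsltproc command earlier in this thread; the function names prep_all_xsd and transform are hypothetical, and transform is kept as its own function so the xsltproc call can be swapped out or stubbed.

```shell
#!/usr/bin/env bash
# Sketch: apply the appinfo-to-documentation transform to every XSD file.
# Assumes eml_appinfo2documentation.xsl and the ../xsd -> ../tmp layout above.
STYLESHEET="${STYLESHEET:-eml_appinfo2documentation.xsl}"

# Hypothetical helper: transform one schema file, writing the result to stdout.
transform() {
  xsltproc "$STYLESHEET" "$1"
}

# Hypothetical helper: prep_all_xsd SRC_DIR OUT_DIR
# Transforms each *.xsd in SRC_DIR into a same-named file in OUT_DIR.
prep_all_xsd() {
  local src="${1:-../xsd}" out="${2:-../tmp}" f
  mkdir -p "$out"
  for f in "$src"/*.xsd; do
    # With no matches the glob stays literal, so skip that case.
    [ -e "$f" ] || continue
    transform "$f" > "$out/$(basename "$f")" || {
      echo "failed: $f" >&2
      return 1
    }
  done
}
```

Run from the bin/ directory as prep_all_xsd ../xsd ../tmp; the Oxygen generation step would then run on the files in ../tmp.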

mobb commented 5 years ago

tasks

mobb commented 5 years ago

A few notes on element-level docs using Oxygen from the command line. First, it works. Here is the output from the schemas (Dec 1): http://sbc.lternet.edu/external/InformationManagement/EML_220schema/docs/

I scripted the transform that moves the appinfo to xs:documentation (see bin/), then ran the command line above as a separate manual step. There are a few options we could employ

Re licensing:

mobb commented 5 years ago

Checked in:

mobb commented 5 years ago

Apparently I have a detached HEAD (eew) and can't push. I'll need some advice.

mbjones commented 5 years ago

The files look great, @mobb. I'm glad we're going this direction.

A detached HEAD means that, at some point, you checked out a specific commit, which moves HEAD to that commit's SHA. There are several ways for that to happen, as described on Stack Overflow. Any commits made in that state aren't on any branch, so they will be lost when you check out another branch.

If you haven't committed, fixing this is as easy as git checkout BRANCH_EML_2_2, which will return you to that branch, and then you can commit and push. If you have already committed, you'll likely need to roll back your commit with a soft reset (git reset --soft) to the commit that was originally checked out; I would figure out which commit that is using git log --graph. Once your changes are uncommitted, you can switch to the 2.2 branch as before. The exact command you use for the reset depends on the specific state of your repository. Be careful with git reset, as it can be used to completely throw away changes that haven't been pushed (there are often under-the-hood ways to recover even those, but that requires some serious git-fu). I am still in Geneva so can't chat in real time, but I'll bet @csjx can help get you back to a normal state if needed.
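The soft-reset recovery described above, for the case where commits were already made on the detached HEAD, might look roughly like this. It is a hedged sketch: recover_detached_head is a hypothetical helper, BASE_COMMIT must be the commit that was originally checked out (found with git log --graph), and git checkout will refuse to switch if the staged changes conflict with the branch.

```shell
#!/usr/bin/env bash
# Sketch: move work committed on a detached HEAD back onto a branch
# using a soft reset. Inspect the history first with:
#   git log --graph --oneline --all
# Hypothetical helper: recover_detached_head BRANCH BASE_COMMIT
recover_detached_head() {
  local branch="$1" base="$2"
  # git symbolic-ref -q fails when HEAD is detached (points at a bare commit).
  if git symbolic-ref -q HEAD >/dev/null; then
    echo "HEAD is not detached; nothing to recover" >&2
    return 1
  fi
  # Un-commit back to BASE, keeping all of the changes staged in the index.
  git reset --soft "$base" || return 1
  # Return to the branch; staged changes carry over unless they conflict.
  git checkout "$branch"
}
```

For example, recover_detached_head BRANCH_EML_2_2 <base-sha>, followed by a normal git commit and git push. A soft reset only moves HEAD, so nothing in the working tree or index is discarded.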

csjx commented 5 years ago

Yes, let me know if you need a hand with the git stuff, @mobb. And yes, the Oxygen-based images look great. Thanks for working on that!

mobb commented 5 years ago

I pushed the prep script and the transformed files (in the tmp dir). I did not check in the Oxygen output (HTML and images), mainly because the temp/docs dir is large (170 MB), and if the plan is to generate them with a build, there is no need to store the output in the repo.

So I think what we really need next is the build target (and a test of whether the script can run without an Oxygen license), and I'd like one of you Java folks to set that up. @csjx?

mobb commented 5 years ago

The script is done (https://github.com/NCEAS/eml/commit/7c60bc38f167223dbe93b2e0c259c95d3c0e8200) and generates ~40 files and 700 images. The output is not checked in yet because one file is bigger than GitHub's 100 MB per-file limit: schHierarchy.html, which is basically the ToC. One option is Git Large File Storage (LFS), but we'd have to enable it for the repo, and contributors would need the plugin too, so we may want to talk first. Another option would be splitting the HTML some other way, maybe manually. But we should see how it fits with the other documentation first.
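Before committing generated output, a quick size check could flag files like schHierarchy.html early. This is a sketch, assuming GNU or BSD find (the M suffix for -size is not in POSIX); the helper name find_oversized and its default docs directory are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list files larger than a size limit (default 100, in MiB, to match
# the per-file limit mentioned above) so they can be excluded or split
# before committing generated documentation.
# Hypothetical helper: find_oversized [DIR] [LIMIT_MB]
find_oversized() {
  local dir="${1:-docs}" limit_mb="${2:-100}"
  # -size +NM matches files larger than N mebibytes
  # (GNU find rounds sizes up to whole units before comparing).
  find "$dir" -type f -size +"${limit_mb}"M
}
```

Running find_oversized on the generated docs directory before git add prints any offending paths; an empty result means nothing exceeds the limit.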

amoeba commented 5 years ago

@mobb, @stevenchong, and I chatted this morning about this and I offered to take a look. I think this huge (~133 MB) file isn't necessary: it's just a huge hierarchy, and it takes forever to load on my machine anyway, so I doubt we wanna serve it anywhere. With this file removed, the rest of the docs work and make sense.

I want to bring up something that I could use some feedback on. With the new approach based on Oxygen, the schema diagrams aren't quite as complete. That is, for a given schema diagram, Oxygen doesn't include quite as many of the child elements as the old ones did, so the individual images are less useful. I re-ran the images and am using GitHub's diff view to show the before and after. Make sure to click "Show rich diffs" to see the diffs:

Screenshot, 2019-01-23 at 4:16:20 PM

The diff -> https://github.com/amoeba/eml/commit/a03651d79b22e9ab55e7e62b7d249bebe3f888d4?diff=split&short_path=4aacebb#diff-4aacebb6469314af80f19eb79771334b

Are we okay with those differences? I and others find the current diagrams really useful but if they were less "full" they'd be less useful.

While the above is a bit of an issue, Oxygen does generate a really nice, interactive site. See a live version here. Maybe this outweighs the changes to the diagrams?

Last thing: Do we wanna ditch the current HTML docs (generated with the Ant build) in favor of this new site, or should we integrate the new images into the Ant build? The main difference I see is that the Oxygen docs don't have all the prose we wrote into the schema diagrams; i.e., we don't get a really nice intro like https://knb.ecoinformatics.org/external//emlparser/docs/eml-2.1.1/index.html. But maybe this is moot because we're switching to Bookdown?

csjx commented 5 years ago

My understanding is that it is moot, in that the Bookdown version of the docs will provide all of the summary prose, whereas the Oxygen-generated documentation will provide the technical details. Integrating the two will require some finesse.

mobb commented 5 years ago

Re @amoeba's last comment:

we don't get a really nice intro like https://knb.ecoinformatics.org/external//emlparser/docs/eml-2.1.1/index.html.

It's a two-step process. O2 will only process xs:documentation to HTML, and we are using xs:appinfo. The script I wrote has two steps: the first does the transform that moves appinfo to documentation, and the second runs the O2 generator. See https://github.com/NCEAS/eml/blob/BRANCH_EML_2_2/bin/build_schema_documentation.sh. Sorry that was not clear (where do we keep documentation-documentation?)

I also left in the script that does only the transform: https://github.com/NCEAS/eml/blob/BRANCH_EML_2_2/bin/prep_documentation.sh

What were your changes to the O2 command line?

mbjones commented 5 years ago

Yeah, the Bookdown is intended to replace the prose, and Oxygen is meant to provide the schema-level docs. Looking at the image diffs, it seems to me that the main difference is that the Oxygen images just don't expand some of the child nodes that we expanded manually when we generated the original images. However, if we create the interactive, expandable tree via Oxygen in HTML, then we may not need the images as static links at all. I'm imagining the book would simply include a chapter that inlines the schema docs that come out of Oxygen so they can be browsed and searched.

amoeba commented 5 years ago

Ah, @mobb, I read that stuff, but it didn't make sense because I didn't know anything about appinfo vs. documentation, and the script doesn't fully run for me because of a bug I'll fix shortly.

"I'm imagining the book would simply include a chapter that inline the schema docs..."

Are you thinking via an iframe or something?

amoeba commented 5 years ago

Just made some tweaks to your script, @mobb. Can you take a look, re-run it, and, if it made the fixes you wanted, commit it?

amoeba commented 5 years ago

After talking with the EML team and also the data team, we decided a good route to go down would be to:

  1. Keep the nice module-level diagrams from earlier versions of EML (in the img folder) in a similar form, as the images are useful in their own right. We used XMLSpy to generate them before, and Oxygen's diagrams don't display as much of the sub-tree as XMLSpy's did. Part of the reason to keep the module-level docs is that a few folks have expressed interest in them: each is a single image that you can glance at quickly, with everything on your screen at the same time.
  2. Also use Oxygen to create the really useful, stand-alone documentation site, because we found we liked it as an additional EML documentation resource.
  3. Use the images from (1) in the Bookdown documentation.

This type of approach would match a lot of software projects, where there's an end-user-focused guide (Bookdown) plus a developer-focused API doc site (Oxygen). I think this is all in line with the spirit of the discussions in this issue and outside of it.

New images were added in https://github.com/NCEAS/eml/commit/04cf0d5ca692e1681281792af1ea1edd27b6ad53. Check the rich diffs to see the side-by-side comparison. I took some liberty in adding a few extra images (and we can generate any more we need). I also optimized the PNGs and saved about 30% on each file.

mbjones commented 5 years ago

Right now we have the module-level docs in the Bookdown in an iframe. Let's discuss whether we also want to embed the independent images, and if so, where they would go. But for now I don't think we should hold up the EML 2.2.0 release for this ticket.

amoeba commented 5 years ago

We talked in Slack just now about this and decided I'd make some minor tweaks to the module-level images, include them in the docs as static images, and link elsewhere to the Oxygen docs site. Working on that now.

mbjones commented 5 years ago

Thanks, @amoeba. Your image links look good, and this is now incorporated in the build.