ERDDAP / erddap

ERDDAP is a scientific data server that gives users a simple, consistent way to download subsets of gridded and tabular scientific datasets in common file formats and make graphs and maps. ERDDAP is a Free and Open Source (Apache and Apache-like) Java Servlet from NOAA NMFS SWFSC Environmental Research Division (ERD).
Creative Commons Zero v1.0 Universal

Using XInclude #112

benjwadams opened 9 months ago

benjwadams commented 9 months ago

Hi, I attempted to use XInclude to reduce the amount of repetitive boilerplate XML for very similar datasets in datasets.xml.

In the document root, I added an XML namespace declaration for XInclude: <erddapDatasets xmlns:xi="http://www.w3.org/2001/XInclude">

Later on, between the </addAttributes> and </dataset> closing tags, I added the XInclude element:

<xi:include href="hfradar_dataset_variables.xml"/>

Reloading the datasets causes none of them to load, and log.txt reports:

ERROR while processing line #164 datasets.xml: java.lang.RuntimeException: datasets.xml error on line #164: java.lang.RuntimeException: datasets.xml error on or before line #164: ERROR in XML file on line #164: Unexpected tag=<erddapDatasets><dataset><xi:include> content="".
 at gov.noaa.pfel.erddap.LoadDatasets.run(LoadDatasets.java:460)
Caused by: java.lang.RuntimeException: datasets.xml error on or before line #164: ERROR in XML file on line #164: Unexpected tag=<erddapDatasets><dataset><xi:include> content="".
 at gov.noaa.pfel.erddap.dataset.EDD.fromXml(EDD.java:486)
 at gov.noaa.pfel.erddap.LoadDatasets.run(LoadDatasets.java:364)
Caused by: java.lang.Exception: ERROR in XML file on line #164: Unexpected tag=<erddapDatasets><dataset><xi:include> content="".
 at gov.noaa.pfel.coastwatch.util.SimpleXMLReader.throwException(SimpleXMLReader.java:600)
 at gov.noaa.pfel.coastwatch.util.SimpleXMLReader.unexpectedTagException(SimpleXMLReader.java:589)
 at gov.noaa.pfel.erddap.dataset.EDDTableFromFiles.fromXml(EDDTableFromFiles.java:347)
 at gov.noaa.pfel.erddap.dataset.EDD.fromXml(EDD.java:463)
 ... 1 more

I don't use XML much, but this looks like valid XML to me. Please let me know if I'm doing something wrong that could be easily fixed. I have looked at the source, and ERDDAP's XML parser seems to deviate from most mainstream XML parsers. Could it support XInclude snippets like the one above to reduce the number of repeated boilerplate variables?
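For comparison, a standards-compliant parser does resolve the include. The sketch below uses the JDK's built-in JAXP DOM parser with `setXIncludeAware(true)` to expand an `<xi:include>` in a small datasets.xml-like file. The file contents, the `XIncludeDemo` class, and its `demo` helper are hypothetical; the `fixup-base-uris` setting is a Xerces feature that the JDK's bundled parser accepts, used here so the included elements are not given `xml:base` attributes. This is not something ERDDAP does itself.

```java
import java.io.StringWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class XIncludeDemo {

    /** Parses a file with XInclude processing enabled and returns the expanded XML. */
    public static String expand(Path file) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // XInclude processing requires a namespace-aware parser
        factory.setXIncludeAware(true);  // replace <xi:include> elements with the referenced content
        // Xerces feature: don't add xml:base attributes to the included elements
        factory.setFeature("http://apache.org/xml/features/xinclude/fixup-base-uris", false);
        // Parsing from a File lets relative hrefs resolve against the file's directory
        Document doc = factory.newDocumentBuilder().parse(file.toFile());

        // Serialize the expanded DOM back to text
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    /** Writes a hypothetical datasets.xml plus an included fragment to a temp dir and expands it. */
    public static String demo() {
        try {
            Path dir = Files.createTempDirectory("xinclude-demo");
            // Hypothetical stand-in for the shared variables file
            Files.writeString(dir.resolve("hfradar_dataset_variables.xml"),
                "<dataVariable><sourceName>u</sourceName></dataVariable>");
            Path datasets = dir.resolve("datasets.xml");
            Files.writeString(datasets,
                "<erddapDatasets xmlns:xi=\"http://www.w3.org/2001/XInclude\">"
                    + "<dataset datasetID=\"example\">"
                    + "<xi:include href=\"hfradar_dataset_variables.xml\"/>"
                    + "</dataset></erddapDatasets>");
            return expand(datasets);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Writing the expanded output to a file is one way to flatten includes outside ERDDAP before it reads datasets.xml, assuming the flattened result otherwise matches what ERDDAP expects.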

benjwadams commented 9 months ago

https://github.com/ERDDAP/erddap/blob/468e2b85d2c2484024f1418619f35bbe01b27a94/WEB-INF/classes/gov/noaa/pfel/coastwatch/util/SimpleXMLReader.java

Correct me if I'm wrong, but it looks like ERDDAP rolls its own XML parser. Is there a reason for this rather than using established Java libraries built for this purpose?

BobSimons commented 9 months ago

You are correct: ERDDAP uses its own XML parser, which is partly why it doesn't support the tags you added. ERDDAP's XML parser is basically unchanged from when it was originally written, which was before many XML features (like the ones you used) were first defined.

When I wrote the XML parser (~2006), there were only two standard Java XML parsers (SAX and DOM?), both of which had massive limitations: DOM uses massive amounts of memory, and SAX is painful to code with. ERDDAP's parser solved these problems: it is fast, single-pass, low-memory, and easy to code with.
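To illustrate the single-pass, low-memory style described above, here is a sketch using StAX (`javax.xml.stream`), a pull parser included in the JDK. It is not ERDDAP's `SimpleXMLReader`; it only imitates how the error in log.txt reports a tag path such as `<erddapDatasets><dataset><xi:include>`. The `TagPathReader` class and its `KNOWN` set of accepted paths are hypothetical stand-ins for the tags a dataset class expects.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class TagPathReader {
    // Hypothetical set of tag paths this reader knows how to handle.
    private static final Set<String> KNOWN = Set.of(
        "<erddapDatasets>",
        "<erddapDatasets><dataset>",
        "<erddapDatasets><dataset><addAttributes>");

    /**
     * Reads the XML in one forward pass, keeping only the stack of open tags.
     * Returns null if every tag path is known, otherwise a message for the first unknown one.
     */
    public static String check(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            Deque<String> stack = new ArrayDeque<>();
            try {
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        stack.addLast(qualifiedName(reader));
                        String path = pathOf(stack);
                        if (!KNOWN.contains(path)) {
                            return "ERROR in XML file on line #" + reader.getLocation().getLineNumber()
                                + ": Unexpected tag=" + path;
                        }
                    } else if (event == XMLStreamConstants.END_ELEMENT) {
                        stack.removeLast();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed XML", e);
        }
    }

    /** Rebuilds "prefix:local" so xi:include is reported with its prefix. */
    private static String qualifiedName(XMLStreamReader reader) {
        String prefix = reader.getPrefix();
        String local = reader.getLocalName();
        return (prefix == null || prefix.isEmpty()) ? local : prefix + ":" + local;
    }

    /** Joins the open tags root-first, e.g. "<erddapDatasets><dataset>". */
    private static String pathOf(Deque<String> stack) {
        StringBuilder sb = new StringBuilder();
        for (String tag : stack) sb.append('<').append(tag).append('>');
        return sb.toString();
    }

    public static void main(String[] args) {
        String xml = "<erddapDatasets xmlns:xi=\"http://www.w3.org/2001/XInclude\">\n"
            + "<dataset><xi:include href=\"hfradar_dataset_variables.xml\"/></dataset>\n"
            + "</erddapDatasets>";
        System.out.println(check(xml));
    }
}
```

Memory use here is proportional to nesting depth rather than file size, which is the trade-off that makes a streaming reader attractive compared with building a full DOM.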

As with many things in ERDDAP, if one were starting from scratch now, one might do some things differently. There are libraries now that didn't exist then, programmers do things differently now, and different things are in style. You could say "Modernize ERDDAP!", but it would be a ton of work, and it has always seemed more important to add new features to ERDDAP than to expend great effort following the latest trends and rewriting lots of code. As the netcdf-java project has shown over the last 6+ years, you can have multiple programmers work to modernize a library and sometimes all you get is lots of new bugs, no new features, and changes that break every piece of software that uses the library.

I'll leave it to Chris to decide whether switching XML parsers is a good idea and the best use of his time. I think it isn't, partly because it would be a lot of work and partly because I'm not convinced the feature you want would be widely used. More importantly, it doesn't let administrators accomplish anything they can't already do by using other tools to generate datasets.xml chunks that include the information that is repetitive (for your datasets).

Eugene Burger and Kevin O'Brien (PMEL) have long used an external tool they created for their group to generate datasets.xml chunks from the answers to a short questionnaire (which platform, which sensors, what ID, what date deployed, etc.). I think other groups have done similar things. I think writing a new external tool is a better approach, as it is a relatively easy project and can be customized to your group's needs. Programmer time for ERDDAP is severely limited these days, and I think other features (e.g., MQTT and OGC EDR API support) are better uses of Chris' time, since they are aligned with ERDDAP's goals and connect ERDDAP to other realms in ways that were not possible before.
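A minimal sketch of the external-tool approach, assuming a simple string template: the shared boilerplate lives in one place, and per-dataset values are filled in before the chunks are pasted into datasets.xml. The template, the `{{...}}` placeholder syntax, the `DatasetChunkGenerator` class, and the site values are all hypothetical and much shorter than a real ERDDAP dataset definition; a real tool would read its inputs from a questionnaire, spreadsheet, or config file.

```java
import java.util.List;
import java.util.Map;

public class DatasetChunkGenerator {
    // Hypothetical, abbreviated boilerplate; a real template would hold all the
    // repeated <dataVariable> blocks that the issue wanted to pull in with XInclude.
    static final String TEMPLATE = String.join("\n",
        "<dataset type=\"EDDTableFromNcFiles\" datasetID=\"{{datasetID}}\" active=\"true\">",
        "    <fileDir>{{fileDir}}</fileDir>",
        "    <addAttributes>",
        "        <att name=\"title\">{{title}}</att>",
        "    </addAttributes>",
        "    <dataVariable>",
        "        <sourceName>time</sourceName>",
        "        <destinationName>time</destinationName>",
        "    </dataVariable>",
        "</dataset>");

    /** Fills every {{key}} placeholder in TEMPLATE; fails if any placeholder is left over. */
    static String chunk(Map<String, String> values) {
        String out = TEMPLATE;
        for (Map.Entry<String, String> e : values.entrySet()) {
            out = out.replace("{{" + e.getKey() + "}}", escape(e.getValue()));
        }
        if (out.contains("{{")) {
            throw new IllegalArgumentException("unfilled placeholder in:\n" + out);
        }
        return out;
    }

    /** Minimal XML escaping for element text and double-quoted attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        List<Map<String, String>> sites = List.of(
            Map.of("datasetID", "hfradar_siteA", "fileDir", "/data/hfradar/siteA/", "title", "HF Radar Site A"),
            Map.of("datasetID", "hfradar_siteB", "fileDir", "/data/hfradar/siteB/", "title", "HF Radar Site B"));
        for (Map<String, String> site : sites) {
            System.out.println(chunk(site));
            System.out.println();
        }
    }
}
```

Failing on a leftover placeholder catches a missing answer before ERDDAP ever sees the chunk, which keeps errors out of log.txt.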