ERDDAP / erddap

ERDDAP is a scientific data server that gives users a simple, consistent way to download subsets of gridded and tabular scientific datasets in common file formats and make graphs and maps. ERDDAP is a Free and Open Source (Apache and Apache-like) Java Servlet from NOAA NMFS SWFSC Environmental Research Division (ERD).
Creative Commons Zero v1.0 Universal

Support localized metadata on ERDDAP HTML pages #114

Open turnbullerin opened 1 year ago

turnbullerin commented 1 year ago

Hello fellow ERDDAP folks!

Recently, I've been championing an initiative with the CF folks to get a standard for localized metadata into CF (see https://github.com/cf-convention/discuss/issues/244). It has been paired with a discussion on expanding attribute and variable names to allow a full (or greatly expanded) Unicode character set (https://github.com/cf-convention/cf-conventions/issues/237). Of note, the latter discussion simply opens the CF conventions to what NetCDF already allows, since NetCDF files can already have attributes with any Unicode characters.

As I also work on our ERDDAP server a lot, I wanted to draw your attention to these discussions because I noted that the current ERDDAP configuration only allows attribute names containing [A-Za-z0-9_] characters (for example, I get a RuntimeException: [variable] isn't variableNameSafe when I put square brackets in). I recognize that ERDDAP doesn't only work with NetCDF files, so there may be restrictions beyond what NetCDF/CF allow. But with the CF conventions moving towards allowing a full Unicode set (and ERDDAP's metadata being based on what CF/ACDD define), I thought it would be worth discussing an expansion of the character set allowed in attribute and variable names, and that some of you folks might want to weigh in on the CF discussion before it is finalized.

Part of why I have been championing that work is that I would love to see ERDDAP able to take localized metadata from a dataset and integrate it into the translation mechanism. Right now, there isn't a way to display a French title for a dataset when browsing the website in French (something that Canadian laws require for us to be able to use ERDDAP at the federal government level). I've made a hacky solution in JavaScript that got me past the requirement, but having a proper internationalization solution for datasets in ERDDAP would be highly useful for me and probably others. I see the CF work as setting the foundation for this by defining a standard for encoding the different titles and such into the files themselves, and I hope ERDDAP will pick that up in a future release (I would be happy to contribute to it myself).

rmendels commented 3 months ago

@turnbullerin @ChrisJohnNOAA It would be great if you could contribute code. We are understaffed and underfunded, and as I said the best way to get things like this into ERDDAP is to contribute code, or at least work with Chris on modifications.

If you go to https://github.com/ERDDAP/erddap you will see instructions on setting up a development environment using either Jetty or Docker running Tomcat. The tests are now JUnit based, so one command runs them (also given on that page).

I am curious: are there other major data servers that allow metadata in multiple languages? (I know TDS does not at present; does GeoServer?)

turnbullerin commented 3 months ago

@rmendels GeoServer will allow it with a plugin - the ISO 19115-3 metadata format allows for multilingual metadata (as does HNAP 19139), so anything that delivers something based on those should be able to support it.

turnbullerin commented 3 months ago

Sorry, not even a plugin; it's just core functionality: https://docs.geoserver.org/stable/en/user/configuration/internationalization/index.html

turnbullerin commented 2 months ago

@rmendels just to update you, the CF folks would also like to see it working with ERDDAP before we approve a change, so I'm taking it on myself to build a working proof of concept for dataset titles at least. Will update you when I have something working. From my initial investigation, it looks feasible to implement.

rmendels commented 2 months ago

@turnbullerin @ChrisJohnNOAA Great! Work with Chris on this. Once it's done, we would like to set up a test ERDDAP to make certain that the main clients we know about all work. We really appreciate contributions.

turnbullerin commented 2 months ago

Hi Folks!

Here is a working prototype based on the current CF proposal for the title attribute only: https://github.com/turnbullerin/erddap/tree/localized_erddap

I implemented it as a new method on EDD, localizedTitle(int language), that provides a LocalizedString object. This has an htmlTag(int language) method that produces an HTML span tag with an appropriate lang attribute given the current user language. Optionally, you can change the span tag to another tag, or provide arguments to be used with noLongLines() on the content.
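For readers who want a feel for the idea without opening the branch, here is a minimal sketch of what such an object could look like. It is not the code from the prototype: the class name LocalizedStringSketch, the add() and bestLang() helpers, and the matching rule (exact tag, then primary subtag, then default) are all assumptions made for illustration. It also takes BCP 47 language tags as strings, whereas the prototype's methods take ERDDAP's int language index.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch only; not the LocalizedString class from the prototype branch. */
final class LocalizedStringSketch {
    private final String defaultLang; // language tag of the default text, "" if unknown
    private final Map<String, String> byLang = new LinkedHashMap<>();

    LocalizedStringSketch(String defaultLang, String defaultText) {
        this.defaultLang = defaultLang;
        byLang.put(defaultLang, defaultText);
    }

    /** Registers a translation under its language tag (assumed helper). */
    LocalizedStringSketch add(String lang, String text) {
        byLang.put(lang, text);
        return this;
    }

    /** Exact tag match first, then primary-subtag match ("fr" vs "fr-CA"), then the default. */
    String bestLang(String userLang) {
        if (byLang.containsKey(userLang)) return userLang;
        String primary = userLang.split("-")[0];
        for (String lang : byLang.keySet()) {
            if (lang.split("-")[0].equalsIgnoreCase(primary)) return lang;
        }
        return defaultLang;
    }

    String htmlTag(String userLang) {
        return htmlTag(userLang, "span");
    }

    /** Wraps the chosen text in the given tag, with a lang attribute naming the text's language. */
    String htmlTag(String userLang, String tagName) {
        String lang = bestLang(userLang);
        return "<" + tagName + " lang=\"" + escape(lang) + "\">"
            + escape(byLang.get(lang)) + "</" + tagName + ">";
    }

    /** Minimal HTML escaping so titles cannot break the markup. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

The point of returning an object rather than a plain String is that the caller always gets the text together with the language it is actually written in, so the lang attribute stays correct even when the user's language has no translation.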

I updated two places in the ERDDAP code to use the new localized title - the first on the HTML tables, the second on the HTML header for pages like the dataset info page.

To test this, I added the following configuration to a dataset:

        <att name="title">MSC50 GrowFINE (Northeast Pacific)</att>
        <att name="title_fr">MSC50 GrowFine (nord-est de le Pacifique)</att>
        <att name="localizations">default: en-CA _fr: fr-CA</att>
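The localizations value above pairs each attribute-name suffix with a language tag. As a rough illustration of how such a value could be read, here is a small sketch. The class name LocalizationsParser, the regular expression, and the choice to store the default language under the empty suffix are assumptions for this example; the exact syntax is whatever the CF proposal ends up defining, and the prototype may parse it differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: parses a value like "default: en-CA _fr: fr-CA". */
final class LocalizationsParser {
    // One "key: value" pair; key and value are runs of non-whitespace characters.
    private static final Pattern PAIR = Pattern.compile("(\\S+):\\s*(\\S+)");

    /** Returns suffix -> language tag; the default language is stored under the empty suffix "". */
    static Map<String, String> parse(String attributeValue) {
        Map<String, String> result = new LinkedHashMap<>();
        if (attributeValue == null) return result; // attribute absent: nothing is known
        Matcher m = PAIR.matcher(attributeValue);
        while (m.find()) {
            String key = m.group(1);
            result.put(key.equals("default") ? "" : key, m.group(2));
        }
        return result;
    }
}
```

With the configuration above, parse("default: en-CA _fr: fr-CA") yields a map from "" to en-CA and from _fr to fr-CA, so title_fr can be found by appending the suffix to the base attribute name.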

You can then see the change - in French the title appears as

<span lang="fr-CA">MSC50 GrowFine (nord-est de le Pacifique)</span>

and in all other languages as

<span lang="en-CA">MSC50 GrowFINE (Northeast Pacific)</span>

Where the localizations attribute is not present, the current assumption is that the language is not known (for discussion: we could assume it is English?), and the lang attribute is left empty to represent this. This can also be fixed just by adding a localizations attribute that lists only the default language.

<span lang="">Global Temperature and Salinity Profile Programme (GTSPP) Data, 1985-present</span>
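The selection and fallback behaviour described above can be sketched as follows. This is an illustration under assumptions, not the prototype's logic: the TitleResolver class, the resolve() signature, and the use of plain maps for the dataset attributes and the suffix-to-language table (default language under the empty suffix "") are all invented for the example.

```java
import java.util.Map;

/** Hypothetical sketch of picking a localized attribute with an unknown-language fallback. */
final class TitleResolver {
    /**
     * Returns {languageTag, text} for the attribute baseName.
     * atts: dataset attributes (name -> value).
     * suffixToLang: suffix -> language tag, with the default language under "".
     */
    static String[] resolve(Map<String, String> atts, Map<String, String> suffixToLang,
                            String baseName, String userLang) {
        String primary = userLang.split("-")[0];
        for (Map.Entry<String, String> e : suffixToLang.entrySet()) {
            String suffix = e.getKey();
            if (suffix.isEmpty()) continue; // the default is handled below
            String text = atts.get(baseName + suffix);
            boolean sameLanguage = e.getValue().split("-")[0].equalsIgnoreCase(primary);
            if (text != null && sameLanguage) return new String[] {e.getValue(), text};
        }
        // No translation matched: use the unsuffixed attribute. If no default language
        // was declared, the tag is "", which in HTML marks the language as unknown.
        String defaultLang = suffixToLang.getOrDefault("", "");
        return new String[] {defaultLang, atts.get(baseName)};
    }
}
```

Treating a missing declaration as unknown rather than as English keeps the markup honest for datasets whose titles are in some other language; a server-wide default could be layered on top if that turns out to be the preferred behaviour.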

Feedback on the implementation is welcome. If this approach seems feasible, I'd like to see the CF proposal move forward first, and then I will formalize a full patch for ERDDAP covering all localizable text in all the locations I can find on HTML pages.

turnbullerin commented 2 months ago

Of note, I see a few other places where we might need to make some updates to support WCAG AA standards for language support, but I'd rather focus here on the metadata localization and fix other elements in other issues (e.g. the English column headers should be translated, or should carry a lang="en" attribute if translating them is really not feasible).

ChrisJohnNOAA commented 2 months ago

I took a read through and I think this looks like a great start.