esmero / strawberryfield

A Field of strawberries
GNU Lesser General Public License v3.0

XML to JSON Serialization #10

Closed: DiegoPino closed this issue 3 months ago

DiegoPino commented 5 years ago

Use Case

Import existing XML metadata (EAD, MODS, etc.) into a native JSON format for strawberryfield. This can be handy when dealing with external migration sources where we want to keep existing data/schemas but cast them into a more general JSON format, so that our webform system (https://github.com/esmero/webform_strawberryfield) can handle further editing/creation.

Problem

Given a simple XML like

<?xml version='1.0' standalone='yes'?>
<archdesc localtype="inventory" level="subgrp">
  <did>
    <head>Overview of the Records</head>
    <repository label="Repository:">
      <corpname>
        <part>Minnesota Historical Society</part>
      </corpname>
    </repository>
    <origination label="Creator:">
      <corpname>
        <part>Minnesota. Game and Fish Department</part>
      </corpname>
    </origination>
    <unittitle label="Title:">Game laws violation records,</unittitle>
    <unitdate label="Dates:">1908-1928</unitdate>
    <abstract label="Abstract:">Records of prosecutions for and seizures of property resulting from violation of the state's hunting and fishing laws.</abstract>
    <physdesc label="Quantity:">2.25 cu. ft. (7 v. and 1 folder in 3 boxes)</physdesc>
    <physloc label="Location:">See Detailed Description section for box location</physloc>
  </did>
</archdesc>

A PHP snippet like

$xml = simplexml_load_string($ead);
$json = json_encode($xml);
$array = json_decode($json,TRUE);

Would easily handle the conversion from XML to JSON and, if needed, the cast to a PHP array.

But:

For XML elements that have both @attributes and a text value, the JSON serializer discards the attributes entirely, ending up with an array like

[unittitle] => Game laws violation records,
[unitdate] => 1908-1928
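A self-contained reproduction of the problem, using a trimmed-down version of the EAD sample above (only two `<did>` children, chosen here just to keep it short). SimpleXML still exposes the `label` attribute, but after the `json_encode()`/`json_decode()` round trip only the text values survive:

```php
<?php
// Trimmed-down sample: two leaf elements that carry both an attribute and text.
$ead = '<did><unittitle label="Title:">Game laws violation records,</unittitle>'
     . '<unitdate label="Dates:">1908-1928</unitdate></did>';

$xml   = simplexml_load_string($ead);
$array = json_decode(json_encode($xml), true);

// SimpleXML itself still sees the attribute...
echo (string) $xml->unittitle['label'], "\n";

// ...but the serialized/decoded array only keeps the text values.
print_r($array);
```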

Solution

Deal with JSON serialization the same way JSON-LD does: use the @value key for the actual text value, and a custom @attributes key (or even a @type key) together with a mapping @context that brings non-semantic elements, coming from an XML schema, into a local context.

This implies:

1.- Build a decorator class for the JSON serialization
2.- Subclass the SimpleXMLElement class
3.- Build a Composer-aware PHP library we can include in Strawberryfield
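Option 2 (subclassing) can be sketched as below, assuming PHP 7.1+. The class name `JsonLDSimpleXMLElement` is hypothetical and not part of strawberryfield. It relies on two documented behaviours: `simplexml_load_string()` accepts a class name for the returned object (and child elements are created with that same class), and `json_encode()` calls `jsonSerialize()` on any object implementing `JsonSerializable`, so the recursion happens for free. Because SimpleXMLElement objects are re-created on every access, a subclass cannot keep per-instance options such as depth; that is one reason to prefer the decorator.

```php
<?php
/**
 * Hypothetical sketch: a SimpleXMLElement subclass that serializes itself
 * with JSON-LD style '@value' plus a custom '@attributes' key.
 */
class JsonLDSimpleXMLElement extends SimpleXMLElement implements JsonSerializable
{
    #[\ReturnTypeWillChange]
    public function jsonSerialize()
    {
        $array = [];

        // Keep attributes (name => string value) under the custom key.
        $attributes = array_map('strval', iterator_to_array($this->attributes()));
        if ($attributes) {
            $array['@attributes'] = $attributes;
        }

        // Children are instances of this class too, so json_encode() will call
        // their jsonSerialize(). Repeated element names are grouped into a list.
        foreach ($this->children() as $name => $child) {
            if (!isset($array[$name])) {
                $array[$name] = $child;
            } elseif (is_array($array[$name])) {
                $array[$name][] = $child;
            } else {
                $array[$name] = [$array[$name], $child];
            }
        }

        // Non-whitespace text: under '@value' if there is anything else,
        // otherwise the element serializes as a plain string.
        $text = trim((string) $this);
        if ($text !== '') {
            if ($array) {
                $array['@value'] = $text;
            } else {
                $array = $text;
            }
        }

        // Empty (self-closing or whitespace-only) elements become null.
        return $array === [] ? null : $array;
    }
}

$ead = <<<'XML'
<archdesc level="subgrp">
  <did>
    <repository label="Repository:"><corpname><part>Minnesota Historical Society</part></corpname></repository>
    <unittitle label="Title:">Game laws violation records,</unittitle>
  </did>
</archdesc>
XML;

$xml = simplexml_load_string($ead, JsonLDSimpleXMLElement::class);
echo json_encode($xml, JSON_PRETTY_PRINT), "\n";
```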

Potential Code and Discussion

This is a great way of dealing with XML while integrating our own code. It would also let us accommodate files already processed by other systems (e.g. via migrate), or data fed by external APIs, and then cast them via Twig into visualizations, index them in our Solr, etc.


/**
 * Class JsonLDSimpleXMLElementDecorator
 *
 * Implement JsonSerializable for SimpleXMLElement as a Decorator with JSON-LD syntax
 */
class JsonLDSimpleXMLElementDecorator implements JsonSerializable
{
    const DEF_DEPTH = 512;

    // Serialization options: emit '@attributes', emit '@value' next to them, max depth.
    private $options = ['@attributes' => TRUE, '@value' => TRUE, 'depth' => self::DEF_DEPTH];

    /**
     * @var SimpleXMLElement
     */
    private $subject;

    public function __construct(SimpleXMLElement $element, $useAttributes = TRUE, $useValue = TRUE, $depth = self::DEF_DEPTH) {

        $this->subject = $element;

        if (!is_null($useAttributes)) {
            $this->useAttributes($useAttributes);
        }
        if (!is_null($useValue)) {
            $this->useValue($useValue);
        }
        if (!is_null($depth)) {
            $this->setDepth($depth);
        }
    }

    public function useAttributes($bool) {
        $this->options['@attributes'] = (bool)$bool;
    }

    public function useValue($bool) {
        $this->options['@value'] = (bool)$bool;
    }

    public function setDepth($depth) {
        $this->options['depth'] = (int)max(0, $depth);
    }

    /**
     * Specify data which should be serialized to JSON
     *
     * @return mixed data which can be serialized by json_encode.
     */
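    // Avoids the PHP 8.1+ tentative return type deprecation; a plain comment on PHP 7.
    #[\ReturnTypeWillChange]
    public function jsonSerialize() {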
        $subject = $this->subject;

        $array = array();

        // json encode attributes if any.
        if ($this->options['@attributes']) {
            if ($attributes = $subject->attributes()) {
                $array['@attributes'] = array_map('strval', iterator_to_array($attributes));
            }
        }

        // traverse into children if applicable
        $children      = $subject;
        $this->options = (array)$this->options;
        $depth         = $this->options['depth'] - 1;
        if ($depth <= 0) {
            $children = [];
        }

        // json encode child elements if any. group on duplicate names as an array.
        foreach ($children as $name => $element) {
            /* @var SimpleXMLElement $element */
            $decorator          = new self($element);
            $decorator->options = ['depth' => $depth] + $this->options;

            if (isset($array[$name])) {
                if (!is_array($array[$name])) {
                    $array[$name] = [$array[$name]];
                }
                $array[$name][] = $decorator;
            } else {
                $array[$name] = $decorator;
            }
        }

        // json encode non-whitespace element simplexml text values.
        $text = trim((string) $subject);
        if (strlen($text)) {
            if ($array) {
                $this->options['@value'] && $array['@value'] = $text;
            } else {
                $array = $text;
            }
        }

        // return empty elements as NULL (self-closing or empty tags)
        if (!$array) {
            $array = NULL;
        }

        return $array;
    }
}

Usage would be:

$xml = new SimpleXMLElement($ead);
// Depth 3: elements more than two levels below the root element are omitted.
$decorator = new JsonLDSimpleXMLElementDecorator($xml, TRUE, TRUE, 3);
echo json_encode($decorator, JSON_PRETTY_PRINT), "\n";

This code is adapted (really only a few single-line changes) from https://hakre.wordpress.com/2013/07/10/simplexml-and-json-encode-in-php-part-iii-and-end/ and it's pretty cool!

Webform integration

This will require form elements to allow, read and write the @attributes key, which can be generalized through the custom JSON properties each Webform element can have.
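To make that concrete: after serialization a node is either a plain string (an element with only text) or an array with `@value` and/or `@attributes`. A form element therefore has to handle both shapes. The helper functions below (`xml_node_value`, `xml_node_attribute`, `xml_node_set_attribute`) are hypothetical, not part of webform_strawberryfield or the Webform API; they just sketch the read/write logic, including promoting a plain string to the `@value` form when an attribute is added so the text is not lost.

```php
<?php
// Hypothetical helpers: read/write one serialized XML node, which is either
// a plain string or an array with '@value' and/or '@attributes'.

function xml_node_value($node): ?string
{
    if (is_string($node)) {
        return $node;
    }
    return (is_array($node) && isset($node['@value'])) ? (string) $node['@value'] : null;
}

function xml_node_attribute($node, string $name): ?string
{
    return is_array($node) ? ($node['@attributes'][$name] ?? null) : null;
}

function xml_node_set_attribute($node, string $name, string $value): array
{
    // Promote a plain string node to the '@value' form so its text survives.
    if (!is_array($node)) {
        $node = ($node === null) ? [] : ['@value' => (string) $node];
    }
    $node['@attributes'][$name] = $value;
    return $node;
}

$unittitle = 'Game laws violation records,';
$unittitle = xml_node_set_attribute($unittitle, 'label', 'Title:');
echo json_encode($unittitle), "\n";
```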

DiegoPino commented 5 years ago

Also see https://json-ld.org/spec/latest/json-ld/#dfn-json-objects

DiegoPino commented 3 months ago

Implemented