datacite / bolognese

Ruby gem and command-line utility for conversion of DOI metadata
MIT License
40 stars 14 forks source link

Singular geoLocationPolygons are wrapped in additional array #110

Closed richardhallett closed 3 years ago

richardhallett commented 3 years ago

This appears is generating a nested list now even when there is only one set of geolocationpolygns.

Example new DOI 10.25574/17213 - Nested Example old DOI: 10.25574/13609 - Not tested, singluar list.

Front logo Front conversations

richardhallett commented 3 years ago

@mfenner This bug has come about due to the fix that allowed multiple geoLocationPolygon elements to be parsed. While the bug fix is techncially correct for the schema, it's now introduced a backwards incompatability in the JSON representation and therefore the API.

As a refresher the XML specification allows multiple elements to appear under a single 1-N

<geoLocation>
      <geoLocationPolygon>
        <polygonPoint>
          <pointLongitude>41</pointLongitude>
          <pointLatitude>71</pointLatitude>
        </polygonPoint>
        <polygonPoint>
          <pointLongitude>45</pointLongitude>
          <pointLatitude>75</pointLatitude>
        </polygonPoint>
        <polygonPoint>
          <pointLongitude>55</pointLongitude>
          <pointLatitude>85</pointLatitude>
        </polygonPoint>
        <polygonPoint>
          <pointLongitude>41</pointLongitude>
          <pointLatitude>71</pointLatitude>
        </polygonPoint>
      </geoLocationPolygon>
      <geoLocationPolygon>
        <polygonPoint>
          <pointLongitude>65</pointLongitude>
          <pointLatitude>80</pointLatitude>
        </polygonPoint>
        <polygonPoint>
          <pointLongitude>55</pointLongitude>
          <pointLatitude>75</pointLatitude>
        </polygonPoint>
        <polygonPoint>
          <pointLongitude>45</pointLongitude>
          <pointLatitude>73</pointLatitude>
        </polygonPoint>
        <polygonPoint>
          <pointLongitude>65</pointLongitude>
          <pointLatitude>80</pointLatitude>
        </polygonPoint>
      </geoLocationPolygon>
    </geoLocation>

Which would produce

[{    
       "geoLocationPolygon"=> [
        [ {"polygonPoint"=>{"pointLatitude"=>"71", "pointLongitude"=>"41"}},
          {"polygonPoint"=>{"pointLatitude"=>"75", "pointLongitude"=>"45"}},
          {"polygonPoint"=>{"pointLatitude"=>"85", "pointLongitude"=>"55"}},
          {"polygonPoint"=>{"pointLatitude"=>"71", "pointLongitude"=>"41"}}],
        [
          {"polygonPoint"=>{"pointLatitude"=>"80", "pointLongitude"=>"65"}},
          {"polygonPoint"=>{"pointLatitude"=>"75", "pointLongitude"=>"55"}},
          {"polygonPoint"=>{"pointLatitude"=>"73", "pointLongitude"=>"45"}},
          {"polygonPoint"=>{"pointLatitude"=>"80", "pointLongitude"=>"65"}}
        ]
      ]
}]

This is a nested structure and follows the XML correctly.

However if you have a singular geoLocationPolygon within an element, it also nests it. e.g.

<geoLocation>
      <geoLocationPolygon>
        <polygonPoint>
          <pointLongitude>41</pointLongitude>
          <pointLatitude>71</pointLatitude>
        </polygonPoint>
        <polygonPoint>
          <pointLongitude>45</pointLongitude>
          <pointLatitude>75</pointLatitude>
        </polygonPoint>
        <polygonPoint>
          <pointLongitude>55</pointLongitude>
          <pointLatitude>85</pointLatitude>
        </polygonPoint>
        <polygonPoint>
          <pointLongitude>41</pointLongitude>
          <pointLatitude>71</pointLatitude>
        </polygonPoint>
      </geoLocationPolygon>
    </geoLocation>

produces

[{ 
     "geoLocationPolygon"=> [
          [
          {"polygonPoint"=>{"pointLatitude"=>"71", "pointLongitude"=>"41"}},
          {"polygonPoint"=>{"pointLatitude"=>"75", "pointLongitude"=>"45"}},
          {"polygonPoint"=>{"pointLatitude"=>"85", "pointLongitude"=>"55"}},
          {"polygonPoint"=>{"pointLatitude"=>"71", "pointLongitude"=>"41"}}
          ]
      ]

}]

But before it produced

[{ 
     "geoLocationPolygon"=> [
          {"polygonPoint"=>{"pointLatitude"=>"71", "pointLongitude"=>"41"}},
          {"polygonPoint"=>{"pointLatitude"=>"75", "pointLongitude"=>"45"}},
          {"polygonPoint"=>{"pointLatitude"=>"85", "pointLongitude"=>"55"}},
          {"polygonPoint"=>{"pointLatitude"=>"71", "pointLongitude"=>"41"}}
      ]

}]

So the question is what behaviour do we want to have? Should re reduce it to a non nested for backwards compatability? or just always say this is the case now with it being an array of arrays, however old dois will have a different format.

mfenner commented 3 years ago

This is a common issue with arrays with one element in XML and how to represent them in JSON. I have seen variations of this in a few other places in our metadata, and I would address it to look like our previous implementation.

mfenner commented 3 years ago

If that is not possible without breaking changes, put it in API version 3.

mfenner commented 3 years ago

Also, check the DataCite JSON for other examples where an element can be 0-N, e.g. fundingReference or relatedIdentifier (or 1-N for creator or title).

richardhallett commented 3 years ago

I looked at fundingReference and creators and they always return an array regardless if there is only one element, which to me makes sense. The problem is the fact I think geoLocationPolygons should have done it this way but it didn't historically.

What I will look at doing is seeing if we can keep the old way for when there is one element and push it to an array when there is more than one, this will keep backwards compatability and still allow multiple if desired.

Ideally this should be cleaned up but I think it will be a reasonable compromise.

mfenner commented 3 years ago

Sounds good. And please add to a list of "API version 3 issues", things we can address properly in a new API version.