atviriduomenys / spinta

Spinta is a framework to describe, extract and publish data (a DEP Framework).

Source of `XML` elements should include `text()` #632

Open karina-klinkeviciute opened 1 month ago

karina-klinkeviciute commented 1 month ago

DSA property sources should be written in a way that they could be used to retrieve data from the source. For XML documents, it would be an XPath expression.

For elements, if we want the text of the element, the XPath expression would be <name>/text(), i.e. the name of the element with /text() appended.

When a DSA is generated from an XML source, element names are currently written to the source column as they are. Instead, /text() should be appended to them.

If we have this XML structure:

<city>
    <name>
        Kaunas
    </name>
    <population>
        200000
    </population>
</city>

The resulting DSA now looks like this:

id | d | r | b | m | property     | type    | ref | source     | prepare | level | access | uri | title | description
   | dataset                      |         |     |            |         |       |        |     |       |
   |   | resource                 | xml     |     | data.xml   |         |       |        |     |       |
   |                              |         |     |            |         |       |        |     |       |
   |   |   |   | City             |         |     | /city      |         |       |        |     |       |
   |   |   |   |   | name         | string  |     | name       |         |       |        |     |       |
   |   |   |   |   | population   | integer |     | population |         |       |        |     |       |

And instead it should look like this:

id | d | r | b | m | property     | type    | ref | source            | prepare | level | access | uri | title | description
   | dataset                      |         |     |                   |         |       |        |     |       |
   |   | resource                 | xml     |     | data.xml          |         |       |        |     |       |
   |                              |         |     |                   |         |       |        |     |       |
   |   |   |   | City             |         |     | /city             |         |       |        |     |       |
   |   |   |   |   | name         | string  |     | name.text()       |         |       |        |     |       |
   |   |   |   |   | population   | integer |     | population.text() |         |       |        |     |       |

i.e. instead of name it should be name.text() and instead of population it should be population.text()
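A minimal sketch of the proposed change, assuming the generator walks the child elements of a model element and emits one property per leaf element. The helper name `property_sources` is hypothetical and is not part of Spinta. The sketch uses the XPath form `name/text()` described at the top of this issue.

```python
import xml.etree.ElementTree as ET

XML = """<city>
    <name>
        Kaunas
    </name>
    <population>
        200000
    </population>
</city>"""


def property_sources(model: ET.Element) -> dict[str, str]:
    """Map each leaf child element to a source ending in /text().

    Hypothetical helper for illustration, not Spinta's DSA generator.
    """
    sources = {}
    for child in model:
        # Only elements without child elements are treated as properties
        # here; nested elements would need separate handling.
        if len(child) == 0:
            sources[child.tag] = f"{child.tag}/text()"
    return sources


root = ET.fromstring(XML)
sources = property_sources(root)
print(sources)
```

For the example document this yields `name/text()` and `population/text()` as the property sources, relative to the model source `/city`.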

sirex commented 3 weeks ago

The source XPath should be name/text(), not name.text().

Also, maybe we don't need text() at all? If a property receives an Element instance, text() could be applied automatically.

I'm not sure whether this could have any other implications.
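The automatic behaviour suggested here could look roughly like the sketch below. It is written against the standard library's `xml.etree.ElementTree`, whose limited XPath subset has no `text()` step, so the helper strips a trailing `/text()` itself. The name `read_value` and the whitespace stripping are assumptions made for illustration and do not reflect how Spinta reads XML.

```python
import xml.etree.ElementTree as ET

TEXT_STEP = "/text()"


def read_value(model: ET.Element, source: str):
    """Resolve a property source relative to a model element.

    Hypothetical helper: a source that selects an Element is treated
    as if /text() had been written explicitly.
    """
    # Accept both "name" and "name/text()" as the same selection.
    path = source[: -len(TEXT_STEP)] if source.endswith(TEXT_STEP) else source
    node = model.find(path)
    if node is None:
        return None
    # The result is an Element, so return its text automatically.
    # Stripping surrounding whitespace is an assumption of this sketch.
    return (node.text or "").strip()


city = ET.fromstring(
    "<city><name>\n  Kaunas\n</name><population>200000</population></city>"
)
print(read_value(city, "name"), read_value(city, "name/text()"))
```

With this approach both spellings return the same value, so the shorter `name` could stay in the DSA. One case the sketch does not settle is an element that has child elements of its own, where `node.text` only covers the text before the first child.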

karina-klinkeviciute commented 3 weeks ago

Yes, I agree. That would be easier on the eyes. But then it should probably work the same way everywhere, including when converting XSD to DSA.