IATI / refresher

A Python application responsible for tracking IATI data from around the Web and refreshing the core IATI software's data stores.
GNU Affero General Public License v3.0

Lakify hierarchical JSON nesting function omits free-floating text when other children are present #291

Closed akmiller01 closed 12 months ago

akmiller01 commented 12 months ago

The recursive JSON nesting function of lakify omits free-floating text when other children are present. For example:

    <iati-activity>
        Floating text
        <iati-identifier>ACT-1</iati-identifier>
    </iati-activity>

Would be serialized as:

{
  "iati-activity": [
    {
      "iati-identifier": [
        {
          "text()": "ACT-1"
        }
      ]
    }
  ]
}

Where one might expect:

{
  "iati-activity": [
    {
      "text()": "Floating text",
      "iati-identifier": [
        {
          "text()": "ACT-1"
        }
      ]
    }
  ]
}

This is because the floating text is not picked up when the element has child elements. It can, however, be recovered with element.itertext and could be derived like so:

inner_text = ''.join([inner_string.strip() for inner_string in element.itertext(tag=element.tag)])
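To make the expected output concrete, here is a minimal sketch of a recursive nesting function that keeps free-floating text alongside child elements. It is not lakify's actual code: `own_text` and `nest` are hypothetical names, attributes are ignored, and it uses the standard library's `xml.etree.ElementTree` as a stand-in (the `tag=` argument in the one-liner above suggests lxml, which is an assumption here). In the standard library parser, text before the first child is available as `element.text` and text after a child as that child's `tail`, so the sketch gathers both.

```python
import json
import xml.etree.ElementTree as ET


def own_text(element):
    """Join the text that belongs directly to `element`, excluding descendants' text."""
    parts = [element.text or ""]
    # Text that follows a child element is stored on that child's `tail`.
    parts.extend(child.tail or "" for child in element)
    return " ".join(part.strip() for part in parts if part.strip())


def nest(element):
    """Hypothetical stand-in for the recursive nesting step (attributes omitted)."""
    node = {}
    text = own_text(element)
    if text:
        # Record the element's own text even when children are present.
        node["text()"] = text
    for child in element:
        node.setdefault(child.tag, []).append(nest(child))
    return node


sample = """<iati-activity>
    Floating text
    <iati-identifier>ACT-1</iati-identifier>
</iati-activity>"""

root = ET.fromstring(sample)
result = {root.tag: [nest(root)]}
print(json.dumps(result, indent=2))
```

Run against the example activity, this should produce the "one might expect" structure shown earlier, with "text()" set to "Floating text" next to the "iati-identifier" list. Joining fragments with a space is a design choice of this sketch; the one-liner above joins with an empty string instead.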

Discovered while writing tests in PR #289

akmiller01 commented 12 months ago

Fixed with PR #292