Schemas describe the structure that IATI XML is expected to follow. They define a number of elements and attributes, each carrying information that would be useful to extract: descriptions, occurrence properties, and the XPaths at which they occur. Research into this area has not turned up a standard method for doing this with open tooling.
64 provides an initial attempt at extracting this information. However, it relies on tools that are not designed for the job, resulting in hundreds of lines of confusing code that is hard to follow, does not handle all the cases it needs to, and would be a challenge to maintain.
It is therefore proposed to implement this functionality using a two-stage process:
Utilise XSLT to transform the Schema into an Intermediate Representation (IR) that has the information structured in an easy-to-query format
Have capabilities available within the schemas module to access the information presented in the IR through a defined Python API
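As a rough illustration of the second stage only, the sketch below queries an IR through a small Python class. Everything in it is an assumption made for illustration: the IR layout (flat `node` records keyed by XPath), the `SchemaIR` class and its method names, and the example values are hypothetical and not part of the proposal. The XSLT stage is not shown; the IR is supplied as a literal string.

```python
import xml.etree.ElementTree as ET

# Hypothetical IR: the proposal does not define its shape. Here every schema
# node (element or attribute) is assumed to be one flat <node> record.
EXAMPLE_IR = """
<ir>
  <node xpath="iati-activity/title" min_occurs="1" max_occurs="1">
    <description>A short, human-readable title.</description>
  </node>
  <node xpath="iati-activity/@default-currency" min_occurs="0" max_occurs="1">
    <description>Default currency for all financial values.</description>
  </node>
</ir>
"""


class SchemaIR:
    """Hypothetical read-only view over an IR document."""

    def __init__(self, ir_xml):
        root = ET.fromstring(ir_xml.strip())
        # Index the records by XPath so lookups need no tree traversal.
        self._nodes = {node.get("xpath"): node for node in root.iter("node")}

    def xpaths(self):
        """Return every XPath known to the IR, sorted."""
        return sorted(self._nodes)

    def description(self, xpath):
        """Return the description text recorded for the given XPath."""
        return self._nodes[xpath].findtext("description", default="").strip()

    def occurrences(self, xpath):
        """Return (min_occurs, max_occurs) for the given XPath."""
        node = self._nodes[xpath]
        return int(node.get("min_occurs")), int(node.get("max_occurs"))


ir = SchemaIR(EXAMPLE_IR)
print(ir.xpaths())
print(ir.occurrences("iati-activity/@default-currency"))
```

Keeping the IR flat and keyed by XPath is one possible design. It would let the Python side stay a thin lookup layer, with the schema-walking logic living entirely in the XSLT.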
Based on preliminary investigation, the IR will likely:
Treat elements and attributes as equivalents
i.e. an optional attribute would become min_occurs = 0 and max_occurs = 1
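The equivalence between elements and attributes can be sketched as a single occurrence record derived from either kind of declaration. This is a minimal illustration written in Python rather than XSLT; the `Occurrence` class and `occurrence_of` function are hypothetical names, and using `None` to stand for an unbounded maximum is an assumption, not something the proposal specifies.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

XS = "{http://www.w3.org/2001/XMLSchema}"


@dataclass(frozen=True)
class Occurrence:
    min_occurs: int
    max_occurs: Optional[int]  # None stands for "unbounded" (an assumption)


def occurrence_of(node):
    """Map an xs:element or xs:attribute declaration to one Occurrence."""
    if node.tag == XS + "attribute":
        # An attribute can appear at most once; `use` defaults to "optional".
        use = node.get("use", "optional")
        if use == "prohibited":
            return Occurrence(0, 0)
        return Occurrence(1 if use == "required" else 0, 1)
    if node.tag == XS + "element":
        # XSD defaults both minOccurs and maxOccurs to 1 when omitted.
        max_raw = node.get("maxOccurs", "1")
        max_occurs = None if max_raw == "unbounded" else int(max_raw)
        return Occurrence(int(node.get("minOccurs", "1")), max_occurs)
    raise ValueError(f"unsupported declaration: {node.tag}")


NS = 'xmlns:xs="http://www.w3.org/2001/XMLSchema"'
optional_attr = ET.fromstring(f'<xs:attribute {NS} name="ref" type="xs:string"/>')
required_attr = ET.fromstring(f'<xs:attribute {NS} name="code" use="required"/>')
repeated_elem = ET.fromstring(
    f'<xs:element {NS} name="narrative" minOccurs="0" maxOccurs="unbounded"/>'
)

print(occurrence_of(optional_attr))
print(occurrence_of(required_attr))
print(occurrence_of(repeated_elem))
```

With this mapping, code that consumes the IR would not need to branch on whether a node is an element or an attribute when checking how many times it may occur.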