dr-leo / pandaSDMX

Python interface to SDMX
Apache License 2.0

Possible to generate SDMX? #117

Closed: brockfanning closed this issue 4 years ago

brockfanning commented 4 years ago

Just a quick question about the current possibilities. Is it possible to use the API to manually combine data from some non-SDMX source and then output it to SDMX? In other words, if I've got all the necessary internal data structures (dimensions, concepts, observations, etc.), can I "output" to a new SDMX file?

khaeru commented 4 years ago

Hi @brockfanning, thanks for the question.

At the moment, no, it is not possible to write SDMX-ML (i.e. XML), SDMX-JSON, or SDMX-CSV. That is, however, a much-desired feature that's on the roadmap, and contributions in that direction would be very welcome.

In writing the SDMX-ML reader/parser that's currently on the master branch, I thought a little about what such writers/converters would look like. (E.g. should each be bidirectional code that both reads from and writes to a specific format, or should readers and writers be entirely distinct? The latter seems easier and more modular.) SDMX-CSV would probably be the simplest to write, then SDMX-JSON, then SDMX-ML. In the last case, because of the breadth of features, it would make sense to start with a minimal writer that only handles simple DataMessages, and then iterate towards fuller coverage of DataStructureDefinitions, etc.
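To make the "CSV would be simplest" point concrete, here is a minimal sketch of what a standalone SDMX-CSV-style writer could look like. It is not pandaSDMX code: `write_sdmx_csv`, its arguments, and the example dataflow and codes are all hypothetical, and it assumes the basic SDMX-CSV column layout (a `DATAFLOW` column, then one column per dimension, then `OBS_VALUE`), ignoring attributes and the other options of the format.

```python
import csv
import io


def write_sdmx_csv(dataflow, dimensions, observations, stream):
    """Write observations as SDMX-CSV-style rows to a text stream.

    Hypothetical helper, not part of pandaSDMX.

    dataflow: string identifying the dataflow, e.g. "AGENCY:FLOW_ID(1.0)".
    dimensions: ordered list of dimension IDs.
    observations: iterable of (key, value) pairs, where key is a dict
        mapping every dimension ID to a code.
    """
    writer = csv.writer(stream, lineterminator="\n")
    # Assumed layout: DATAFLOW, one column per dimension, OBS_VALUE.
    writer.writerow(["DATAFLOW", *dimensions, "OBS_VALUE"])
    for key, value in observations:
        missing = [d for d in dimensions if d not in key]
        if missing:
            raise ValueError(f"observation key lacks dimensions: {missing}")
        writer.writerow([dataflow, *(key[d] for d in dimensions), value])


# Made-up example data, for illustration only.
buf = io.StringIO()
write_sdmx_csv(
    "EXAMPLE:DF_SDG(1.0)",
    ["REF_AREA", "TIME_PERIOD"],
    [
        ({"REF_AREA": "KE", "TIME_PERIOD": "2015"}, 12.5),
        ({"REF_AREA": "KE", "TIME_PERIOD": "2016"}, 13.1),
    ],
    buf,
)
csv_text = buf.getvalue()
print(csv_text)
```

A real writer would take its dimension order and dataflow reference from a DataStructureDefinition instead of plain arguments; the sketch only shows why the flat format is the easiest starting point.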

Finally, and perhaps tangentially: I note that your GitHub profile has an "open-sdg" repo. I also saw the SDGs mentioned at https://siscc.org/initiatives/, a community that promotes data standards including SDMX. FYI, pandaSDMX is not a tightly affiliated or ‘official’ implementation (though it does adhere to the standard rigorously). I spend time working on it because I feel that the SDMX information model is very useful and well-thought-out, but using the Java or .NET reference implementations is not realistic for most researchers I know and work with.

brockfanning commented 4 years ago

@khaeru Thanks for the detail!

I'm in agreement that distinct readers and writers would be more modular. I like the idea of being able to read and/or write to any of the supported formats.
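One way to picture the "distinct readers and writers" design is a pair of registries keyed by format name, so that any registered reader can be combined with any registered writer. The sketch below is only an illustration under that assumption: the registries, decorators, and `convert` function are hypothetical names, rows are plain dicts instead of the SDMX information model, and the "json" writer emits plain JSON, not SDMX-JSON.

```python
import csv
import io
import json

# Separate registries: a format may have a reader, a writer, or both.
READERS = {}
WRITERS = {}


def reader(fmt):
    """Decorator registering a function that parses text into rows."""
    def register(func):
        READERS[fmt] = func
        return func
    return register


def writer(fmt):
    """Decorator registering a function that serializes rows to text."""
    def register(func):
        WRITERS[fmt] = func
        return func
    return register


@reader("csv")
def read_csv(text):
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


@writer("csv")
def write_csv(rows):
    # Column order is taken from the first row; rows must be non-empty.
    buf = io.StringIO()
    out = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
    out.writeheader()
    out.writerows(rows)
    return buf.getvalue()


@writer("json")
def write_json(rows):
    # Plain JSON stand-in; a real SDMX-JSON writer would be far richer.
    return json.dumps({"observations": rows}, indent=2)


def convert(text, source, target):
    """Read with the source format's reader, write with the target's writer."""
    return WRITERS[target](READERS[source](text))


sample = "REF_AREA,OBS_VALUE\nKE,12.5\n"
print(convert(sample, "csv", "json"))
```

Because the two sides only share the intermediate rows, adding a new output format means adding one writer function, without touching any reader.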

In case it's helpful I'll elaborate on my interest in SDMX. I've been working a lot on the open-sdg platform, which is a tool that countries can use to report on their SDG data, in the form of a static website. A key piece of the platform is the Python code in sdg-build, which takes source data/metadata, and converts it to static files that are usable by open-sdg to produce the necessary charts/maps/tables/etc.

In addition to these human-readable charts, maps, etc., we would also like open-sdg to generate machine-readable formats. Our first goal is to have open-sdg generate SDMX, since this is something that a lot of countries are asking for. To that end, we need sdg-build to be able to output SDMX. I'd far prefer to have sdg-build depend on pandaSDMX for this, rather than custom-coding it, so I'm definitely on board with focusing effort here.

Side note: in sdg-build we've already implemented the import of SDMX. This was necessary because some countries use data management systems that expose their data as SDMX. Unfortunately, though, we custom-coded it, so we're also looking to eventually refactor that to use pandaSDMX instead.

dr-leo commented 4 years ago

Update: khaeru has implemented an SDMX-ML writer and released it in his fork, named 'sdmx1', on PyPI. That code will also be released in pandaSDMX v1.1.0 soon. See #182, which contains some major bug fixes and enhancements of the code merged from that fork.

IMHO an SDMX-JSON writer would be more useful, as JSON is more lightweight than SDMX-ML. I understand that the SDMX-TWG has finalized, or is about to finalize, an SDMX-JSON implementation for SDMX structural metadata, i.e. there is now a JSON equivalent of SDMX-ML structure messages.