Geonovum / sospilot

Sensor Observation Service (SOS) and data management for Air Quality data from RIVM The Netherlands
GNU General Public License v3.0

Create e-reporting (IPR) with RIVM data #7

Open thijsbrentjens opened 10 years ago

thijsbrentjens commented 10 years ago

Create (harmonised) data for:

- Zones & Agglomeration (INSPIRE theme: AM, Dataflow B in IPR)
- Stations (INSPIRE theme: EF, IPR Dataflow D)
- Aggregated results / statistics (IPR Dataflow F)

See the website: http://www.eionet.europa.eu/aqportal/datamodel for technical details and XML schema

justb4 commented 10 years ago

Ok, getting a better grip on this after seeing the AM and EF XSDs: https://github.com/Geonovum/sospilot/tree/master/data/inspire/xsd and the AQ XSDs: https://github.com/Geonovum/sospilot/tree/master/data/eionet/xsd

There is also an RIVM example: https://github.com/Geonovum/sospilot/blob/master/data/eionet/xsd/REP_D-NL_RIVM_20140805_B-002.xml It is probably not schema-valid, as it does not contain a geometry field (which I think is required), but it is a start.

The User Guide is also useful: http://www.eionet.europa.eu/aqportal/guidelines/UserGuide2_AQD_XML_v3.0_publication.pdf

justb4 commented 10 years ago

The RIVM report for "(B) Information on zones and agglomerations (Article 6)" https://github.com/Geonovum/sospilot/blob/master/data/eionet/xsd/REP_D-NL_RIVM_20140805_B-002.xml is quite complete. With one small change it validates against the AQD AM schema: AirQualityReporting.xsd http://dd.eionet.europa.eu/schemas/id2011850eu-1.0/AirQualityReporting.xsd. The change only deals with a GML issue where deprecatedTypes.xsd is required in the schemaLocation:

 http://www.opengis.net/gml/3.2 http://schemas.opengis.net/gml/3.2.1/deprecatedTypes.xsd

Only the geometry (am:geometry element) refers to a Shapefile in EPSG:28992:

         <am:geometry
                xlink:href="http://cdr.eionet.europa.eu/nl/eu/aqd/b/envu9_csq/airquality_zone_agglomeration_v2013.shp"/>

while Germany uses real GML geometries for the am:geometry field like:

  <am:geometry>
    <gml:Polygon gml:id="ZON.PG.DEZAXX0001O" srsName="urn:ogc:def:crs:EPSG::4326">
      <gml:exterior>
        <gml:LinearRing>
          <gml:posList srsDimension="2">52.703983225045874 13.989109895305489 52.705805777614614 13.98454702572468 .....

The question is: what should we do with this? Replace the Shapefile xlink with an inline GML geometry? The referred Shapefile is valid and can be read into PostGIS and shown via WMS, see the Heron viewer: http://sensors.geonovum.nl/heronviewer (Zones and Agglomeration layer). Is Dataflow B thus complete already?

justb4 commented 10 years ago

And REP_D-NL_RIVM_20140805_D-002.xml (Dataflow D, http://cdr.eionet.europa.eu/nl/eu/aqd/d/envu9_j7q/ ) also validates OK; it refers to B for zones.

justb4 commented 10 years ago

In summary, the question is: what is there specifically to do (for us) in this issue? Several Dataflows like B and D were uploaded to the Eionet AQ Portal on Aug 5, 2014, and also E (Measurements, 223MB). There is feedback, like for E: "NL has reported both hourly and daily values. This results in 'double' observation objects. It is not intended to report aggregated data so they should report in this case only hourly data." I don't know how these uploaded dataflow files were made (handmade/generated/ETL tools?) or who is now working on the feedback.

How to proceed? I will ask Hans/RIVM.

justb4 commented 10 years ago

Ok learned more from existing dataflow reports, EEA workshops, IRCELINE approach and more:

All in all, I did a PoC using a proven technology, although one probably never used in the domain of (INSPIRE) GML Application Schemas: Python Templating. Python web frameworks like Django have a long and proven history of using Templating for generating HTML, XML or any other document: input structures are applied to a Template to render a Document. There are numerous examples of Templating Languages: Django, Mako, Jinja2, Genshi, Mustache, see https://wiki.python.org/moin/Templating. Some Templating Languages, like Mustache, are not even bound to Python.

Stetl http://stetl.org is a Python ETL framework, based on Input, Filter (transform) and Output modules. XML transformation up to now was mostly done using an XSLTFilter, also for harmonizing INSPIRE data from local sources. However, XSLT has the disadvantage of being verbose, passive/recursive/matching-driven and complex, with hardly any variable/control/function structuring possibilities other than proprietary XSLT-processor bindings or built-in functions.

So I started trying an approach using Python Templating: https://wiki.python.org/moin/Templating, first within Stetl with some simple examples using the Python built-in String Templating: https://github.com/justb4/stetl/tree/master/examples/basics/9_string_templating. After that I chose, from the zillion Python Templating Languages, the one that is used most and has a very active development community: Jinja2 http://jinja.pocoo.org. Jinja2 is modelled on Django's template language and is, for example, used extensively in the Open Data world within CKAN: http://docs.ckan.org/en/latest/theming/templates.html. I had never used Python Templating, only Java JSP, which is a bit similar; learning Jinja2 took just a couple of hours of browsing examples on the Web.

The development of a Jinja2TemplatingFilter in Stetl was therefore quite trivial, about 30 lines of code. All TemplatingFilters are in: https://github.com/justb4/stetl/blob/master/stetl/filters/templatingfilter.py. Jinja2 allows standard Template control structures like loops, e.g. for looping over Features, but also has a concept of "globals": variables to be applied globally. This proved to be very convenient for common "boilerplate" data like organisations, telephone numbers etc.

A worked-out example is at: https://github.com/justb4/stetl/tree/master/examples/basics/10_jinja2_templating In this example a simple (XML) and an advanced (GML) transformation are illustrated. The advanced example shows macros, globals and Filters.
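As a rough illustration of the built-in string templating mentioned here, the following sketch renders a list of features into XML with Python's `string.Template`. The element names, the `FEATURES` records and the station names are hypothetical and are not taken from the Stetl example. Note that `string.Template` has no loop construct, so the iteration has to happen in Python, which is one reason a fuller templating language is attractive.

```python
from string import Template
from xml.sax.saxutils import escape

# Hypothetical station records; in the PoC these would come from a WFS as GeoJSON.
FEATURES = [
    {"local_id": "STA-NL00235", "name": "Station A"},
    {"local_id": "STA-NL00236", "name": "Station B & C"},
]

# Hypothetical element names, chosen only for this illustration.
STATION_TEMPLATE = Template('<station id="$local_id"><name>$name</name></station>')
DOCUMENT_TEMPLATE = Template("<stations>\n$members\n</stations>")


def render_stations(features):
    """Render one XML fragment per feature and wrap them in a root element."""
    members = []
    for feature in features:
        # Escape values so characters like '&' or '"' do not break the XML.
        safe = {
            key: escape(str(value), {'"': "&quot;"})
            for key, value in feature.items()
        }
        members.append("  " + STATION_TEMPLATE.substitute(safe))
    return DOCUMENT_TEMPLATE.substitute(members="\n".join(members))


if __name__ == "__main__":
    print(render_stations(FEATURES))
```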

Applying the Jinja2TemplatingFilter to RIVM AQ Reporting proved to be almost trivial. A PoC was done for mapping the WFS RIVM Stations FeatureType to a Dataflow D report. The example can be found at https://github.com/Geonovum/sospilot/tree/master/src/aq-report. This Dataflow D report was created in just a few hours and contains no custom code, just some Templates with about 50 lines of Jinja2 code! This could be enhanced further with macros etc., but it shows a very promising approach for AQ Reporting and IMO even INSPIRE harmonization. Jinja2 is so common that many IDEs, like IDEA and Eclipse, support its syntax. Plus it is blazingly fast. So far: Enter The Jinja2!

I see many advantages in this approach:

thijsbrentjens commented 10 years ago

This seems very promising and nice work. We could try this for the other dataflows with GML as well, so the whole process could be automated (instead of using the manual reporting tooling of EIONET as is done now).

thijsbrentjens commented 10 years ago

A short remark about these xlinks to the features in Dataflow D: is there an encoding rule for the values in xlink:href? Shouldn't the values match with gml:id values (in that GML document)? In that case, either the INSPIRE namespace should be in the gml:id or the xlink values should be changed.
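One way to inspect this question on a generated report is to compare the `xlink:href` values with the `gml:id` values in the same document. The sketch below uses only the standard library; the sample document, its element names and the `NL.RIVM.AQ/...` style reference are hypothetical and only illustrate the three cases (resolved local link, dangling local link, non-local reference).

```python
import xml.etree.ElementTree as ET

GML_ID = "{http://www.opengis.net/gml/3.2}id"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical sample; element names are placeholders, not AQD elements.
SAMPLE = """<root xmlns:gml="http://www.opengis.net/gml/3.2"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <station gml:id="STA-NL00235"/>
  <ref xlink:href="#STA-NL00235"/>
  <ref xlink:href="#STA-NL99999"/>
  <ref xlink:href="NL.RIVM.AQ/STA-NL00235"/>
</root>"""


def classify_xlinks(xml_text):
    """Group xlink:href values by whether they resolve to a gml:id in the document."""
    root = ET.fromstring(xml_text)
    ids = {el.get(GML_ID) for el in root.iter() if el.get(GML_ID)}
    result = {"resolved": [], "unresolved": [], "external": []}
    for el in root.iter():
        href = el.get(XLINK_HREF)
        if href is None:
            continue
        if not href.startswith("#"):
            # Not a same-document reference; cannot be checked against gml:id here.
            result["external"].append(href)
        elif href[1:] in ids:
            result["resolved"].append(href)
        else:
            result["unresolved"].append(href)
    return result


if __name__ == "__main__":
    print(classify_xlinks(SAMPLE))
```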

justb4 commented 10 years ago

The xlinks are encoded according to the UserGuide, see link above or the aqportal. Yes, normally I would expect a #gmlid and a gml:id in the target element. But why use xlinks at all? This is a feature collection... Also strange is that the ReportingHeader is a feature and not in the general section of the FC...

justb4 commented 10 years ago

Thanks! I think this can work. The main problem is getting the source data from RIVM. Best is to build web services for this source data. Probably most can be achieved with data in PostGIS and WFS + SOS. Both can deliver data as (Geo)JSON like in my example. By using VIEWs we can join tables and make selections (or via a table join service ;-))...

thijsbrentjens commented 10 years ago

About the xlinks: this could be a flaw in the UserGuide then. It doesn't make sense. But these xlinks are not very useful here at first sight indeed.

About getting the source data: using web services that provide the source data (also useful as a "simple" version of the data) would be a nice approach. Not sure about the table join service here, but who knows :). If the database and some extra views in it are sufficient, then that would be good as well.

justb4 commented 10 years ago

[Image: xlink-in-aqipr]

UserGuide page 34 ff. I think we also see the flaw in the INSPIRE id: a nested structure instead of a hierarchical id... The document would not even validate IMO.

justb4 commented 10 years ago

But rendering an INSPIRE id struct could be our first Jinja2 macro :-).
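To make the idea concrete, here is a small Python sketch of what such a macro would have to produce, assuming the usual INSPIRE `base:Identifier` layout (`base:localId`, `base:namespace` and an optional `base:versionId`). The function name, the default `ef:inspireId` wrapper and the `NL.RIVM.AQ` namespace value are assumptions for illustration, not taken from the actual templates.

```python
from xml.sax.saxutils import escape


def render_inspire_id(local_id, namespace, version_id=None, wrapper="ef:inspireId"):
    """Render a nested INSPIRE identifier structure as an XML fragment."""
    lines = [
        f"<{wrapper}>",
        "  <base:Identifier>",
        f"    <base:localId>{escape(local_id)}</base:localId>",
        f"    <base:namespace>{escape(namespace)}</base:namespace>",
    ]
    if version_id is not None:
        # versionId is optional, so it is only emitted when a value is given.
        lines.append(f"    <base:versionId>{escape(str(version_id))}</base:versionId>")
    lines += ["  </base:Identifier>", f"</{wrapper}>"]
    return "\n".join(lines)


if __name__ == "__main__":
    # "NL.RIVM.AQ" is a hypothetical namespace used only for this example.
    print(render_inspire_id("STA-NL00235", "NL.RIVM.AQ"))
```

In a Jinja2 template the same structure would be written once as a macro and called for every feature type that carries an identifier.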

thijsbrentjens commented 10 years ago

For Dataflow B we need some more input from RIVM: what data sources to use to create the dataflow?

thijsbrentjens commented 10 years ago

RIVM offers a WFS with aqd_zones (http://acceptatie.inspire.rivm.nl/geoserver/wfs?request=GetFeature&typeName=inspire:aqd_zone&outputformat=JSON). This is the acceptance environment; we need to check if we can use this data, since it is not in RIVM's production WFS.

thijsbrentjens commented 10 years ago

Alternatively, we could use the shapefile as offered in the AQPortal: http://cdr.eionet.europa.eu/nl/eu/aqd/b/envu9_csq

thijsbrentjens commented 10 years ago

For dataflow B, the pollutants need to be mapped to the vocabulary of AQ. The definitions can be found at: http://dd.eionet.europa.eu/vocabulary/aq/pollutant/view Available as Linked Data, e.g. in RDF-XML or JSON-LD:

http://dd.eionet.europa.eu/vocabulary/aq/pollutant/rdf

and:

http://dd.eionet.europa.eu/vocabulary/aq/pollutant/json

justb4 commented 10 years ago

At sensors.geonovum.nl we already have a WFS (and WMS) based on the above Shapefiles, see http://sensors.geonovum.nl/gs/wfs?request=GetFeature&typeName=sensors:zones&outputformat=JSON . We can get quite far, but we still lack zone data attributes, like pollutants for the Zones. Also, the corresponding properties differ between the two WFSs and each lacks data. For example, for Zone Heerlen/Kerkrade the RIVM WFS has these properties (e.g. population and area are null, zone_type should be 'agg' or 'nonagg' etc):

{"properties": {
    "inspireid": "http://data.rivm.nl/inspire/so/ef/aqd-zone/NL0320/0",
    "zone_code": "NL0320",
    "versionid": 0,
    "predecessor": null,
    "beginlifespanversion": "2013-05-06T11:03:35.926Z",
    "endlifespanversion": null,
    "zone_name": "Heerlen/Kerkrade",
    "zone_type": "airQualityManagementZone",
    "application_start_date": "2001-06-20T22:00:00Z",
    "application_end_date": null,
    "documentation_of_predecessors": null,
    "resident_population": null,
    "resident_population_ref_year": null,
    "area_of_zone_value": null,
    "area_of_zone_uom": "sqm",
    "designated_pollutant": null,
    "protection_target": "Health",
    "timeextensionexemption": "NO2-annual",
    "environmental_domain": "air",
    "plan": null,
    "legalbasis": "Directive 2008/50/EC of the European Parliament and of the Council of 21 May 2008 on ambient air quality and cleaner air for Europe",
    "relatedzone": null,
    "authority_name": "Ministerie van I&M",
    "webaddress": "http://www.rijksoverheid.nl/ministeries/ienm",
    "responsible_person_name": "Inge van der Veen",
    "address": "Plesmanweg 1-6 2597JG Den Haag",
    "telephone_number": "+31704560000",
    "email": null
}}

While the Geonovum "Sensors" WFS has (missing e.g. the point of contact):

{"properties": {
    "objectid": 9,
    "geometry_l": 81872.10739,
    "geometry_a": 1.74366211597E8,
    "zone_code": "NL0320",
    "zone_name": "Heerlen/Kerkrade",
    "zone_name_": "Agglomeratie Heerlen/Kerkrade",
    "start_year": 2011,
    "end_year": null,
    "zone_type": "agg",
    "zone_popul": 231870,
    "zone_pop_1": 2013,
    "zone_area_": 174366,
    "zone_area1": 174,
    "zone_prede": null
}}

Both WFSs are, for example, missing the pollutants (while those may be tied to the Stations, which are tied to zones). An overall database schema within RIVM would help tremendously.
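Until such a schema exists, the two property sets could be combined per zone. The sketch below fills null values in the RIVM record from the Geonovum record. The `FIELD_MAP` pairing of attribute names is an assumption based on the names alone and would need confirmation from RIVM; the area fields are left out on purpose because their units are not clear from the data shown.

```python
# Subsets of the two records shown above for zone NL0320.
RIVM_PROPS = {
    "zone_code": "NL0320",
    "zone_type": "airQualityManagementZone",
    "resident_population": None,
    "resident_population_ref_year": None,
}
SENSORS_PROPS = {
    "zone_code": "NL0320",
    "zone_type": "agg",
    "zone_popul": 231870,
    "zone_pop_1": 2013,
}

# Assumed mapping from Geonovum "Sensors" WFS names to RIVM WFS names.
FIELD_MAP = {
    "zone_popul": "resident_population",
    "zone_pop_1": "resident_population_ref_year",
}


def fill_missing(rivm_props, sensors_props, field_map=FIELD_MAP):
    """Return a copy of rivm_props with null values filled from sensors_props."""
    merged = dict(rivm_props)
    for source_name, target_name in field_map.items():
        value = sensors_props.get(source_name)
        if merged.get(target_name) is None and value is not None:
            merged[target_name] = value
    return merged


if __name__ == "__main__":
    print(fill_missing(RIVM_PROPS, SENSORS_PROPS))
```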

Though the AQ Portal provides CSVs generated from the XMLs, this would be just temporary. For example, for the zones (Dataflow B) there are 3 CSVs:

- Dataflow B: General AQ zones information as CSV http://cdr.eionet.europa.eu/Converters/run_conversion?file=nl/eu/aqd/d/envu9_j7q/REP_D-NL_RIVM_20140805_D-002.xml&conv=469&source=remote
- Dataflow B: Pollutant and protection targets as CSV http://cdr.eionet.europa.eu/Converters/run_conversion?file=nl/eu/aqd/d/envu9_j7q/REP_D-NL_RIVM_20140805_D-002.xml&conv=470&source=remote
- Dataflow B: Competent Authorities for AQ zones http://cdr.eionet.europa.eu/Converters/run_conversion?file=nl/eu/aqd/d/envu9_j7q/REP_D-NL_RIVM_20140805_D-002.xml&conv=471&source=remote

thijsbrentjens commented 10 years ago

Okay, I'll have a look at these new zones. I think we need information from RIVM on their data sources for the missing properties. But for a first version, I'll give it a try with what we have.

justb4 commented 10 years ago

Yes, a good approach. I think a challenge is to convert GeoJSON to GML Geometries. For this I plan to add a Custom Jinja2 Filter in Stetl: http://jinja.pocoo.org/docs/dev/api/#custom-filters

Probably using Python OGR http://www.gdal.org/classOGRGeometry.html to read GeoJSON and export to GML Geometry. Like examples in http://pcjericks.github.io/py-gdalogr-cookbook/geometry.html

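    from osgeo import ogr  # GDAL/OGR Python bindings

    geojson = """{"type":"Point","coordinates":  [108420.33,753808.59]}"""
    geom = ogr.CreateGeometryFromJson(geojson)
    # options is a list of OGR GML export options
    gml_str = geom.ExportToGML(options=["FORMAT=GML3"])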

The filter expression for each zone object becomes something like

  <am:geometry>
      {{ zone.geometry | geojson2gml(version='2.1.2') }}
  </am:geometry>

I will only do the Stetl-part within Stetl, you can go ahead with flow B. Hope to have something today. You now can use latest Stetl via sudo pip install stetl (v1.0.6).
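To show what such a filter has to do, here is a dependency-free sketch that turns a GeoJSON Point or Polygon into a GML 3 string. It is not the Stetl or OGR implementation: the function names are made up for this example, it does no reprojection, and it covers only two geometry types. The `swap_axes` flag stands in for the lat/lon ordering needed with some CRS definitions.

```python
def _pos(coord, swap):
    """Format one coordinate pair, optionally swapping x/y to y/x."""
    x, y = coord[0], coord[1]
    return f"{y} {x}" if swap else f"{x} {y}"


def geojson_to_gml3(geometry, srs_name, swap_axes=False):
    """Render a GeoJSON Point or Polygon dict as a GML 3 string (no reprojection)."""
    kind = geometry["type"]
    coords = geometry["coordinates"]
    if kind == "Point":
        return (
            f'<gml:Point srsName="{srs_name}">'
            f"<gml:pos>{_pos(coords, swap_axes)}</gml:pos></gml:Point>"
        )
    if kind == "Polygon":
        rings = []
        for index, ring in enumerate(coords):
            # In GeoJSON the first ring is the outer boundary, the rest are holes.
            tag = "gml:exterior" if index == 0 else "gml:interior"
            pos_list = " ".join(_pos(c, swap_axes) for c in ring)
            rings.append(
                f"<{tag}><gml:LinearRing><gml:posList>{pos_list}"
                f"</gml:posList></gml:LinearRing></{tag}>"
            )
        return f'<gml:Polygon srsName="{srs_name}">' + "".join(rings) + "</gml:Polygon>"
    raise ValueError(f"unsupported geometry type: {kind}")


if __name__ == "__main__":
    point = {"type": "Point", "coordinates": [108420.33, 753808.59]}
    print(geojson_to_gml3(point, "EPSG:28992"))
```

Registered as a custom Jinja2 filter, a function like this could be called from the template with the geometry as its first argument.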

thijsbrentjens commented 10 years ago

Okay, I'll leave the geometry as it is for now. I'm using ogr2ogr's -sql option to join the information from the CSV to the GeoJSON file, and that seems to work. Would that be easy / interesting to use with Stetl?

Example command:

ogr2ogr -sql "select * from OGRGeoJSON a left join 'zonesattr.csv'.zonesattr b on a.zone_code = b.zone_code" -f "GeoJSON" zones-joined.json zones.json

Edit: ogr2ogr shortens the attribute names, as is done with Shapefile column names. Maybe this is not the best approach for joining the CSV file, but we could change this when it is clear how RIVM could / would deliver the data.

thijsbrentjens commented 10 years ago

Note that for joining there are different codes used sometimes: e.g. in our WFS we have NL0201 for Midden, in the CSV file Zones_NL-001_upd.csv it is NL0200. I can fix them now, but this is something we need to discuss with RIVM.

thijsbrentjens commented 10 years ago

Note that this also means that the joined data might not contain correct values.
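As a possible alternative to the ogr2ogr join, the same left join can be sketched in plain Python. This keeps the full attribute names and reports zone codes without a match, such as the NL0201 / NL0200 case. The sample feature collection and the `zone_pollutant` values in the CSV are illustrative only, and the function name is hypothetical.

```python
import csv
import io

# Illustrative inputs; real data would be read from zones.json and the CSV file.
ZONES = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None,
         "properties": {"zone_code": "NL0320", "zone_name": "Heerlen/Kerkrade"}},
        {"type": "Feature", "geometry": None,
         "properties": {"zone_code": "NL0201", "zone_name": "Midden"}},
    ],
}
ZONES_CSV = "zone_code,zone_pollutant\nNL0320,BaP-H;Benzene-H\nNL0200,NO2-H\n"


def join_zone_attributes(feature_collection, csv_text, key="zone_code"):
    """Left-join CSV rows onto feature properties in place; return unmatched keys."""
    rows = {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}
    unmatched = []
    for feature in feature_collection["features"]:
        props = feature["properties"]
        row = rows.get(props.get(key))
        if row is None:
            unmatched.append(props.get(key))
            continue
        for name, value in row.items():
            # Existing feature properties win; CSV columns only add new names.
            props.setdefault(name, value)
    return unmatched


if __name__ == "__main__":
    print("unmatched zone codes:", join_zone_attributes(ZONES, ZONES_CSV))
```

Listing the unmatched codes makes mismatches visible instead of silently producing features with missing or wrong values.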

justb4 commented 10 years ago

Yes, that would be very useful as a Stetl input. Up to now there is only an OgrPostGISInput: https://github.com/justb4/stetl/blob/master/stetl/inputs/ogrinput.py but on the TODO is a generic Ogr2ogrInput similar to the Ogr2OgrOutput https://github.com/justb4/stetl/blob/master/stetl/outputs/ogroutput.py

Alternatively we could extend the Jinja2TemplatingFilter to be able to read multiple "Globals" files (or via services). Those may be referenced within the template, like {{ globs.zonesattr[zone.zone_code].anyattr }}. I was planning multiple globals already.

But for now your approach using the joined local file zones-joined.json is ok. Ultimately I think a REST or WFS service should deliver the (joined) data, so we won't need local files (though the "globals" approach would allow remote fetching and joining on the fly in the template).

justb4 commented 10 years ago

Stetl now supports GeoJSON geometry to GML geometry filtering in Jinja2 templates. See the example:

  {% for feature in features %}
    <gml:featureMember>
        <cities:City>
            <cities:name>{{ feature.properties.CITY_NAME }}</cities:name>
            <cities:geometry>
                {{ feature.geometry | geojson2gml(crs=crs, gml_format='GML3', gml_longsrs='YES') }}
            </cities:geometry>
        </cities:City>
    </gml:featureMember>
  {% endfor %}

Template in the example: https://github.com/justb4/stetl/blob/master/examples/basics/10_jinja2_templating/templates/cities-gjson2gml.jinja2. All parameters are optional: crs may be a number, a string like 'EPSG:4258' or the crs structure from a GeoJSON feature collection. The default is 4326. The other parameters determine the srsName and whether GML3 or another format is output. With the long srsName form (or a string of format EPSGA:4326) the axes are ordered YX.
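The three accepted forms of the crs parameter can be normalised to a single EPSG code before building the srsName. The following sketch shows one way to do that; it is not the Stetl code, the function names are hypothetical, and it only understands EPSG-style names (anything else raises an error).

```python
import re


def epsg_code(crs, default=4326):
    """Extract an EPSG code from a number, a string or a GeoJSON crs object."""
    if crs is None:
        return default
    if isinstance(crs, int):
        return crs
    if isinstance(crs, dict):
        # GeoJSON named CRS: {"type": "name", "properties": {"name": "..."}}
        crs = crs.get("properties", {}).get("name", "")
    text = str(crs).strip()
    if text.isdigit():
        return int(text)
    match = re.search(r"EPSGA?:+(\d+)$", text, re.IGNORECASE)
    if not match:
        raise ValueError(f"cannot find an EPSG code in {crs!r}")
    return int(match.group(1))


def srs_name(crs, long_form=False):
    """Build a short (EPSG:n) or long (urn:ogc:def:crs:EPSG::n) srsName."""
    code = epsg_code(crs)
    return f"urn:ogc:def:crs:EPSG::{code}" if long_form else f"EPSG:{code}"


if __name__ == "__main__":
    geojson_crs = {"type": "name", "properties": {"name": "urn:ogc:def:crs:EPSG::28992"}}
    print(srs_name(4258), srs_name("EPSG:4258", long_form=True), srs_name(geojson_crs))
```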

Python with ogr bindings needs to be importable.

thijsbrentjens commented 10 years ago

Nice work Just. I tested it and it seems to work fine. One major thing left is dealing with matching the pollutant codes to URIs as defined in the vocabulary. I'm trying to use a SKOS and RDF parser now, but this still requires some Python code to process.

justb4 commented 10 years ago

Good to see your first version of the Zones to Dataflow B ETL!

Pollutant codes: I checked in a sort of hack but we may get those globals from a service. This is the relevant and standard Jinja2 code:

         {% set zone_pollutants = feature.properties.zone_pollutant.split(';') %}
         <aqd:pollutants>
         {% for zone_pollutant in zone_pollutants %}
             <aqd:Pollutant>
             </aqd:Pollutant>
         {% endfor %}
         </aqd:pollutants>

with globs defined as:

     "pollutant_defs": {
         "BaP-H": {
             "pollutant_code": "http://dd.eionet.europa.eu/vocabulary/aq/pollutant/29",
             "protection_target": "http://dd.eionet.europa.eu/vocabulary/aq/protectiontarget/H"
         },
         "Benzene-H": {
             "pollutant_code": "http://dd.eionet.europa.eu/vocabulary/aq/pollutant/20",
             "protection_target": "http://dd.eionet.europa.eu/vocabulary/aq/protectiontarget/H"
         },

But possibly there is a better solution....I checked in a bit too much: I had several other .xls's from the RIVM FTP server.
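The split-and-lookup logic of the template can be checked outside Jinja2 with a few lines of Python. The sketch below uses the two `pollutant_defs` entries quoted in this comment; the function name is hypothetical, and collecting unknown codes instead of failing is a design choice for this example.

```python
VOCAB = "http://dd.eionet.europa.eu/vocabulary/aq"

# The two entries quoted in the comment; the real globals file contains more.
POLLUTANT_DEFS = {
    "BaP-H": {
        "pollutant_code": f"{VOCAB}/pollutant/29",
        "protection_target": f"{VOCAB}/protectiontarget/H",
    },
    "Benzene-H": {
        "pollutant_code": f"{VOCAB}/pollutant/20",
        "protection_target": f"{VOCAB}/protectiontarget/H",
    },
}


def resolve_zone_pollutants(zone_pollutant, pollutant_defs=POLLUTANT_DEFS):
    """Split a ';'-separated code string and look each code up in the definitions."""
    resolved, unknown = [], []
    for code in zone_pollutant.split(";"):
        code = code.strip()
        if not code:
            continue
        definition = pollutant_defs.get(code)
        if definition is None:
            # Keep track of codes without a definition so they can be reviewed.
            unknown.append(code)
        else:
            resolved.append(definition)
    return resolved, unknown


if __name__ == "__main__":
    defs, missing = resolve_zone_pollutants("BaP-H;Benzene-H;CO-H")
    print(len(defs), "resolved; unknown:", missing)
```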

justb4 commented 10 years ago

Refined the GML macros, with the usual GML mess: GML3 vs GML2 encoding and axis ordering. Next is to migrate most/all GML macros to Jinja2 Filters in Stetl that use Python OGR. Also updated validate.sh to include dataflow-B. The dataflow-D output now validates against the AQD/INSPIRE/GML schemas.

justb4 commented 10 years ago

Tip: if you commit to GitHub and provide the issue number in the commit message, that message will appear here, for example:

  git commit -m "issue #7 - refinement macros-gml.jinja2: cater for GML2/GML3-constructs and axis ordering, example in Dataflow-D ETL"

thijsbrentjens commented 9 years ago

Thanks for the tip, I just forgot that with the previous commit.

Regarding the pollutant codes: I think direct support for looking up the codes in the SKOS vocabularies would be elegant (I found some Python libs for that), but I can generate the "codelist" to use in the globs for now.

justb4 commented 9 years ago

SKOS-based data: where (at which URL) is this service? I was under the impression that the component codes like 'NO2-H' were RIVM-specific. But eventually we should be able to generate reports from live services: WFS, SOS, REST, SPARQL, whatever.

Right now data can be applied to a Jinja2 template either as standard input data (JSON file) via a Jinja2 Context and/or as "globals" (also a JSON file) via the Jinja2 Environment. I think "globals" should be kept to a minimum. There are two limitations right now (in Stetl):

Within Jinja2 (and in Stetl) the input file is passed as a "Jinja2 Context", in Python a dict (hashmap). For example, "features" as used in our templates is in fact a key in a dict. The same goes for the globals ("globs" or whatever the top-level key is named). Useful Stetl extensions thus could be the following:

Another possibility, a bit more involved, is to develop "smart" Jinja2 Filters that actually invoke an external web service like a SPARQL endpoint....

justb4 commented 9 years ago

Another update: the Jinja2 Filter to generate GML from GeoJSON geometry has been improved in latest GitHub Stetl and is used in Dataflow-D jinja2 template, as follows:

    <ef:geometry>
            {# Generate a Point (or any other) GML geometry from a GeoJSON geometry using the geojson2gml
              Jinja2 custom Filter.
             By specifying a target_crs we can even reproject from the source CRS.
             The gml_format=GML2|GML3 determines the general GML form: e.g. pos/posList or coordinates. gml_longsrs=YES|NO
             determines the srsName format like EPSG:4326 or urn:ogc:def:crs:EPSG::4326 (long).
             gml_longsrs=YES will also do XY swapping (lat/lon) for lat/lon based projections.
            Generate gml id first (gml:id is GML3-specific and optional) #}
           {% set gml_id = 'STA_G-%s' % feature.properties.local_id %}
          {{ feature.geometry | geojson2gml(source_crs=crs, target_crs=4258, gml_id=gml_id, gml_format='GML3', gml_longsrs='YES') }}
    </ef:geometry>

The output then becomes something like:

      <ef:geometry>
          <gml:Point srsName="urn:ogc:def:crs:EPSG::4258"
             gml:id="STA_G-STA-NL00235">
              <gml:pos>51.43500137 4.36028624</gml:pos>
         </gml:Point>
      </ef:geometry>

justb4 commented 9 years ago

Ok, in latest Stetl GH version it is possible with Jinja2 filter to configure:

It seems to make more sense to use the globals for "reference data" or data to be expanded/joined, while the input data is the core data. But experience will tell....

See example (Example 3, bottom) at https://github.com/justb4/stetl/blob/master/examples/basics/10_jinja2_templating/etl.cfg

A possible problem is that some services don't return JSON but XML...

thijsbrentjens commented 9 years ago

Yesterday I created an XSLT to extract the notations and URIs to use in the Jinja globs. It transforms the RDF from http://dd.eionet.europa.eu/vocabularies?expand=true&expanded=&folderId=1, e.g. for pollutants: http://dd.eionet.europa.eu/vocabulary/aq/pollutant/view and the RDF http://dd.eionet.europa.eu/vocabulary/aq/pollutant/rdf
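For comparison, the same extraction could be sketched in Python with the standard library instead of XSLT. This assumes the vocabulary RDF is a list of `skos:Concept` elements with an `rdf:about` URI, a `skos:notation` and a `skos:prefLabel`; that layout is an assumption and the inline sample is hand-written, not a copy of the Eionet export. The second concept deliberately has no notation to show that missing values are skipped.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
RDF_ABOUT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about"

# Hand-written sample in the assumed layout.
SAMPLE_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://dd.eionet.europa.eu/vocabulary/aq/pollutant/20">
    <skos:notation>C6H6</skos:notation>
    <skos:prefLabel>Benzene (air)</skos:prefLabel>
  </skos:Concept>
  <skos:Concept rdf:about="http://dd.eionet.europa.eu/vocabulary/aq/pollutant/5029">
    <skos:prefLabel>BaP in PM10</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>"""


def load_concepts(rdf_text):
    """Return one dict (uri, notation, label) per skos:Concept in the RDF/XML."""
    root = ET.fromstring(rdf_text)
    concepts = []
    for concept in root.findall("skos:Concept", NS):
        concepts.append({
            "uri": concept.get(RDF_ABOUT),
            "notation": concept.findtext("skos:notation", default=None, namespaces=NS),
            "label": concept.findtext("skos:prefLabel", default=None, namespaces=NS),
        })
    return concepts


def index_by(concepts, key):
    """Build a lookup from the given field to the concept URI, skipping empty values."""
    return {c[key]: c["uri"] for c in concepts if c[key]}


if __name__ == "__main__":
    print(index_by(load_concepts(SAMPLE_RDF), "notation"))
```

The resulting dict could be written to a JSON file and used as template globals.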

justb4 commented 9 years ago

Nice work, more elegant with the parts-split and the lookup of the pollutant def and protection target def via the Jinja2 template. The geometry should now also be fillable via the new Filter (latest Stetl GH version). Exciting, as I have not tried it for MultiPolygon yet; it should become something like

       {% set gml_id = 'ZON_G-%s' % feature.properties.local_id %}
       {{ feature.geometry | geojson2gml(source_crs=crs, target_crs=4258, gml_id=gml_id, gml_format='GML3', gml_longsrs='YES') }}

Then it is in INSPIRE ETRS89 right away...

thijsbrentjens commented 9 years ago

An exact match for the code "Benzene" (as used in RIVM values) seems to be missing in the vocabulary. We need to discuss with RIVM what to do here.

justb4 commented 9 years ago

Great! Dataflow-B now has MultiSurfaces. You can run ./validate.sh for schema validation. Apart from Benzene there is a validation issue with an empty am:beginLifespanVersion. Looking at the existing examples I placed under https://github.com/Geonovum/sospilot/tree/master/data/eionet/aq-report, I see that the date of report generation is used, e.g. for the 5 Aug 2014 Dataflow-B report:

        <am:beginLifespanVersion>2014-08-05T10:04:00+01:00</am:beginLifespanVersion>

Maybe there is a Jinja2 'current_date' template or we could add one, or via a macro.
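One option is to compute the timestamp in Python and hand it to the template as an ordinary variable. The sketch below produces an ISO 8601 value in the same shape as the example above; the function name is hypothetical, and treating naive datetimes as UTC is an assumption made for this example.

```python
from datetime import datetime, timedelta, timezone


def current_timestamp(now=None):
    """Return an ISO 8601 timestamp with UTC offset and without microseconds."""
    moment = now or datetime.now(timezone.utc)
    if moment.tzinfo is None:
        # Assumption for this sketch: a naive datetime is interpreted as UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    return moment.replace(microsecond=0).isoformat()


if __name__ == "__main__":
    # Reproduce the timestamp format of the 5 Aug 2014 report.
    fixed = datetime(2014, 8, 5, 10, 4, tzinfo=timezone(timedelta(hours=1)))
    print(current_timestamp(fixed))
    print(current_timestamp())
```

The value could then be made available to the template through the context or the globals, so that every am:beginLifespanVersion in one report gets the same timestamp.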

justb4 commented 9 years ago

Isn't Benzene http://dd.eionet.europa.eu/vocabulary/aq/pollutant/20 (Benzene (air))? The pollutant code is admittedly C6H6, but that is benzene (a hexagon of 6 carbon atoms, each with 1 H atom). So my chemistry studies are of some use after all :-).

thijsbrentjens commented 9 years ago

Correct, Benzene is C6H6. The thing is: how to map this automatically using the codes RIVM provides? I'd say let's create an exception for now and try to find out why RIVM uses their codes.

justb4 commented 9 years ago

Yes, that is why I assumed that the mapping from codes like "BaP-H;Benzene-H;CO-H;NO2-H;O3-H;O3-V;PM10-H;PM2.5-H;SO2-H" was RIVM-internal/specific.

justb4 commented 9 years ago

Added the latest RIVM AQ XML files from RSpoor and converted them to CSV. See https://github.com/Geonovum/sospilot/tree/master/src/aq-report/input/rspoor

Started on the Dataflow-C AQD_AssessmentRegime ETL. It is doable. The two main unclear points:

. the mapping of a Pollutant to an Eionet Codelist URI, e.g. "BaP" should become http://dd.eionet.europa.eu/vocabulary/aq/pollutant/5029 ("BaP in PM10"), but multiple URIs match

. how to obtain the data for the elements within aqd:environmentalObjective, e.g.

                <aqd:environmentalObjective>
                    <aqd:EnvironmentalObjective>
                        <aqd:objectiveType xlink:href="http://dd.eionet.europa.eu/vocabulary/aq/objectivetype/TV"/>
                        <aqd:reportingMetric
                                xlink:href="http://dd.eionet.europa.eu/vocabulary/aq/reportingmetric/aMean"/>
                        <aqd:protectionTarget
                                xlink:href="http://dd.eionet.europa.eu/vocabulary/aq/protectiontarget/H"/>
                    </aqd:EnvironmentalObjective>
                </aqd:environmentalObjective>
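Once the three codes are known for an assessment regime, rendering the element itself is mechanical. The sketch below only covers that rendering step: it builds the xlink URIs from short codes, following the URI pattern visible in the fragment above. The function name is hypothetical, and where the codes come from remains the open question.

```python
VOCAB_BASE = "http://dd.eionet.europa.eu/vocabulary/aq"


def render_environmental_objective(objective_type, reporting_metric, protection_target):
    """Render an aqd:environmentalObjective fragment from three vocabulary codes."""
    objective_href = f"{VOCAB_BASE}/objectivetype/{objective_type}"
    metric_href = f"{VOCAB_BASE}/reportingmetric/{reporting_metric}"
    target_href = f"{VOCAB_BASE}/protectiontarget/{protection_target}"
    return "\n".join([
        "<aqd:environmentalObjective>",
        "  <aqd:EnvironmentalObjective>",
        f'    <aqd:objectiveType xlink:href="{objective_href}"/>',
        f'    <aqd:reportingMetric xlink:href="{metric_href}"/>',
        f'    <aqd:protectionTarget xlink:href="{target_href}"/>',
        "  </aqd:EnvironmentalObjective>",
        "</aqd:environmentalObjective>",
    ])


if __name__ == "__main__":
    # Codes taken from the example fragment above.
    print(render_environmental_objective("TV", "aMean", "H"))
```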