Open thomiko opened 6 years ago
The LSOA names in London start with the name of the borough, and the LSOA labels do not follow a pattern. However, all boroughs in London have a label starting with the string E090. Hence the way to get all LSOAs in London is:
{
  "subjectType": "lsoa",
  "provider": "uk.gov.ons",
  "geoMatchRule": {
    "geoRelation": "within",
    "subjects": [
      {
        "subjectType": "localAuthority",
        "provider": "uk.gov.ons",
        "matchRule": {
          "attribute": "label",
          "pattern": "E090%"
        }
      }
    ]
  }
}
After having downloaded the external Excel file, the gradle export fails with a Java OutOfMemoryError:
Downloading external resource: https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/357469/acs0507.xls
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.poi.hssf.usermodel.HSSFRow.createCellFromRecord(HSSFRow.java:223)
My laptop has 8GB of RAM, so that's not necessarily the problem. Is there a way to give the gradle runExport process more RAM, for example by means of a command-line parameter or an ini file?
For example in an R script you can do: options(java.parameters = "-Xmx4g" )
This often avoids Java outOfMemory problems because by default an R script only receives 1GB of RAM.
You could look at changing the value for the runExport process in build.gradle.
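To check whether a heap setting actually reaches the JVM, a small Java program can print the maximum heap the running JVM will use. This is only a sketch: the class name HeapCheck is made up for illustration and is not part of the digital connector, and the number is only meaningful if the program is started with the same JVM options as the export.

```java
final class HeapCheck {
    // Maximum heap the JVM will attempt to use, in MiB.
    // This reflects -Xmx when that option is passed to the JVM.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Started with -Xmx4g it should report a value close to 4096 MiB. If the export still runs with a much smaller heap, the setting is probably not being applied to that process.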
If I want to retrieve the green areas info for London similar to the 'greenspace-hertfordshire.json' example recipe, what do I have to put in here?
If I do it this way, I get the following error message:
-----> TASK FAILED: http://download.geofabrik.de/europe/great-britain/england/london-latest.osm.pbf<-----
java.io.FileNotFoundException: http://download.geofabrik.de/europe/great-britain/england/london-latest.osm.pbf
The OSM file for London is called: europe/great-britain/england/greater-london
See here for a list of the areas you can use:
https://download.geofabrik.de/europe/great-britain/england.html
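Judging only from the URL in the failed task above, the download address appears to be the Geofabrik base URL followed by the geographyScope value and the suffix -latest.osm.pbf. A minimal sketch under that assumption (GeofabrikUrl and forScope are hypothetical names, not the importer's actual code):

```java
final class GeofabrikUrl {
    // Base and suffix as they appear in the FileNotFoundException above.
    static final String BASE = "http://download.geofabrik.de/";
    static final String SUFFIX = "-latest.osm.pbf";

    // Builds the download URL for a geographyScope value (assumed pattern).
    static String forScope(String geographyScope) {
        return BASE + geographyScope + SUFFIX;
    }

    public static void main(String[] args) {
        // "london" does not exist on the server; "greater-london" does.
        System.out.println(forScope("europe/great-britain/england/london"));
        System.out.println(forScope("europe/great-britain/england/greater-london"));
    }
}
```

So the scope has to match the path of a page listed on the Geofabrik site exactly, without the file suffix.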
What does this targetCRSCode refer to? Is it a reference to the location being queried? It's from the green areas example recipe.
Back to the Java OutOfMemory error: I set all the MaxHeapSize values to the maximum value (i.e. the RAM available)
With these settings the export process ran much longer (~ 40 mins) than before but eventually failed again with an OutOfMemoryError:
2018-03-17 18:10:10.160 [main] INFO u.org.tombolo.importer.DownloadUtils - Fetching local file: C:\tmp\TomboloData\uk.gov.dft\767f0676-d56b-3d2f-ab29-754898185b8e.xls
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at org.apache.poi.hssf.usermodel.HSSFRow.createCellFromRecord(HSSFRow.java:223)
In this case the target CRS code determines the unit used for the area. With code 4326 (WGS 84) the unit is degrees, which gives hard-to-interpret numbers for area. Code 27700 (British National Grid), however, uses metres, and the results are easier to interpret.
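To get a feel for the difference in units, here is a rough sketch that computes the area of the same small rectangle once in degrees and once in metres. It does not use the real 27700 projection: the conversion to metres is a crude local scaling (about 111,320 m per degree, with longitude shrunk by the cosine of the latitude), and the rectangle and all names are made up for illustration.

```java
final class AreaUnitsDemo {
    // Rough length of one degree of latitude in metres (spherical approximation).
    static final double METRES_PER_DEGREE = 111_320.0;

    // Planar polygon area via the shoelace formula; the ring need not repeat its first vertex.
    static double shoelaceArea(double[][] ring) {
        double sum = 0.0;
        for (int i = 0; i < ring.length; i++) {
            double[] a = ring[i];
            double[] b = ring[(i + 1) % ring.length];
            sum += a[0] * b[1] - b[0] * a[1];
        }
        return Math.abs(sum) / 2.0;
    }

    // Scales lon/lat degrees to local metres around a reference latitude.
    // Crude approximation only, NOT the EPSG:27700 projection.
    static double[][] toLocalMetres(double[][] lonLat, double refLatDeg) {
        double cosLat = Math.cos(Math.toRadians(refLatDeg));
        double[][] out = new double[lonLat.length][];
        for (int i = 0; i < lonLat.length; i++) {
            out[i] = new double[] {
                lonLat[i][0] * METRES_PER_DEGREE * cosLat,
                lonLat[i][1] * METRES_PER_DEGREE
            };
        }
        return out;
    }

    // Hypothetical 0.01 x 0.01 degree rectangle in central London (lon, lat).
    static double[][] sampleRing() {
        return new double[][] {
            {-0.10, 51.50}, {-0.09, 51.50}, {-0.09, 51.51}, {-0.10, 51.51}
        };
    }

    public static void main(String[] args) {
        double[][] ring = sampleRing();
        System.out.println("Area in square degrees: " + shoelaceArea(ring));
        System.out.println("Area in square metres (approx.): "
                + shoelaceArea(toLocalMetres(ring, 51.505)));
    }
}
```

The value in square degrees is tiny and changes meaning with latitude, while the value in square metres can be read directly as an area on the ground.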
So for London I should use the 27700 code as well?
Yes
Regarding the out of memory ... did you change the value for runExport as well?
But more generally, I think that we can conclude that we need to look at this importer after the weekend and find a more scalable solution.
When I 'runExport' the following recipe to get the green areas for London, the build's successful in 10 seconds but the output file is almost empty:
{
  "dataset": {
    "subjects": [
      {
        // The output subjects are all LSOAs
        "provider": "uk.gov.ons",
        "subjectType": "lsoa",
        "matchRule": {
          "attribute": "name",
          "pattern": "E090%"
        }
      }
    ],
    "datasources": [
      {
        // Importer for LSOA geographies
        "importerClass": "uk.org.tombolo.importer.ons.OaImporter",
        "datasourceId": "lsoa"
      },
      {
        // Green space data for the entire UK
        "importerClass": "uk.org.tombolo.importer.osm.OSMImporter",
        "datasourceId": "OSMGreenspace",
        "geographyScope": ["europe/great-britain/england/greater-london"]
      }
    ],
    "fields": [
      {
        // Proportion of green space
        "fieldClass": "uk.org.tombolo.field.transformation.ArithmeticField",
        "label": "index:GreenspaceFraction",
        "operation": "div",
        "field1": {
          // Sum of green space areas
          "fieldClass": "uk.org.tombolo.field.aggregation.GeographicAggregationField",
          "label": "GreenspaceSum",
          "subject": {
            "provider": "org.openstreetmap",
            "subjectType": "OSMEntity"
          },
          "function": "sum",
          "field": {
            "fieldClass": "uk.org.tombolo.field.assertion.OSMBuiltInAttributeMatcherField",
            "label": "AreaGreenspace",
            "attributes": [
              {
                "provider": "org.openstreetmap",
                "label": "built-in-greenspace"
              }
            ],
            "field": {
              // Area of LSOA
              "fieldClass": "uk.org.tombolo.field.transformation.AreaField",
              "label": "AreaLSOA",
              "targetCRSCode": 27700
            }
          }
        },
        "field2": {
          // Area of LSOA
          "fieldClass": "uk.org.tombolo.field.transformation.AreaField",
          "label": "AreaLSOA",
          "targetCRSCode": 27700
        }
      },
      {
        // Sum of green space areas
        "fieldClass": "uk.org.tombolo.field.aggregation.GeographicAggregationField",
        "label": "component:GreenspaceSum",
        "subject": {
          "provider": "org.openstreetmap",
          "subjectType": "OSMEntity"
        },
        "function": "sum",
        "field": {
          "fieldClass": "uk.org.tombolo.field.assertion.OSMBuiltInAttributeMatcherField",
          "label": "AreaGreenspace",
          "attributes": [
            {
              "provider": "org.openstreetmap",
              "label": "built-in-greenspace"
            }
          ],
          "field": {
            // Area of LSOA
            "fieldClass": "uk.org.tombolo.field.transformation.AreaField",
            "label": "AreaLSOA",
            "targetCRSCode": 27700
          }
        }
      },
      {
        // Area of LSOA
        "fieldClass": "uk.org.tombolo.field.transformation.AreaField",
        "label": "component:AreaLSOA",
        "targetCRSCode": 27700
      }
    ]
  },
  "exporter": "uk.org.tombolo.exporter.GeoJsonExporter"
}
What's wrong with it?
Output: {"type":"FeatureCollection","features":[]}
The subject specification for LSOAs in London is:
{
"subjectType": "lsoa",
"provider": "uk.gov.ons",
"geoMatchRule": {
"geoRelation": "within",
"subjects": [
{
"subjectType": "localAuthority",
"provider": "uk.gov.ons",
"matchRule": {
"attribute": "label",
"pattern": "E090%"
}
}
]
}
}
instead of
{
  // The output subjects are all LSOAs
  "provider": "uk.gov.ons",
  "subjectType": "lsoa",
  "matchRule": {
    "attribute": "name",
    "pattern": "E090%"
  }
}
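The empty output is what you would expect if the pattern is matched as a prefix: no LSOA name starts with E090, because the names start with the borough name. The sketch below illustrates this, assuming that % behaves like the SQL LIKE wildcard (the thread does not confirm how the digital connector evaluates patterns). The class and method names are hypothetical, and the codes and names are sample values used only for illustration.

```java
import java.util.regex.Pattern;

final class LikePatternDemo {
    // Matches value against a LIKE-style pattern where % stands for any run of characters.
    // All other characters are treated literally.
    static boolean like(String value, String pattern) {
        String[] parts = pattern.split("%", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.matches(regex.toString(), value);
    }

    public static void main(String[] args) {
        // Sample LSOA: label E01000001, name "City of London 001A".
        System.out.println(like("City of London 001A", "E090%")); // name does not start with E090
        System.out.println(like("E01000001", "E090%"));           // LSOA label does not either
        // Sample local authority: label E09000001.
        System.out.println(like("E09000001", "E090%"));           // borough label does
    }
}
```

This is why the pattern has to be applied to the label of the localAuthority subjects, with the LSOAs then selected by the geographic "within" relation.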
Unfortunately still no full success:
-----> TASK FAILED: Could not compute Field component:AreaLSOA for Subject E01000001(2480), reason: For input string: "590983,03"<----- Caused by null
java.lang.IllegalArgumentException: Could not compute Field component:AreaLSOA for Subject E01000001(2480), reason: For input string: "590983,03"
at uk.org.tombolo.exporter.GeoJsonExporter.lambda$getPropertiesForSubject$0(GeoJsonExporter.java:71)
Interesting ... It could be that the digital connector is not German-proof :/ (need more debugging to be sure)
I.e. it could be that one part of the system outputs numbers using your localised environment (with commas for decimals) while another part parses them without the localisation (expecting dots for decimals).
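The suspected mismatch is easy to reproduce in plain Java, independent of the digital connector. In the sketch below the class and method names are made up for illustration. Formatting a number with a German locale produces a decimal comma, and Double.parseDouble, which always expects a dot, then rejects it with the same "For input string" message as in the error above.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

final class LocaleDecimalDemo {
    // Formats a value with two decimals using the given locale's decimal separator.
    static String format(double value, Locale locale) {
        return String.format(locale, "%.2f", value);
    }

    // Parses a number using the given locale's conventions.
    static double parse(String text, Locale locale) {
        try {
            return NumberFormat.getInstance(locale).parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + text, e);
        }
    }

    public static void main(String[] args) {
        String german = format(590983.03, Locale.GERMANY);
        System.out.println(german); // decimal comma

        try {
            Double.parseDouble(german); // locale-independent, expects a dot
        } catch (NumberFormatException e) {
            System.out.println("parseDouble failed: " + e.getMessage());
        }

        // Either parse with the same locale, or format with a fixed one.
        System.out.println(parse(german, Locale.GERMANY));
        System.out.println(format(590983.03, Locale.ROOT));
    }
}
```

If this is indeed the cause, one possible workaround until the code is fixed might be to run the export with an English default locale, for example via the JVM options -Duser.language=en -Duser.country=GB. Whether that is enough here is untested.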
Thanks for hanging in there and trying ... sorry for things not working well.
{
  "dataset": {
    "subjects": [
      {
        // The output subjects are all LSOAs
        "provider": "uk.gov.ons",
        "subjectType": "lsoa",
        "matchRule": {
          "attribute": "name",
          "pattern": "London%"
        }
      }
    ],
    "datasources": [
      {
        "importerClass": "uk.org.tombolo.importer.dft.AccessibilityImporter",
        "datasourceId": "acs0507"
      },
      {
        // Importer for LSOA geographies
        "importerClass": "uk.org.tombolo.importer.ons.OaImporter",
        "datasourceId": "lsoa"
      }
    ],
    "fields": [
      {
        // Area of LSOA
        "fieldClass": "uk.org.tombolo.field.value.LatestValueField",
        "label": "component:Travel time",
        "attribute": {
          "provider": "uk.gov.dft",
          "label": "SUPO008"
        }
      }
    ]
  },
  "exporter": "uk.org.tombolo.exporter.GeoJsonExporter"
}