J535D165 / CoronaWatchNL

Numbers concerning COVID-19 disease cases in The Netherlands by RIVM, LCPS, NICE, ECML, and Rijksoverheid.
Creative Commons Zero v1.0 Universal

Extract info from pdf maps #48

Closed J535D165 closed 4 years ago

J535D165 commented 4 years ago

I think it is quite doable to extract the color values from the maps in an automated way.

Outline:

Is someone interested in giving this one a try?
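To make the idea concrete, here is a minimal sketch of the colour-lookup step: sample a pixel and assign it to the legend bin with the nearest colour. The bin edges are the ones that show up in the results further down this thread; the RGB values are hypothetical placeholders and would have to be read off the real legend.

```python
from math import dist

# Hypothetical legend: RGB colour -> (lower, upper) bin.
# The RGB values are placeholders, NOT the colours RIVM actually uses.
LEGEND = {
    (255, 255, 255): (0, 0),
    (254, 229, 217): (1, 25),
    (252, 174, 145): (25, 50),
    (251, 106, 74): (50, 100),
    (222, 45, 38): (100, 200),
    (165, 15, 21): (200, 300),
    (103, 0, 13): (300, 575),
}


def classify_pixel(rgb, legend=LEGEND):
    """Return the bin of the legend colour closest to rgb (Euclidean distance)."""
    nearest = min(legend, key=lambda colour: dist(colour, rgb))
    return legend[nearest]
```

Nearest-colour matching, rather than exact matching, is a deliberate choice here: it tolerates anti-aliasing and small rendering differences in the PDF.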

ghostleyjim commented 4 years ago

FYI: they use the Highcharts software: https://www.highcharts.com/blog/products/highmaps/226-get-your-data-ready-for-charts-with-python/

vmenger commented 4 years ago

If someone could help computing the center of mass of each municipality, I can give the rest a shot. Not really a GIS master but the rest is doable.

J535D165 commented 4 years ago

https://github.com/J535D165/CoronaWatchNL/blob/master/ext/gemeente-2019.geojson is the geojson file they use. Put this in geopandas and compute the centroid? I think this is built-in.

It is best to convert the coordinates to RD coordinates first. WGS84 is hard to map on an image.
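For reference, geopandas exposes this as the `centroid` attribute of a GeoSeries, and reprojection as `to_crs`. The sketch below is not the geopandas implementation; it only illustrates the underlying area-weighted centroid calculation for a single simple polygon, assuming the ring is given as a list of (x, y) tuples in a projected (metric) coordinate system.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...]."""
    # Close the ring if the first vertex is not repeated at the end.
    if ring[0] != ring[-1]:
        ring = list(ring) + [ring[0]]
    area2 = 0.0  # twice the signed area (shoelace formula)
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * area) = sum / (3 * area2)
    return cx / (3 * area2), cy / (3 * area2)
```

Note that a centroid computed this way can fall outside an irregularly shaped polygon, which matters when it is used as the pixel to sample.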

vmenger commented 4 years ago

Thanks. Never worked with geopandas before but that was extremely straightforward. I'll see how far I get with the rest this afternoon.

vmenger commented 4 years ago

I cannot seem to figure out what kind of coordinate system they are using. The centroids of the shapes are, for example:

0      POINT (562294.302 1519071.007)
1      POINT (567552.915 1519778.692)
2      POINT (546904.455 1507314.795)
3      POINT (553960.209 1521725.855)
4      POINT (455268.367 1412300.520)
...
350    POINT (400357.530 1346313.608)
351    POINT (539962.303 1524591.852)
352    POINT (527592.823 1506619.934)
353    POINT (507024.078 1519972.903)
354    POINT (426992.790 1358869.965)

I also cannot find an appropriate linear transformation between these coordinates and the expected pixel values. But perhaps someone recognizes them?

EDIT: Got it https://epsg.io/28992

J535D165 commented 4 years ago

Sorry for missing your post, mentioning RD in my previous post was a bit too cryptic.

Thanks a lot for your effort.

vmenger commented 4 years ago

The coordinate system (https://epsg.io/28992) still does not match. Points fall outside the bounding box. I have some other stuff to do right now, will probably resume tonight.

Digidodo commented 4 years ago

When using the gemeente-2019.geojson provided here I also get strange results. The polygons are plotted too far north. I downloaded the CBS municipality borders 2020 and made a demo map. You can download the data I used by clicking "download gemeentegrenzen". Hope it's useful. @vmenger

Digidodo commented 4 years ago

A demo map with my results using gemeente-2019.geojson. Just for information.

J535D165 commented 4 years ago

@vmenger There are two variants of RD: old and new. Based on the examples above, this seems to be RD old. But is there a need to convert? I think we can rescale those coordinates. Do you have code to read the pdf image into a numpy array?

The tricky part is to find the bounding box of the map in the image. You can compute the bounding box of the geojson file with geopandas.
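A rough sketch of that rescaling, assuming the map in the image is an unrotated rendering of the same projection as the geojson. In geopandas the geojson bounding box is available as `total_bounds`, in the order (minx, miny, maxx, maxy); the image bounding box is the part that has to be found by hand or by image analysis, and the function name here is made up for illustration.

```python
def geo_to_pixel(x, y, geo_bbox, img_bbox):
    """Linearly map a map coordinate to an image pixel.

    geo_bbox: (minx, miny, maxx, maxy) of the geojson.
    img_bbox: (left, top, right, bottom) of the map inside the image, in pixels.
    """
    minx, miny, maxx, maxy = geo_bbox
    left, top, right, bottom = img_bbox
    col = left + (x - minx) / (maxx - minx) * (right - left)
    # Image rows grow downward while the map's y axis grows upward, so flip y.
    row = bottom - (y - miny) / (maxy - miny) * (bottom - top)
    return col, row
```

With this approach no reprojection is needed: whatever units the geojson uses, only its extent relative to the image matters.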

@Digidodo Thanks for your help! We need the municipalities of 2019 I think.

Btw, the gemeente-2019.geojson is the one from the website of RIVM. So that one maps well with the municipalities.

Reverse engineering is fun!

Hazedd commented 4 years ago

https://github.com/cartomap/nl

For geojson in RD New and WGS84. The source is CBS, so it should be projected correctly.

2D: Amersfoort / RD Old, in meters; Amersfoort has coordinates (0 m, 0 m).
EPSG 28992: projected 2D coordinate system Amersfoort / RD New, in meters; Amersfoort has coordinates (155000 m, 463000 m).
EPSG 7415: projected 3D coordinate system Amersfoort / RD New + NAP height, in meters.

vmenger commented 4 years ago

[image: gemeenten_mapped_to_page]

Didn't find the time today to get further, but I'm getting there with the RIVM geo file. The match is not yet perfect but can be tweaked a little. For now I did it by selecting some anchor points on the image and then mapping (scale+transpose) to the page. Obviously this is very sensitive to the image being moved around on the page. I put the code so far here; if anyone wants to have a shot, feel free: https://github.com/vmenger/CoronaWatchNL/blob/master/Extract%20map%20info.ipynb
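For readers who want to try the anchor-point idea without opening the notebook, here is an independent minimal sketch (not the notebook's code). It assumes a per-axis scale plus offset is enough, i.e. no rotation or skew, and fits both from a handful of hand-picked (map coordinate, pixel) pairs by least squares. The function names are made up for illustration.

```python
def fit_axis(geo_vals, pix_vals):
    """Least-squares fit of pix = scale * geo + offset along one axis."""
    n = len(geo_vals)
    mean_g = sum(geo_vals) / n
    mean_p = sum(pix_vals) / n
    var = sum((g - mean_g) ** 2 for g in geo_vals)
    if var == 0:
        raise ValueError("anchor points must differ along each axis")
    cov = sum((g - mean_g) * (p - mean_p) for g, p in zip(geo_vals, pix_vals))
    scale = cov / var
    return scale, mean_p - scale * mean_g


def fit_anchors(anchors):
    """anchors: list of ((geo_x, geo_y), (pix_col, pix_row)) pairs.

    Returns a function mapping (geo_x, geo_y) to (pix_col, pix_row).
    """
    sx, ox = fit_axis([g[0] for g, _ in anchors], [p[0] for _, p in anchors])
    sy, oy = fit_axis([g[1] for g, _ in anchors], [p[1] for _, p in anchors])
    return lambda x, y: (sx * x + ox, sy * y + oy)
```

Using more than two anchors averages out clicking errors, but the fit still has to be redone whenever the image moves or is resized on the page.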

vmenger commented 4 years ago

Also: screw you, gemeente Heerenveen ;)

J535D165 commented 4 years ago

I will give it a try tonight.

This is an excellent coding interview question btw.

J535D165 commented 4 years ago

Thanks for your first draft. I made some changes to automate the reverse engineering.

[image: map]

The first results look promising. We need some more checks on whether the centroids are in the correct polygon. I made a correction for Heerenveen and Rotterdam. Any more? I will merge this into a dataset tomorrow.

[['Appingedam', 1, 25],
 ['Loppersum', 1, 25],
 ['Stadskanaal', 1, 25],
 ['Veendam', 1, 25],
 ['Achtkarspelen', 1, 25],
 ['Heerenveen', 1, 25],
 ['Leeuwarden', 1, 25],
 ['Ooststellingwerf', 1, 25],
 ['Smallingerland', 1, 25],
 ['Weststellingwerf', 1, 25],
 ['Assen', 1, 25],
 ['Emmen', 1, 25],
 ['Hoogeveen', 1, 25],
 ['Meppel', 1, 25],
 ['Losser', 1, 25],
 ['Noordoostpolder', 1, 25],
 ['Wageningen', 1, 25],
 ['Winterswijk', 1, 25],
 ['Renswoude', 1, 25],
 ['Den Helder', 1, 25],
 ['Texel', 1, 25],
 ['Hendrik-Ido-Ambacht', 1, 25],
 ['Lisse', 1, 25],
 ['Papendrecht', 1, 25],
 ['Zoetermeer', 1, 25],
 ['Hulst', 1, 25],
 ['Kapelle', 1, 25],
 ['Tytsjerksteradiel', 1, 25],
 ['Pekela', 1, 25],
 ['Aa en Hunze', 1, 25],
 ['Borger-Odoorn', 1, 25],
 ['Wijdemeren', 1, 25],
 ['Noordenveld', 1, 25],
 ['Midden-Drenthe', 1, 25],
 ['Midden-Delfland', 1, 25],
 ['Berkelland', 1, 25],
 ['Dantumadiel', 1, 25],
 ['Oldambt', 1, 25],
 ['Waadhoeke', 1, 25],
 ['Westerwolde', 1, 25],
 ['Midden-Groningen', 1, 25],
 ['Het Hogeland', 1, 25],
 ['Noardeast-Fryslân', 1, 25],
 ['Delfzijl', 25, 50],
 ['Groningen', 25, 50],
 ['Almere', 25, 50],
 ['Zeewolde', 25, 50],
 ['Harlingen', 25, 50],
 ['Opsterland', 25, 50],
 ['Almelo', 25, 50],
 ['Enschede', 25, 50],
 ['Haaksbergen', 25, 50],
 ['Hardenberg', 25, 50],
 ['Hellendoorn', 25, 50],
 ['Hengelo', 25, 50],
 ['Staphorst', 25, 50],
 ['Wierden', 25, 50],
 ['Aalten', 25, 50],
 ['Apeldoorn', 25, 50],
 ['Arnhem', 25, 50],
 ['Barneveld', 25, 50],
 ['Brummen', 25, 50],
 ['Buren', 25, 50],
 ['Doetinchem', 25, 50],
 ['Duiven', 25, 50],
 ['Lochem', 25, 50],
 ['Maasdriel', 25, 50],
 ['Putten', 25, 50],
 ['Scherpenzeel', 25, 50],
 ['Westervoort', 25, 50],
 ['Zutphen', 25, 50],
 ['Dronten', 25, 50],
 ['Amersfoort', 25, 50],
 ['Baarn', 25, 50],
 ['Bunschoten', 25, 50],
 ['Eemnes', 25, 50],
 ['Rhenen', 25, 50],
 ['Veenendaal', 25, 50],
 ['Wijk bij Duurstede', 25, 50],
 ['Alkmaar', 25, 50],
 ['Diemen', 25, 50],
 ['Edam-Volendam', 25, 50],
 ['Haarlemmermeer', 25, 50],
 ['Hilversum', 25, 50],
 ['Hoorn', 25, 50],
 ['Huizen', 25, 50],
 ['Medemblik', 25, 50],
 ['Opmeer', 25, 50],
 ['Schagen', 25, 50],
 ['Alblasserdam', 25, 50],
 ['Alphen aan den Rijn', 25, 50],
 ['Drechterland', 25, 50],
 ['Delft', 25, 50],
 ['Dordrecht', 25, 50],
 ["'s-Gravenhage", 25, 50],
 ['Hellevoetsluis', 25, 50],
 ['Hillegom', 25, 50],
 ['Katwijk', 25, 50],
 ['Leiden', 25, 50],
 ['Leiderdorp', 25, 50],
 ['Nieuwkoop', 25, 50],
 ['Noordwijk', 25, 50],
 ['Rijswijk', 25, 50],
 ['Schiedam', 25, 50],
 ['Sliedrecht', 25, 50],
 ['Voorschoten', 25, 50],
 ['Waddinxveen', 25, 50],
 ['Wassenaar', 25, 50],
 ['Zoeterwoude', 25, 50],
 ['Zwijndrecht', 25, 50],
 ['Borsele', 25, 50],
 ['Middelburg', 25, 50],
 ['Reimerswaal', 25, 50],
 ['Terneuzen', 25, 50],
 ['Veere', 25, 50],
 ['Vlissingen', 25, 50],
 ['Baarle-Nassau', 25, 50],
 ['Bergen op Zoom', 25, 50],
 ['Woensdrecht', 25, 50],
 ['Bergen (L.)', 25, 50],
 ['Lelystad', 25, 50],
 ['Oude IJsselstreek', 25, 50],
 ['Teylingen', 25, 50],
 ['Oost Gelre', 25, 50],
 ['Halderberge', 25, 50],
 ['Roosendaal', 25, 50],
 ['Schouwen-Duiveland', 25, 50],
 ['De Wolden', 25, 50],
 ['Noord-Beveland', 25, 50],
 ['Twenterand', 25, 50],
 ['Westerveld', 25, 50],
 ['Steenwijkerland', 25, 50],
 ['Moerdijk', 25, 50],
 ['Sluis', 25, 50],
 ['Bronckhorst', 25, 50],
 ['Kaag en Braassem', 25, 50],
 ['Zuidplas', 25, 50],
 ['Súdwest-Fryslân', 25, 50],
 ['Bodegraven-Reeuwijk', 25, 50],
 ['Leidschendam-Voorburg', 25, 50],
 ['Nissewaard', 25, 50],
 ['De Fryske Marren', 25, 50],
 ['Gooise Meren', 25, 50],
 ['Montferland', 25, 50],
 ['Westerkwartier', 25, 50],
 ['Ameland', 0, 0],
 ['Schiermonnikoog', 0, 0],
 ['Terschelling', 0, 0],
 ['Vlieland', 0, 0],
 ['Coevorden', 50, 100],
 ['Borne', 50, 100],
 ['Dalfsen', 50, 100],
 ['Deventer', 50, 100],
 ['Oldenzaal', 50, 100],
 ['Ommen', 50, 100],
 ['Raalte', 50, 100],
 ['Tubbergen', 50, 100],
 ['Urk', 50, 100],
 ['Zwolle', 50, 100],
 ['Culemborg', 50, 100],
 ['Doesburg', 50, 100],
 ['Ede', 50, 100],
 ['Elburg', 50, 100],
 ['Ermelo', 50, 100],
 ['Harderwijk', 50, 100],
 ['Nijkerk', 50, 100],
 ['Renkum', 50, 100],
 ['Rheden', 50, 100],
 ['Tiel', 50, 100],
 ['Voorst', 50, 100],
 ['Zaltbommel', 50, 100],
 ['Zevenaar', 50, 100],
 ['Bunnik', 50, 100],
 ['Leusden', 50, 100],
 ['Soest', 50, 100],
 ['Utrecht', 50, 100],
 ['Woudenberg', 50, 100],
 ['Zeist', 50, 100],
 ['Nieuwegein', 50, 100],
 ['Aalsmeer', 50, 100],
 ['Amstelveen', 50, 100],
 ['Amsterdam', 50, 100],
 ['Beemster', 50, 100],
 ['Bergen (NH.)', 50, 100],
 ['Beverwijk', 50, 100],
 ['Blaricum', 50, 100],
 ['Bloemendaal', 50, 100],
 ['Castricum', 50, 100],
 ['Haarlem', 50, 100],
 ['Heemskerk', 50, 100],
 ['Heerhugowaard', 50, 100],
 ['Heiloo', 50, 100],
 ['Landsmeer', 50, 100],
 ['Langedijk', 50, 100],
 ['Laren', 50, 100],
 ['Oostzaan', 50, 100],
 ['Ouder-Amstel', 50, 100],
 ['Purmerend', 50, 100],
 ['Uitgeest', 50, 100],
 ['Uithoorn', 50, 100],
 ['Velsen', 50, 100],
 ['Weesp', 50, 100],
 ['Zandvoort', 50, 100],
 ['Zaanstad', 50, 100],
 ['Barendrecht', 50, 100],
 ['Brielle', 50, 100],
 ['Capelle aan den IJssel', 50, 100],
 ['Gorinchem', 50, 100],
 ['Gouda', 50, 100],
 ['Hardinxveld-Giessendam', 50, 100],
 ['Stede Broec', 50, 100],
 ['Krimpen aan den IJssel', 50, 100],
 ['Maassluis', 50, 100],
 ['Oegstgeest', 50, 100],
 ['Ridderkerk', 50, 100],
 ['Rotterdam', 50, 100],
 ['Westvoorne', 50, 100],
 ['Vlaardingen', 50, 100],
 ['Woerden', 50, 100],
 ['Goes', 50, 100],
 ['West Maas en Waal', 50, 100],
 ['De Ronde Venen', 50, 100],
 ['Best', 50, 100],
 ['Boxtel', 50, 100],
 ['Dongen', 50, 100],
 ['Eindhoven', 50, 100],
 ['Geertruidenberg', 50, 100],
 ['Hilvarenbeek', 50, 100],
 ['Nuenen, Gerwen en Nederwetten', 50, 100],
 ['Oirschot', 50, 100],
 ['Oosterhout', 50, 100],
 ['Rucphen', 50, 100],
 ['Steenbergen', 50, 100],
 ['Waterland', 50, 100],
 ['Veldhoven', 50, 100],
 ['Waalre', 50, 100],
 ['Wormerland', 50, 100],
 ['Landgraaf', 50, 100],
 ['Kerkrade', 50, 100],
 ['Roermond', 50, 100],
 ['Vaals', 50, 100],
 ['Venlo', 50, 100],
 ['Venray', 50, 100],
 ['Weert', 50, 100],
 ['Valkenburg aan de Geul', 50, 100],
 ['Horst aan de Maas', 50, 100],
 ['Utrechtse Heuvelrug', 50, 100],
 ['Koggenland', 50, 100],
 ['Lansingerland', 50, 100],
 ['Maasgouw', 50, 100],
 ['Heeze-Leende', 50, 100],
 ['Reusel-De Mierden', 50, 100],
 ['Cuijk', 50, 100],
 ['Lingewaard', 50, 100],
 ['Bergeijk', 50, 100],
 ['Gulpen-Wittem', 50, 100],
 ['Tynaarlo', 50, 100],
 ['Hof van Twente', 50, 100],
 ['Rijssen-Holten', 50, 100],
 ['Geldrop-Mierlo', 50, 100],
 ['Olst-Wijhe', 50, 100],
 ['Westland', 50, 100],
 ['Stichtse Vecht', 50, 100],
 ['Hollands Kroon', 50, 100],
 ['Pijnacker-Nootdorp', 50, 100],
 ['Krimpenerwaard', 50, 100],
 ['Altena', 50, 100],
 ['West Betuwe', 50, 100],
 ['Vijfheerenlanden', 50, 100],
 ['Hoeksche Waard', 50, 100],
 ['Molenlanden', 50, 100],
 ['Kampen', 100, 200],
 ['Beuningen', 100, 200],
 ['Druten', 100, 200],
 ['Epe', 100, 200],
 ['Hattem', 100, 200],
 ['Heumen', 100, 200],
 ['Nijmegen', 100, 200],
 ['Oldebroek', 100, 200],
 ['Rozendaal', 100, 200],
 ['Wijchen', 100, 200],
 ['De Bilt', 100, 200],
 ['Houten', 100, 200],
 ['Lopik', 100, 200],
 ['Montfoort', 100, 200],
 ['IJsselstein', 100, 200],
 ['Enkhuizen', 100, 200],
 ['Heemstede', 100, 200],
 ['Albrandswaard', 100, 200],
 ['Tholen', 100, 200],
 ['Asten', 100, 200],
 ['Boxmeer', 100, 200],
 ['Breda', 100, 200],
 ['Deurne', 100, 200],
 ['Eersel', 100, 200],
 ['Etten-Leur', 100, 200],
 ['Gilze en Rijen', 100, 200],
 ['Goirle', 100, 200],
 ['Haaren', 100, 200],
 ['Helmond', 100, 200],
 ["'s-Hertogenbosch", 100, 200],
 ['Heusden', 100, 200],
 ['Loon op Zand', 100, 200],
 ['Mill en Sint Hubert', 100, 200],
 ['Oisterwijk', 100, 200],
 ['Oss', 100, 200],
 ['Sint-Michielsgestel', 100, 200],
 ['Someren', 100, 200],
 ['Son en Breugel', 100, 200],
 ['Tilburg', 100, 200],
 ['Valkenswaard', 100, 200],
 ['Vught', 100, 200],
 ['Waalwijk', 100, 200],
 ['Zundert', 100, 200],
 ['Beek', 100, 200],
 ['Beesel', 100, 200],
 ['Brunssum', 100, 200],
 ['Gennep', 100, 200],
 ['Heerlen', 100, 200],
 ['Maastricht', 100, 200],
 ['Meerssen', 100, 200],
 ['Mook en Middelaar', 100, 200],
 ['Nederweert', 100, 200],
 ['Simpelveld', 100, 200],
 ['Stein', 100, 200],
 ['Voerendaal', 100, 200],
 ['Leudal', 100, 200],
 ['Roerdalen', 100, 200],
 ['Echt-Susteren', 100, 200],
 ['Drimmelen', 100, 200],
 ['Bladel', 100, 200],
 ['Overbetuwe', 100, 200],
 ['Neder-Betuwe', 100, 200],
 ['Dinkelland', 100, 200],
 ['Sittard-Geleen', 100, 200],
 ['Beekdaelen', 100, 200],
 ['Heerde', 200, 300],
 ['Nunspeet', 200, 300],
 ['Oudewater', 200, 300],
 ['Grave', 200, 300],
 ['Laarbeek', 200, 300],
 ['Sint Anthonis', 200, 300],
 ['Cranendonck', 200, 300],
 ['Alphen-Chaam', 200, 300],
 ['Zwartewaterland', 200, 300],
 ['Eijsden-Margraten', 200, 300],
 ['Goeree-Overflakkee', 200, 300],
 ['Berg en Dal', 200, 300],
 ['Boekel', 300, 575],
 ['Uden', 300, 575],
 ['Gemert-Bakel', 300, 575],
 ['Landerd', 300, 575],
 ['Bernheze', 300, 575],
 ['Peel en Maas', 300, 575],
 ['Meierijstad', 300, 575]]
vmenger commented 4 years ago

@J535D165 Thanks for finishing this. Some other things got in between. I previously identified some strangely shaped municipalities but they all seem to match on your map. I'll include them below for completeness:

Heerenveen, Berg en Dal, Rotterdam, Urk, Bloemendaal, Borne, Twenterand, Vlissingen, Diemen, Eersel, IJsselstein
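For anyone repeating this, such cases can also be flagged automatically by testing whether each centroid lies inside its own polygon. Below is a minimal ray-casting sketch for a single simple polygon without holes, with the ring given as (x, y) tuples; multipolygons would need the test applied per part. As far as I know, shapely and geopandas also offer `representative_point()`, which returns a point guaranteed to lie within the geometry and could replace the centroid for the flagged municipalities.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the simple polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        # Does a horizontal ray from (x, y) towards +x cross this edge?
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


# A C-shaped polygon: a point in the notch is outside the shape.
c_shape = [(0, 0), (3, 0), (3, 1), (1, 1), (1, 2), (3, 2), (3, 3), (0, 3)]
in_notch = point_in_polygon(2, 1.5, c_shape)
in_body = point_in_polygon(0.5, 1.5, c_shape)
```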