HertieDataScience / SyllabusAndLectures

Hertie School of Governance Introduction to Collaborative Social Science Data Analysis
MIT License

Scraping a xmlns:py object #61

Open LisaKatharina opened 8 years ago

LisaKatharina commented 8 years ago

Dear everyone,

For one of our variables, we would like to scrape data from a map (http://www.zeit.de/gesellschaft/zeitgeschehen/2015-08/fluechtlinge-verteilung-quote#comments, "Asylbewerber pro 1000 Einwohner", i.e. asylum seekers per 1,000 inhabitants). In the page source, however, the only code that (we think) describes the map looks like this:

 <div class="raw"><div xmlns:py="http://codespeak.net/lxml/objectify/pytype" data-show="lw" class="zg-fluechtlingskarte zon-grafik--map"></div>

When inspecting the object itself in the browser, however, we see all the information we need (RGB color code and county ID):

<path style="fill: rgb(153, 158, 166); opacity: 1;" d="M494.9686311787075,234.04792996794185L495.85266159695834,237.22432149242013L494.52661596958205,244.46098455987976L498.9467680608366,245.85665313270965L503.2785171102662,252.19701779778006L510.26235741444884,252.4505103171441L512.2072243346008,255.11161551014175L513.1796577946768,260.81050551289354L511.05798479087457,263.8479768469415L509.1131178707226,273.07868940413937L503.2785171102662,274.0895194373361L500.7148288973385,272.57321852896985L500.53802281368826,280.90873004007335L496.3830798479089,285.45110963114985L490.63688212927786,288.4776941917944L488.2500000000002,292.00702478757376L482.6806083650192,291.88100848292834L481.0009505703424,296.66800412825614L480.735741444867,303.7164529021011L478.7908745247149,306.73500633551157L470.03897338403056,305.35166712508635L466.6796577946769,303.21323183182085L456.95532319391657,296.66800412825614L451.2091254752853,295.53454332546335L450.32509505703445,292.63707164349944L444.40209125475303,287.3428812790198L444.22528517110277,285.32497298992257L447.14258555133097,283.81115253542976L451.03231939163516,276.23703890082106L458.7233840304183,272.0677104054457L462.6131178707227,267.5164599645859L466.7680608365022,269.03387905753607L470.39258555133085,267.2635241340322L468.8897338403042,262.96218674158536L465.97243346007633,263.341825088276L460.49144486692046,257.13843060250565L461.0218631178709,251.69000460999177L464.11596958174925,252.07026801963093L464.38117870722465,248.39350358603133L459.6958174904945,245.98351798444855L463.5855513307987,245.85665313270965L467.12167300380236,241.2879517093843L469.8621673003804,243.4457740950702L472.95627376425864,241.9226759835119L473.57509505703433,237.09729414786943L477.4648288973385,235.19160085533804L479.763307984791,240.1452996297612L483.12262357414465,241.2879517093843L484.36026615969604,236.7161979606699L487.1891634980991,238.8754623667228L492.3165399239547,235.57278199115262L492.1397338403044,231.50575414838386ZM483.653041825095
2,237.98643603008622L483.6530418250952,237.98643603008622L483.6530418250952,237.98643603008622Z" id="01053" class="zg-map__area"></path> 

We suspect that it has something to do with Python, but Googling the problem was not successful.

Ideas, anyone? Thanks for your help!!

christophergandrud commented 8 years ago

Hm, it looks like the map is being generated on the server from some XML object. The second block of code you give is basically just the instructions for how to draw the map.

One approach might be to scrape the content of the tool-tips (the boxes that appear when you hover your mouse over the plot). This might be difficult, though, as the tool-tip-content tag only includes information for the currently selected region. Hm.

This is going to be a tricky one to scrape.
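
Since the inspected markup quoted above already carries the fill colour and the county ID on each path element, one possible workaround is to copy the rendered markup out of the browser inspector, save it, and parse it offline. The following is only a minimal sketch in Python using the standard library; it assumes every region is a path element with class zg-map__area and an inline "fill: rgb(...)" style, as in the quoted snippet. The names MapAreaParser and extract_areas are made up for illustration, and the sample string is a shortened stand-in for the real markup.

```python
import re
from html.parser import HTMLParser


class MapAreaParser(HTMLParser):
    """Collect the id and fill colour of every <path class="zg-map__area">."""

    def __init__(self):
        super().__init__()
        self.areas = []  # list of (county_id, (r, g, b)) tuples

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # Assumption: map regions are <path> elements carrying this class.
        classes = (attr.get("class") or "").split()
        if tag != "path" or "zg-map__area" not in classes:
            return
        # Assumption: the colour is given inline as "fill: rgb(r, g, b)".
        match = re.search(
            r"fill:\s*rgb\((\d+),\s*(\d+),\s*(\d+)\)", attr.get("style") or ""
        )
        if match:
            rgb = tuple(int(value) for value in match.groups())
            self.areas.append((attr.get("id"), rgb))


def extract_areas(rendered_html):
    """Return (county_id, rgb) pairs found in markup saved from the inspector."""
    parser = MapAreaParser()
    parser.feed(rendered_html)
    return parser.areas


# Shortened stand-in for the rendered markup; the real "d" attribute is much longer.
sample = (
    '<path style="fill: rgb(153, 158, 166); opacity: 1;" '
    'd="M0,0L1,1Z" id="01053" class="zg-map__area"></path>'
)
areas = extract_areas(sample)
print(areas)  # [('01053', (153, 158, 166))]
```

This only recovers the colour per county ID. Turning a colour back into a value for "Asylbewerber pro 1000 Einwohner" would still require matching it against the map's legend by hand.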