dijs / infobox-parser

Parse Wikipedia Infoboxes

Add dublin infobox data #2

Closed goferito closed 7 years ago

goferito commented 7 years ago

This PR is not really meant to be accepted.

What I need is the GDP of cities, so I just took the infobox text from the Dublin page. But now I have no clue how to proceed. I don't really understand how the infobox format works.

Are you planning to parse stuff like this? Or could you point me a bit in the right direction so I can help?

In cities like Berlin, it parses it well (but Berlin is the only city I found where it succeeds).

dijs commented 7 years ago

Yes, absolutely, I would like to parse things like this.

Here is the GDP data in "raw" wikitext format:

| blank1_name_sec2        = GDP per capita
| blank1_info_sec2        = US$ 51,319<ref name="brookingsgdp" />

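What I can most likely do is parse these "blank" name/info pairs into the result properties, using the value of the `_name` field as the property key and the value of the matching `_info` field as its value.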

Since they are technically two separate props in the raw info, this may be tricky. But wikitext always is!
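A minimal sketch of how that pairing could work, assuming the fields follow the `blankN_name_secM` / `blankN_info_secM` naming seen above. The `pairBlankFields` and `stripRefs` helpers are hypothetical names used only for illustration and are not part of infobox-parser's API; the regular expressions only cover simple one-line `| key = value` fields.

```javascript
// Hypothetical helper (not infobox-parser's API): remove <ref> tags from a value.
function stripRefs(value) {
  return value
    .replace(/<ref\b[^>]*\/>/g, '')                // self-closing: <ref name="x" />
    .replace(/<ref\b[^>]*>[\s\S]*?<\/ref>/g, '')   // paired: <ref>...</ref>
    .trim();
}

// Hypothetical helper: turn blank name/info pairs into { name: info } properties.
function pairBlankFields(wikitext) {
  // Step 1: collect every one-line "| key = value" field.
  const fields = {};
  const linePattern = /^[ \t]*\|[ \t]*([^=\s]+)[ \t]*=[ \t]*(.*?)[ \t]*$/gm;
  let match;
  while ((match = linePattern.exec(wikitext)) !== null) {
    fields[match[1]] = match[2];
  }

  // Step 2: for each "*_name" field, look up the "*_info" field with the
  // same prefix and section suffix (assumed naming convention).
  const result = {};
  for (const [key, value] of Object.entries(fields)) {
    const nameMatch = /^(blank\d*)_name(_sec\d+)?$/.exec(key);
    if (!nameMatch) continue;
    const infoKey = `${nameMatch[1]}_info${nameMatch[2] || ''}`;
    if (infoKey in fields) {
      result[value] = stripRefs(fields[infoKey]);
    }
  }
  return result;
}

const dublinSnippet = [
  '| blank1_name_sec2        = GDP per capita',
  '| blank1_info_sec2        = US$ 51,319<ref name="brookingsgdp" />',
].join('\n');

console.log(pairBlankFields(dublinSnippet));
// { 'GDP per capita': 'US$ 51,319' }
```

A `_name` field without a matching `_info` field is skipped rather than emitted with an empty value, which seems like the safer default for incomplete infoboxes.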

I will take a look soon. Thanks for your interest.

goferito commented 7 years ago

Is there a specification of the format somewhere? I can't find one. How the hell does Wikipedia display these things properly on the frontend?

dijs commented 7 years ago

I tried to look for one in the past, without success... It seems like wikitext does not always follow the rules anyway... I think wikitext-to-HTML converters exist, but that does not help me. I want the data!

dijs commented 7 years ago

I am going to merge this and start working on the feature and tests for the blank data stuff.

dijs commented 7 years ago

You should be good to go now.

Check it out: https://github.com/dijs/infobox-parser/blob/master/test/dublin-spec.js