bcicen / wikitables

Import tables from any Wikipedia article as a dataset in Python
MIT License

[WIP] Properly recognise tables that use the change template #24

Closed: k-nut closed this 5 years ago

k-nut commented 5 years ago

The table of the largest cities of Brazil uses the change template to calculate the change between two values. Because of this template, not all columns are recognised properly, as the test shows.

I don't know how to solve this yet, but I thought a PR with a test would be the best way to document it.

k-nut commented 5 years ago

Do you think that this can even be solved or is it not worth looking at?

bcicen commented 5 years ago

It is certainly possible with the right template handling. I've pushed change template handling here, though the header/key fields for each row are still misaligned.

bcicen commented 5 years ago

The alignment issue was due to the change template producing three fields, while the field parser was written to expect only one field per template.

I've merged support for multi-field-generating templates and the change template in https://github.com/bcicen/wikitables/pull/25, available in the latest release.
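The alignment problem described above can be illustrated with a minimal sketch. This is not the code from the linked pull request; `expand_change` and `build_row` are hypothetical names, the values are placeholders, and the sketch assumes the three fields produced by the change template are the old value, the new value, and the percentage change.

```python
def expand_change(old, new):
    """Hypothetical stand-in for a change-template handler.

    Assumption: the template expands to three fields
    (old value, new value, percentage change).
    """
    old, new = float(old), float(new)
    pct = (new - old) * 100 / old
    return [old, new, round(pct, 2)]


def build_row(headers, cells):
    """Pair headers with fields, flattening cells that expand to several fields."""
    fields = []
    for cell in cells:
        if isinstance(cell, list):
            # A multi-field template contributes one field per column.
            fields.extend(cell)
        else:
            fields.append(cell)
    return dict(zip(headers, fields))


# Placeholder data, not taken from the Brazil article.
headers = ["City", "Old", "New", "Change"]
cells = ["Example City", expand_change("100", "110")]

# Treating the template as a single field leaves two headers without values.
naive_row = dict(zip(headers, cells))

# Flattening the template output keeps every header aligned with its field.
row = build_row(headers, cells)
```

In the naive version the whole template output lands under the second header and the last two columns are lost; flattening the three generated fields restores a one-to-one match between headers and fields.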

k-nut commented 5 years ago

Nice. Thanks a lot! When I run this on the Brazil example from above, I now get a lot of "dropping field from unknown column:" warnings (without an actual column name in that output), but it seems to work just fine. 🚀

k-nut commented 5 years ago

Just to finish this up, I rebased my branch with the new test and pushed it. Let me know if you'd still like to merge this. The Travis build also shows the "dropping field from unknown column" output.

bcicen commented 5 years ago

@k-nut I did not realize you updated the tests; I would be happy to include them. Can you open a new PR for these changes? GH isn't allowing me to merge this one.