bcicen / wikitables

Import tables from any Wikipedia article as a dataset in Python
MIT License
292 stars, 34 forks

Tables with repeated rows are parsed incorrectly #13

Closed: domschl closed this issue 6 years ago

domschl commented 6 years ago

Example:

https://en.wikipedia.org/wiki/Greek_letters_used_in_mathematics,_science,_and_engineering

The Greek letters table has five repeating groups of rows. wikitables only returns the last group of rows, so the other four row groups are inaccessible.

k----n commented 6 years ago

These are specially formed tables and the schema is hard to determine.

I don't think there is any way currently to automatically extract specially formatted tables without writing specific code for it.
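To illustrate what "specific code" for such a table might look like, here is a minimal sketch using only the Python standard library. It is not part of wikitables. It assumes that each repeated group starts with a row made up entirely of `<th>` cells; the names `RowCollector`, `split_row_groups`, and `SAMPLE_HTML` are hypothetical, and the sample markup is a made-up stand-in, not the real article's HTML.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect every <tr> as a pair (is_header_row, [cell text, ...])."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cells = None    # cells of the current row, None outside a <tr>
        self._text = None     # text fragments of the current cell
        self._all_th = True   # True while the current row has only <th> cells

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells, self._all_th = [], True
        elif tag in ("td", "th") and self._cells is not None:
            self._text = []
            if tag == "td":
                self._all_th = False

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._text is not None:
            self._cells.append("".join(self._text).strip())
            self._text = None
        elif tag == "tr" and self._cells is not None:
            if self._cells:
                self.rows.append((self._all_th, self._cells))
            self._cells = None


def split_row_groups(html):
    """Start a new group at every all-<th> row instead of overwriting the last one."""
    parser = RowCollector()
    parser.feed(html)
    groups = []
    for is_header, cells in parser.rows:
        if is_header:
            groups.append({"header": cells, "rows": []})
        elif groups:
            header = groups[-1]["header"]
            groups[-1]["rows"].append(dict(zip(header, cells)))
    return groups


# Hypothetical sample: one table whose header row repeats before each group.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Symbol</th></tr>
  <tr><td>alpha</td><td>α</td></tr>
  <tr><td>beta</td><td>β</td></tr>
  <tr><th>Name</th><th>Symbol</th></tr>
  <tr><td>gamma</td><td>γ</td></tr>
</table>
"""

groups = split_row_groups(SAMPLE_HTML)
for group in groups:
    print(group["header"], group["rows"])
```

Keeping each group separate, rather than merging them, avoids guessing at a single schema, which is the hard part mentioned above. Real Wikipedia tables may also use `rowspan`/`colspan` or mix `<th>` cells into data rows, which this sketch does not handle.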

domschl commented 6 years ago

Ok, I've used Pandas for that table.
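
The comment does not say how pandas was used. One plausible approach, sketched here under assumptions, is to load the table into a `DataFrame` (for example with `pandas.read_html`) and then drop any rows that merely repeat the header. The frame below is a hypothetical in-memory stand-in for such a parsed table, so the example runs without network access; the column names and values are made up.

```python
import pandas as pd

# Hypothetical stand-in for a parsed table in which the header row
# reappears as an ordinary data row between groups.
raw = pd.DataFrame(
    [
        ["alpha", "α"],
        ["beta", "β"],
        ["Name", "Symbol"],  # repeated header row parsed as data
        ["gamma", "γ"],
    ],
    columns=["Name", "Symbol"],
)

# A row counts as a repeated header when its cells equal the column labels.
labels = list(raw.columns)
is_repeated_header = raw.apply(lambda row: list(row) == labels, axis=1)

# Keep only the real data rows and renumber them.
clean = raw[~is_repeated_header].reset_index(drop=True)
print(clean)
```

Whether the repeated headers actually show up as data rows depends on the table's markup and the parser, so the filter should be checked against the real output before relying on it.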