IDLabResearch / RMLProcessor

http://rml.io/

Support Excel (and optionally OpenOffice) file formats #11

Closed ktk closed 8 years ago

ktk commented 9 years ago

While it is bad practice, the reality is that many of the tabular datasets available are still published as Excel files. Often this is because the datasets have multiple sheets and the publishers use Excel to curate them.

Some years ago I was using XLWrap. I can do what I need in RML, but I liked being able to work directly on Excel files. It would be nice if RMLProcessor could do this as well.

I also liked that I could address cells and sheets directly. Using keys is smarter, but I have several files that do not have unique keys in their columns, so for those files addressing a cell like B4 would be useful. Note that supporting sheets would be pretty much mandatory for this feature to be useful.
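To make the cell-addressing idea concrete, here is a minimal sketch of how a reference such as `B4` or `Sheet2!B4` could be turned into a sheet name plus zero-based row and column indices. The class `CellAddress` and its `parse` method are hypothetical names used only for illustration; they are not part of RMLProcessor or XLWrap, and the `Sheet!Cell` syntax is an assumption borrowed from spreadsheet formula notation.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: parses "B4" or "Sheet2!B4" into a sheet name and zero-based indices. */
final class CellAddress {
    // Optional "SheetName!" prefix, then column letters, then a 1-based row number.
    private static final Pattern A1 =
            Pattern.compile("(?:([^!]+)!)?([A-Za-z]+)([1-9][0-9]*)");

    final String sheet; // null when the reference names no sheet
    final int row;      // zero-based row index
    final int col;      // zero-based column index

    private CellAddress(String sheet, int row, int col) {
        this.sheet = sheet;
        this.row = row;
        this.col = col;
    }

    static CellAddress parse(String ref) {
        Matcher m = A1.matcher(ref.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an A1-style reference: " + ref);
        }
        // Column letters are a bijective base-26 number: A=1 ... Z=26, AA=27, ...
        int col = 0;
        for (char c : m.group(2).toUpperCase(Locale.ROOT).toCharArray()) {
            col = col * 26 + (c - 'A' + 1);
        }
        int row = Integer.parseInt(m.group(3));
        return new CellAddress(m.group(1), row - 1, col - 1);
    }
}
```

With this sketch, `CellAddress.parse("B4")` gives row 3 and column 1 with no sheet, while `CellAddress.parse("Sheet2!AA10")` gives sheet `Sheet2`, row 9, column 26. A real implementation would more likely lean on the spreadsheet library's own reference parsing (Apache POI, for example, has a `CellReference` utility class) rather than a hand-written pattern.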

I think the library to use for Microsoft file formats would be Apache POI. I am not sure about OpenOffice, but then again this is probably mainly about published Excel files anyway.

ktk commented 8 years ago

This is implemented, at least for Excel, in https://github.com/RMLio/RML-Mapper, so I am closing this issue.