Open formwandler opened 10 years ago
CKAN does preview ODS files http://demo.ckan.org/dataset/ods-file-test/resource/9745133e-12c9-4fe6-a8cf-6fb3f1cf742c
If you can provide an example of a file that is not being handled correctly, please report it as a bug to the messytables project (which CKAN uses for spreadsheet file format handling) https://github.com/okfn/messytables/
Thanks for the info and the test dataset. However, this does not work on my standard CKAN 2.2 site, see example. The official documentation only mentions xls. Maybe the DataStore is needed (which I haven't installed)? And how does messytables fit into that? So, even though there may be some (more or less undocumented) "hacks" to show ODS, I think a more official way would be very welcome.
Messytables is used by https://github.com/okfn/dataproxy which is the Google App Engine app that currently provides preview data for some file types (this is why it won't work if your CKAN instance isn't accessible via the web). Messytables has almost complete ODS support but is still missing a couple of fixes to make it work in more cases (for those who care, it is missing a required XML namespace). It isn't an ideal decoding because some ODS files are rather large when unzipped, and the available library has issues (see my bug report at https://joinup.ec.europa.eu/software/odfpy/issue/excessive-ram-usage-when-loading-2mb-ods-file): more than 1 minute and 4 GB of RAM to preview is a tad on the useless side.
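To put a number on "rather large when unzipped": an ODS file is a zip archive whose sheet data lives in content.xml, so the uncompressed size can be read from the archive directory without parsing anything. A minimal sketch using only the standard library; the helper names and the demo payload are made up for illustration and are not part of messytables or dataproxy.

```python
import io
import zipfile


def content_xml_size(ods_bytes):
    """Return the uncompressed size in bytes of content.xml inside an ODS archive."""
    with zipfile.ZipFile(io.BytesIO(ods_bytes)) as archive:
        return archive.getinfo("content.xml").file_size


def make_demo_archive(payload):
    """Build a tiny zip with only content.xml (a stand-in, not a valid ODS file)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("content.xml", payload)
    return buffer.getvalue()


# Highly repetitive XML compresses well, so the archive is far smaller than its content.
demo = make_demo_archive(b"<x/>" * 100000)
print("archive bytes:", len(demo))
print("content.xml bytes:", content_xml_size(demo))
```

A preview service could use a check like this to skip files whose content.xml is beyond what it can reasonably load.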
I think a better idea for the roadmap is to make all previews local, and not to use jsonpdataproxy any more. On data.gov.uk we wrote our own datapreview () which acts like jsonpdataproxy and checks the local archive before making a web request for the data. It also uses messytables (and suffers the current ODS bug).
Looks like jsonpdataproxy doesn't use the latest version of messytables with ODS support - https://github.com/okfn/dataproxy/tree/master/dataproxy/vendor
Perhaps it needs an update and redeploy?
Like formwandler, when I uploaded an ODS file, the preview initially did not work. However, when I edited the resource, went to the Datastore tab and manually triggered the upload to the datastore, the preview worked the next time I viewed the dataset. So I think the 'official' solution is simple and doesn't require a hack: the Datastore needs to be installed, and in the config.ini file the datapusher configuration just needs to have ods added as a recognised file extension (to trigger the automatic loading to the datastore). We just worked out how to do this for xlsx and tsv files, e.g.
ckan.datapusher.formats = csv xls xlsx tsv application/csv application/vnd.ms-excel application/vnd.openxmlformats-officedocument.spreadsheetml.sheet application/vnd.ms-excel
so to add ods I think it would be
ckan.datapusher.formats = csv xls xlsx tsv ods application/csv application/vnd.ms-excel application/vnd.openxmlformats-officedocument.spreadsheetml.sheet application/vnd.ms-excel application/vnd.oasis.opendocument.spreadsheet
When I get a moment I will test this on our installation and let you know if it worked.
I have had a bit more of a look at this. I tried uploading an ODS file that I created (saved from Excel), then manually uploaded it to the datastore, and the preview worked. I then downloaded the file that @formwandler linked to. Trying to manually push that file into the datastore failed. I then installed OpenOffice, opened formwandler's file (it worked) and resaved it. I was then able to get it into the datastore and to preview. HOWEVER, there was an error in the data: in the ODS file, cells B3 and C3 had the same value (3), but the data previewed from the datastore upload omitted one of the 3s (cell C3). I tried changing column headings and inserting a column, with no change, but if I change the value in C3 to, say, 4 and re-upload it to the datastore, then it works correctly - see picture.
I will see if adding the handler for ods extension files to the datapusher config setting helps this problem.
Adding ckan.datapusher.formats = csv xls xlsx tsv ods application/csv application/vnd.ms-excel application/vnd.openxmlformats-officedocument.spreadsheetml.sheet application/vnd.oasis.opendocument.spreadsheet to the config file does get ODS files pulled into the datastore and the preview working. However, it does not resolve the identified issue of adjacent cells with the same value being wrongly imported and previewed. Downloading the file from CKAN does return the correct data; only its representation in the datastore is compromised (but obviously that is bad).
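For anyone wondering why both the short names and the MIME types are listed: the setting is a whitespace-separated list, and a resource is presumably pushed only when its format field matches one entry. The sketch below illustrates that matching; it is not CKAN's actual code, and parse_formats, should_push and the fallback list are hypothetical names chosen for the example.

```python
# Illustration only: these names are hypothetical, not CKAN's API.
ASSUMED_DEFAULT_FORMATS = ["csv", "xls"]  # assumed fallback when the setting is empty

SETTING = (
    "csv xls xlsx tsv ods application/csv application/vnd.ms-excel "
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet "
    "application/vnd.oasis.opendocument.spreadsheet"
)


def parse_formats(setting):
    """Split the whitespace-separated setting into lowercase format names."""
    return setting.lower().split() or ASSUMED_DEFAULT_FORMATS


def should_push(resource_format, formats):
    """True when the resource's format field matches one configured entry."""
    return bool(resource_format) and resource_format.lower() in formats


formats = parse_formats(SETTING)
print(should_push("ODS", formats))
print(should_push("application/vnd.oasis.opendocument.spreadsheet", formats))
print(should_push("odt", formats))
```

Under this reading, a resource whose format was saved as the MIME type rather than "ods" would still be picked up, which is why both forms appear in the list.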
This is a bug in https://github.com/okfn/messytables/blob/master/messytables/ods.py#L115-L121 which lacks handling for the number-columns-repeated attribute (which specifies how many times to repeat the current table:table-cell when the value doesn't change in the following columns).
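To illustrate what handling that attribute involves, here is a minimal sketch using only the standard library. It is not the messytables code and not a proposed patch; expand_row and the sample row are made up for the example, and it reads only the first text:p of each cell.

```python
import xml.etree.ElementTree as ET

# Standard OpenDocument namespaces.
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


def expand_row(row_xml):
    """Return the cell texts of one table:table-row, repeating cells as declared."""
    row = ET.fromstring(row_xml)
    values = []
    for cell in row.findall("{%s}table-cell" % TABLE_NS):
        # A missing attribute means the cell appears exactly once.
        repeat = int(cell.get("{%s}number-columns-repeated" % TABLE_NS, "1"))
        paragraph = cell.find("{%s}p" % TEXT_NS)
        text = paragraph.text if paragraph is not None else None
        values.extend([text] * repeat)
    return values


# A row like the one reported above: one cell, then two adjacent cells holding 3.
SAMPLE_ROW = (
    '<table:table-row xmlns:table="%s" xmlns:text="%s" xmlns:office="%s">'
    '<table:table-cell office:value-type="float" office:value="1">'
    '<text:p>1</text:p></table:table-cell>'
    '<table:table-cell table:number-columns-repeated="2" '
    'office:value-type="float" office:value="3">'
    '<text:p>3</text:p></table:table-cell>'
    '</table:table-row>'
) % (TABLE_NS, TEXT_NS, OFFICE_NS)

print(expand_row(SAMPLE_ROW))  # ['1', '3', '3']
```

A parser that ignores the attribute would return ['1', '3'] for this row, which matches the missing C3 value described above. A real fix would probably also need to cap or trim repeats on trailing empty cells, since those can carry large repeat counts.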
Beyond ODS, we may want to consider creating a ckanext-odf using https://github.com/kogmbh/WebODF to view ODT (text) and ODP (presentation) files too.
I know this is not directly related to pushing ODS data to the datastore but just wanted to let you know that with @rossjones' https://github.com/jqnatividad/ckanext-officedocs, we can now view Office documents (both MS and OpenOffice) online.
Since CKAN is for maintaining open data, I always wonder why it supports the Excel format (XLS) for previewing but not any Open Document Format such as ODS.
So, why not add ODS support for the embedded preview?