Open jroigfer opened 9 years ago
I ran a set of simple tests: xls without formatting, with formatting, with a frozen header, a csv saved in xls format, etc. All of them failed except the last one, where datapusher did insert the content of the file into the datastore. Comparing the files, I saw that the problem comes from the default new-document option in Microsoft Excel 2010: it adds 3 tabs when creating a new document. Deleting the tabs without content solved the problem.
Yes, I had the same problem. The datastore seems to load the last sheet in the file. If there is content in the last sheet, it loads the headers from there. Ideally, it would load the first sheet.
Could we just make the choice of sheet more customizable?
At the moment jobs.push_to_datastore just does row_set = table_set.tables.pop().
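To see why the last sheet wins, assume table_set.tables is an ordinary Python list holding the sheets in workbook order (the thread does not show its type, so this is an assumption). Calling pop() with no argument removes and returns the final element. A minimal illustration, with placeholder sheet names mirroring Excel 2010's three default tabs:

```python
# Stand-in for table_set.tables: sheet names in workbook order.
# These names are illustrative only, not taken from a real file.
tables = ["Sheet1", "Sheet2", "Sheet3"]

last = tables.pop()   # no index given: removes and returns the LAST item
remaining = list(tables)

tables = ["Sheet1", "Sheet2", "Sheet3"]
first = tables[0]     # indexing picks the first item and leaves the list intact
```

If the trailing sheets are empty, the popped sheet has no headers, which would match the "Determined headers and types: []" symptom reported in this thread.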
We could:
- change deployment/datapusher_settings.py to include:
    GET_ROW_SET = lambda table_set: table_set.tables.pop()
- change jobs.push_to_datastore so that it does:
    get_row_set = web.app.config['GET_ROW_SET']
    row_set = get_row_set(table_set)
Then anyone wanting to alter the logic for selecting a particular sheet will have an obvious extension point that wouldn't require forking the repo.
We would return a sheet called Data if it exists, and the first sheet otherwise, so our datapusher_settings.py would look like:

    def get_row_set(table_set):
        names = [r.name.lower() for r in table_set.tables]
        try:
            return table_set.tables[names.index('data')]
        except ValueError:
            return table_set.tables[0]

    GET_ROW_SET = get_row_set
I can confirm that all tests pass with this approach.
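For anyone who wants to try the selection logic without a running datapusher, here is a rough sketch that exercises get_row_set against stand-in objects. The fake_table_set helper is hypothetical and only mimics the two attributes the function relies on (.tables and .name); the real table set objects are not used here.

```python
from types import SimpleNamespace


def get_row_set(table_set):
    # Prefer a sheet named "Data" (case-insensitive), else the first sheet.
    names = [r.name.lower() for r in table_set.tables]
    try:
        return table_set.tables[names.index('data')]
    except ValueError:
        return table_set.tables[0]


def fake_table_set(*sheet_names):
    # Hypothetical stand-in: just enough structure for get_row_set.
    return SimpleNamespace(
        tables=[SimpleNamespace(name=n) for n in sheet_names]
    )


with_data = fake_table_set("Notes", "Data", "Sheet3")
without_data = fake_table_set("Sheet1", "Sheet2", "Sheet3")

picked_with_data = get_row_set(with_data).name
picked_without_data = get_row_set(without_data).name
```

Unlike pop(), this version indexes into the list, so the table set is left unmodified after the call.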
I'm happy to create a PR if that's appropriate.
Even simpler, we change push_to_datastore:

    get_row_set = web.app.config.get('GET_ROW_SET', lambda table_set: table_set.tables.pop())
    row_set = get_row_set(table_set)
That way existing deployments with custom settings files continue to work unaltered, but the extension point still exists.
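The fallback behaviour can be sketched in isolation. This assumes web.app.config behaves like a dict (so .get with a default works), and uses a plain dict plus hypothetical stand-in objects instead of the real app and table set; select_row_set and fake_table_set are names invented for this example.

```python
from types import SimpleNamespace


def default_get_row_set(table_set):
    # Current behaviour: take the last sheet.
    return table_set.tables.pop()


def select_row_set(config, table_set):
    # Use the deployment's hook when present, else the current default.
    get_row_set = config.get('GET_ROW_SET', default_get_row_set)
    return get_row_set(table_set)


def fake_table_set():
    # Hypothetical stand-in exposing only .tables and .name.
    names = ["Sheet1", "Sheet2", "Sheet3"]
    return SimpleNamespace(tables=[SimpleNamespace(name=n) for n in names])


# Existing deployment: no GET_ROW_SET in its settings file.
legacy = select_row_set({}, fake_table_set()).name

# Customised deployment: always take the first sheet.
custom_config = {'GET_ROW_SET': lambda ts: ts.tables[0]}
custom = select_row_set(custom_config, fake_table_set()).name
```

A settings file that never mentions GET_ROW_SET keeps the old last-sheet behaviour, while one that defines it overrides the choice.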
In CKAN 2.4.1, I installed datapusher and uploading a csv file to the datastore works correctly, but uploading xls or xlsx files fails when datapusher looks for the headers. From datapusher.error.log:

    Fetching from: http://10.115.100.69:5000/dataset/b0c86ea3-d764-493b-b0e8-d4bb0c287474/resource/287e7041-2616-4223-a94b-ed23f2937f2b/download/testxls.xls
    [Tue Nov 17 18:38:38 2015] [error] Deleting "ead7be15-511f-4884-9f40-e12086d331b1" from datastore.
    [Tue Nov 17 18:38:38 2015] [error] Determined headers and types: []
    [Tue Nov 17 18:38:38 2015] [error] Successfully pushed 0 entries to "ead7be15-511f-4884-9f40-e12086d331b1".
In the configuration file:

    ckan.datapusher.formats = csv xls xlsx tsv application/csv application/vnd.ms-excel application/vnd.openxmlformats-officedocument.spr
    ckan.datapusher.url = http://0.0.0.0:8800/