With tableschema-spss we run into a few things that prevent using it as a first-class citizen in tableschema/datapackage integrations:
More important: we need a special Table Schema descriptor containing `spss:format` properties. Maybe it's possible to infer this information somehow? For example, tableschema-pandas only creates an empty dataframe on `storage.create` and then does the real descriptor mapping on `storage.write` using the provided data (we should do the same here to get the data sizes for the SPSS formats).
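For illustration, a hypothetical helper that infers `spss:format` strings from sample data on write might look like the sketch below. The function name and the specific format choices (`A<width>` for strings, `F8.2` for numbers, `ADATE10` for dates) are assumptions for the example, not part of tableschema-spss:

```python
def infer_spss_format(field_type, values):
    """Guess an SPSS format string for a Table Schema field from sample data.

    Hypothetical sketch: string fields get an A<width> format sized to the
    longest value seen; other types fall back to fixed generic formats.
    """
    if field_type == 'string':
        width = max((len(str(v)) for v in values if v is not None), default=1)
        return 'A%d' % width
    if field_type == 'integer':
        return 'F8'
    if field_type == 'number':
        return 'F8.2'
    if field_type == 'date':
        return 'ADATE10'
    return 'A255'


print(infer_spss_format('string', ['red', 'green', 'blue']))  # A5
print(infer_spss_format('number', [1.5, 2.25]))  # F8.2
```

This is only viable on `storage.write`, where the data is available — exactly the two-step approach tableschema-pandas uses.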
Less important: bucket names include the `.sav` extension. I think we'd still better hide storage backend details and use abstract bucket names like `bucket` instead of `bucket.sav` (adding/removing the extension at the mapping step). But that's not vital.
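The mapping step could be as small as the sketch below (helper names are made up for the example): abstract bucket names on the storage API, `.sav` file names only inside the backend.

```python
import os


def bucket_to_filename(bucket, base_path):
    """Map an abstract bucket name to the backing .sav file path."""
    return os.path.join(base_path, bucket + '.sav')


def filename_to_bucket(filename):
    """Map a backing .sav file name back to an abstract bucket name."""
    name = os.path.basename(filename)
    return name[:-len('.sav')] if name.endswith('.sav') else name


print(bucket_to_filename('bucket', 'dir/path'))   # dir/path/bucket.sav
print(filename_to_bucket('dir/path/bucket.sav'))  # bucket
```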
So for now we can't use it in the standard integration scenario with tableschema/datapackage, like:
```python
from tableschema import Table

table = Table('data.csv', schema='schema.csv')
table.save('data', storage='sql', engine=engine)
table.save('data', storage='bigquery', project=project, dataset=dataset)
table.save('data', storage='pandas')
# Will fail if the schema doesn't have spss:format properties
# table.save('data', storage='spss', base_path='dir/path')
```
I'm not sure there's anything we can do about it right now, because it works great for other things (the SPSS-based pilot etc.), so I'm just leaving this here for the future.
Our storage plugins for SQL/BigQuery/Pandas are able to pass this testsuite (with some differences in the reflected schema; this is just an example). It's updated for SPSS storage and uses a simple `cast` function to cast the resource data:
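For context, a minimal `cast` helper along these lines (a sketch, not the actual testsuite code) could normalize raw resource rows against the schema before comparing them with what the storage reflects back. Only a few core types are handled here; the real tableschema `Schema.cast_row` covers many more:

```python
def cast(descriptor, rows):
    """Cast raw row values to their declared Table Schema field types.

    Minimal sketch of a testsuite helper: maps each value through a
    type-specific cast keyed by the field's declared type.
    """
    casts = {'integer': int, 'number': float, 'string': str}
    types = [field['type'] for field in descriptor['fields']]
    return [[casts.get(t, str)(value) for t, value in zip(types, row)]
            for row in rows]


descriptor = {'fields': [{'name': 'id', 'type': 'integer'},
                         {'name': 'name', 'type': 'string'}]}
print(cast(descriptor, [['1', 'english'], ['2', 'german']]))
# [[1, 'english'], [2, 'german']]
```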