frictionlessdata / pilot-dm4t

Pilot project with DM4T
http://www.cs.bath.ac.uk/dm4t/index.shtml

Attempt to create Data Package from ENLITEN SQL database #19

Closed: danfowler closed this issue 6 years ago

danfowler commented 7 years ago

Thanks to #1, I have the original SQL database for ENLITEN. One way to move forward would be to create a Data Package directly from that database, using datapackage-py with the jsontableschema-sql library as the storage backend.

# https://github.com/frictionlessdata/datapackage-py

from datapackage import pull_datapackage
from sqlalchemy import create_engine

engine = create_engine('mysql://root@localhost/enliten')

# Pull the tables out of the SQL database into a Data Package
pull_datapackage(
    descriptor='descriptor_path',
    name="hi",
    backend='sql',
    engine=engine)

However, this results in:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-ab44223b964d> in <module>()
     11     name="hi",
     12     backend='sql',
---> 13     engine=engine)
     14 

/Users/dan/open_knowledge/frictionless_data/_envs/enliten/lib/python3.5/site-packages/datapackage/pushpull.py in pull_datapackage(descriptor, name, backend, **backend_options)
    103 
    104         # Prepare
--> 105         schema = storage.describe(table)
    106         base = os.path.dirname(descriptor)
    107         path, name = mappers.restore_path(table)

/Users/dan/open_knowledge/frictionless_data/_envs/enliten/lib/python3.5/site-packages/jsontableschema_sql/storage.py in describe(self, bucket, descriptor)
    172                 table = self.__get_table(bucket)
    173                 descriptor = mappers.columns_and_constraints_to_descriptor(
--> 174                     self.__prefix, table.name, table.columns, table.constraints)
    175 
    176         return descriptor

/Users/dan/open_knowledge/frictionless_data/_envs/enliten/lib/python3.5/site-packages/jsontableschema_sql/mappers.py in columns_and_constraints_to_descriptor(prefix, tablename, columns, constraints)
    133             message = 'Type "%s" of column "%s" is not supported'
    134             message = message % (column.type, column.name)
--> 135             raise TypeError(message)
    136         field = {'name': column.name, 'type': field_type}
    137         if not column.nullable:

TypeError: Type "BIT(1)" of column "default" is not supported
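To see every column that would trip this error before running the pull, one option is to check the column types against a list of types that have a Table Schema equivalent. The following is only a minimal sketch: the `SUPPORTED` mapping, the helper names, and the sample rows are hypothetical and are not taken from jsontableschema-sql. It assumes the `(table, column, type)` triples could be read from something like MySQL's `information_schema.columns`.

```python
import re

# Hypothetical mapping from SQL type names to Table Schema field types.
# This is an illustration only, NOT the mapping used by jsontableschema-sql.
SUPPORTED = {
    'BOOLEAN': 'boolean',
    'DATE': 'date',
    'DATETIME': 'datetime',
    'FLOAT': 'number',
    'INT': 'integer',
    'INTEGER': 'integer',
    'TEXT': 'string',
    'VARCHAR': 'string',
}


def base_type(sql_type):
    """Strip length/precision and normalise case, e.g. 'varchar(255)' -> 'VARCHAR'."""
    return re.sub(r'\(.*\)', '', sql_type).strip().upper()


def find_unsupported(columns):
    """Return the (table, column, type) triples whose type has no mapping.

    `columns` is an iterable of (table, column, sql_type) triples.
    """
    return [(table, column, sql_type)
            for table, column, sql_type in columns
            if base_type(sql_type) not in SUPPORTED]


# Made-up sample rows; only the BIT(1) column named "default" comes from
# the traceback above. The table name is a placeholder.
sample = [
    ('example_table', 'id', 'INTEGER'),
    ('example_table', 'default', 'BIT(1)'),
    ('example_table', 'label', 'VARCHAR(255)'),
]
unsupported = find_unsupported(sample)
print(unsupported)
```

Running this over the real column list would show whether `BIT(1)` is the only problem type or one of several.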

Column types that seem to be unsupported include:

cc: @roll

pwalsh commented 7 years ago

@danfowler is this now actionable?

danfowler commented 7 years ago

I will move forward with the table-by-table approach offered by datapackage-pipelines.
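For illustration, the general shape of a table-by-table export can be sketched with the standard library alone. This does not use datapackage-pipelines, and it uses an in-memory SQLite database as a stand-in for the MySQL one. The `export_tables` helper and the `readings` table are hypothetical, and the descriptor it writes is deliberately minimal (no `schema` per resource).

```python
import csv
import json
import os
import sqlite3
import tempfile


def export_tables(conn, out_dir, name):
    """Write each table to <out_dir>/<table>.csv and return a descriptor dict."""
    cur = conn.cursor()
    # SQLite-specific table listing; MySQL would need a different query.
    cur.execute("SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")
    tables = [row[0] for row in cur.fetchall()]

    resources = []
    for table in tables:
        cur.execute('SELECT * FROM "{}"'.format(table))
        header = [col[0] for col in cur.description]
        path = table + '.csv'
        with open(os.path.join(out_dir, path), 'w', newline='') as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(cur)  # stream the remaining rows of this table
        resources.append({'name': table, 'path': path})

    # Minimal descriptor: one resource per exported table.
    descriptor = {'name': name, 'resources': resources}
    with open(os.path.join(out_dir, 'datapackage.json'), 'w') as f:
        json.dump(descriptor, f, indent=2)
    return descriptor


# Stand-in database with one made-up table.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE readings (id INTEGER, flag INTEGER)')
conn.execute('INSERT INTO readings VALUES (1, 0)')

out_dir = tempfile.mkdtemp()
descriptor = export_tables(conn, out_dir, 'enliten')
print(json.dumps(descriptor, indent=2))
```

Handling one table at a time means a table with an awkward column type can be cast or skipped on its own without blocking the rest of the export.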

pwalsh commented 6 years ago

So closing.