VizierDB / web-api

Web Server backend that manages Viztrails and provides the API that is used by the Web UI
Apache License 2.0

Unicode issue when migrating data from Python to Mimir #7

Closed okennedy closed 6 years ago

okennedy commented 6 years ago

The following code adds no data; it only deletes irrelevant columns. Even so, it triggers a Unicode error when the data is exported back into Vizier/Mimir.

# Get object for dataset with given name.
ds = vizierdb.get_dataset('shipments')

projection = [ "IMO_CODE", "PORT_OF_DEPARTURE", "DATE" ]

# Iterate over list of dataset columns and print column name
for col in ds.columns:
    print col.name
    if col.name not in projection:
        ds.delete_column(col.name)

# Update existing dataset with given name.
vizierdb.update_dataset('shipments', ds)

The script uses the following dataset: https://github.com/UBOdin/mimir/blob/master/test/data/cureSource.csv. It produces the following error:

UnicodeEncodeError:('ascii', u'P\ufffdGGSTALL, AUSTRIA', 1, 2, 'ordinal not in range(128)')
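
For readers unfamiliar with this error class, here is a minimal sketch that reproduces the same exception outside Vizier. It only shows why an ASCII encode of this value fails; how the export path in Vizier/Mimir actually encodes strings is not shown in this thread, so the UTF-8 line is an illustration and not the project's fix.

```python
# The value from the error message above. U+FFFD is the Unicode
# replacement character, which has no ASCII representation.
value = u'P\ufffdGGSTALL, AUSTRIA'

try:
    value.encode('ascii')
    failure = None
except UnicodeEncodeError as err:
    # Same fields as in the reported error tuple.
    failure = (err.encoding, err.start, err.end, err.reason)

# Encoding explicitly as UTF-8 succeeds, since UTF-8 can represent U+FFFD.
encoded = value.encode('utf-8')
```

The failing position (1, 2) points at the second character of the string, the replacement character itself.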
heikomuller commented 6 years ago

This should work now.

As a side note: the list of columns in ds changes as you delete columns while iterating over it, so the loop skips some columns. Your code will therefore likely create a dataset that contains more than the three columns in the projection list. Also, since the dataset has multiple columns with the same name, ds.delete_column(col.name) will raise an exception at some point. An alternative is to iterate over a copy of the column list and use the column index when deleting a column:

# Get object for dataset with given name.
ds = vizierdb.get_dataset('shipments')

projection = [ "IMO_CODE", "PORT_OF_DEPARTURE", "DATE" ]

# Iterate over a copy of the list of dataset columns and print column name
col_index = 0
for col in list(ds.columns):
    print col.name
    if col.name not in projection:
        ds.delete_column(col_index)
    else:
        col_index += 1

# Update existing dataset with given name.
vizierdb.update_dataset('shipments', ds)
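
The difference between the two loops can be checked without a Vizier instance. The sketch below stands in a plain Python list of column names for the dataset, assuming that deleting a column by index shifts the later columns left the way `del` does on a list. The function names and the extra column names (`NOTES`, `EXTRA`) are hypothetical and not taken from the actual dataset.

```python
def project_by_index(names, keep):
    """Delete by position while iterating over a snapshot of the names."""
    remaining = list(names)
    index = 0
    for name in list(names):      # snapshot: not affected by the deletions
        if name not in keep:
            del remaining[index]  # later entries shift left, index stays put
        else:
            index += 1            # a kept column moves the position forward
    return remaining


def project_while_iterating(names, keep):
    """Delete by name from the same list being iterated (the original pattern)."""
    remaining = list(names)
    for name in remaining:        # list shrinks under the iterator
        if name not in keep:
            remaining.remove(name)
    return remaining


# Hypothetical column list with a duplicated name.
columns = ["IMO_CODE", "NOTES", "NOTES", "PORT_OF_DEPARTURE", "DATE", "EXTRA"]
keep = ["IMO_CODE", "PORT_OF_DEPARTURE", "DATE"]

by_index = project_by_index(columns, keep)
while_iterating = project_while_iterating(columns, keep)
```

With this input, `by_index` holds exactly the three projected names, while `while_iterating` still contains one `NOTES` entry because the iterator skips the element that slides into the slot of a deleted one.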

I also added a simple "Filter Columns" operator to Vizual that can do the projection.