frictionlessdata / frictionless-py

Data management framework for Python that provides functionality to describe, extract, validate, and transform tabular data
https://framework.frictionlessdata.io
MIT License

File incorrectly parsed #114

Closed by pwalsh 7 years ago

pwalsh commented 8 years ago

A CSV file was flagged as having defective rows because some cells contain strings which themselves contain commas.

http://goodtables.okfnlabs.org/reports?data_url=https://data.wprdc.org/dataset/33891263-235f-4c6c-af8a-c79b0114a9f9/resource/c29c46fe-5bf7-431a-9c1c-a9e2e4e81eff/download/ucsur-soa-data-for-distribution---2015-oct-14.csv&format=csv&encoding=&schema_url=
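
To illustrate the failure mode described above, here is a minimal sketch using only Python's standard `csv` module. It is not the goodtables or tabulator implementation; the sample data and the helper names `naive_split` and `quoted_parse` are made up for illustration. It shows how splitting on every comma miscounts the cells in a row whose quoted string contains a comma, while a quote-aware parser keeps that string as one cell.

```python
import csv
import io

# Hypothetical sample: the second cell of the data row is a quoted
# string that itself contains a comma.
SAMPLE = 'id,comment,score\n1,"good, but slow",4\n'


def naive_split(text):
    """Split each line on every comma, ignoring quoting."""
    return [line.split(",") for line in text.splitlines()]


def quoted_parse(text):
    """Parse with the csv module, which honours quoted fields."""
    return list(csv.reader(io.StringIO(text)))


naive_rows = naive_split(SAMPLE)
parsed_rows = quoted_parse(SAMPLE)

# The naive split sees an extra cell in the data row.
print([len(row) for row in naive_rows])   # [3, 4]
# The quote-aware parser sees the same width as the header.
print([len(row) for row in parsed_rows])  # [3, 3]
```

A row that comes out wider than the header, as in the naive case, is the kind of row a validator would report as defective.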

pwalsh commented 7 years ago

@roll if this behaves correctly in v1, I am happy to close this.

roll commented 7 years ago

With this PR https://github.com/frictionlessdata/tabulator-py/pull/127 it works correctly (off-topic: the file is cool).

$ goodtables table https://data.wprdc.org/dataset/33891263-235f-4c6c-af8a-c79b0114a9f9/resource/c29c46fe-5bf7-431a-9c1c-a9e2e4e81eff/download/ucsur-soa-data-for-distribution---2015-oct-14.csv
DATASET
=======
{'error-count': 0, 'table-count': 1, 'time': 6.388, 'valid': True}

TABLE [1]
=========
{'error-count': 0,
 'headers': ['id',
             'happy1',
             'happy2',
             'work1',
             'work2',
             'work3',
             'work4',
             'work6',
             'work7',
             'work5',
             'work8_1',
             'work8_2',
             'work8_3',
             'work8_4',
             'work8_5',
             'work8_6',
             'work8_7',
             'work8open',
             'work8_8',
             'work8_9',
             'work8_10',
             'work9',
             'work10',
             'work11',
             'work12',
             'work13',
             'work14',
             'work15',
             'conf1',
             'conf2',
             'conf3',
             'conf4',
             'conf5',
             'conf6',
             'conf7open',
             'conf8',
             'conf9',
             'conf10',
             'conf11',
             'inc1',
             'inc2',
             'inc3',
             'inc4',
             'inc5',
             'inc6',
             'inc7',
             'inc8',
             'inc9',
             'inc10',
             'inc11',
             'inc12',
             'inc13',
             'inc14',
             'inc15',
             'inc16',
             'inc17',
             'inc18',
             'inc19',
             'inc20',
             'liv1',
             'liv1open',
             'liv2',
             'liv3',
             'liv4_1',
             'liv4_2',
             'liv4_3',
             'liv4_4',
             'liv4_5',
             'liv4_6',
             'liv4_7',
             'liv4_8',
             'liv5',
             'liv6',
             'liv7',
             'liv7open',
             'liv8',
             'liv8open',
             'liv8a',
             'liv9',
             'liv9a',
             'liv9b',
             'liv10',
             'liv11',
             'liv12',
             'liv13open',
             'liv14',
             'liv14open',
             'liv14a',
             'liv15',
             'liv15open',
             'liv15a',
             'liv16_1',
             'liv16_2',
             'liv16_3',
             'liv16_4',
             'liv16_5',
             'liv16_6',
             'liv16_7',
             'liv16_8',
             'liv16open',
             'liv16_9',
             'liv16_10',
             'liv17',
             'liv18open',
             'liv19',
             'liv20',
             'liv21',
             'liv22',
             'liv23',
             'liv23open',
             'liv24_1',
             'liv24_2',
             'liv24_3',
             'liv24_4',
             'liv24_5',
             'liv24_6',
             'liv24_7',
             'liv24_8',
             'liv24open',
             'liv24_9',
             'liv24_10',
             'liv26',
             'liv26a',
             'liv27',
             'neigh1',
             'neigh2',
             'neigh3',
             'neigh4',
             'neigh5',
             'neigh6',
             'neigh7',
             'neigh8',
             'neigh9',
             'neigh10',
             'neigh11',
             'neigh12',
             'neigh13',
             'neigh14',
             'trans1',
             'trans1a',
             'trans2_1',
             'trans2_2',
             'trans2_3',
             'trans2_4',
             'trans2_5',
             'trans2_6',
             'trans2_7',
             'trans2_8',
             'trans2open',
             'trans2_9',
             'trans210',
             'trans3',
             'trans4',
             'trans5open',
             'trans6',
             'trans7',
             'health1',
             'health2',
             'health3',
             'health4',
             'health5',
             'health6',
             'health7',
             'health8',
             'health9',
             'health10',
             'health11',
             'health12',
             'health13',
             'health14',
             'health15',
             'health16',
             'health17',
             'health18open',
             'health19',
             'fs1',
             'fs2_1',
             'fs2_2',
             'fs2_3',
             'fs2_4',
             'fs2_5',
             'fs2_6',
             'fs2_7',
             'fs3',
             'fs4',
             'fs5_1',
             'fs5_2',
             'fs5_3',
             'fs5_4',
             'fs5_5',
             'fs5_6',
             'fs5_7',
             'fs6',
             'fs7',
             'fs8',
             'fs9',
             'fs10',
             'fs11',
             'beh1',
             'beh2',
             'beh3',
             'beh4',
             'beh4open',
             'beh5',
             'beh6',
             'beh7',
             'beh7open',
             'beh8',
             'beh9',
             'beh10',
             'beh11',
             'beh11open',
             'beh12',
             'beh13',
             'beh14',
             'beh15',
             'beh16',
             'beh17',
             'beh18',
             'beh19a',
             'beh19b',
             'beh20',
             'bmi',
             'beh21',
             'beh22',
             'beh23',
             'cog1',
             'cog2',
             'cog3',
             'anxdep1',
             'anxdep2',
             'anxdep3',
             'anxdep4',
             'anxdep5',
             'anxdep6',
             'anxdep7',
             'anxdep8',
             'anxdep9',
             'anxdep10',
             'socsup1',
             'socsup2',
             'socsup3',
             'socsup4',
             'socsup5',
             'socsup6',
             'socsup7',
             'socsup8',
             'socsup9',
             'socsup10',
             'socsup11',
             'socsup12',
             'cg1',
             'cg2',
             'cg2open',
             'cg3',
             'cg4',
             'cg5',
             'cg6',
             'cg7',
             'cg7open',
             'cg8',
             'cg9',
             'cg10',
             'cg11_1',
             'cg11_2',
             'cg11_3',
             'cg11_4',
             'cg11_5',
             'cg11_6',
             'cg11_7',
             'cg11_8',
             'cg11_9',
             'cg11_10',
             'cg11_11',
             'cg11_12',
             'cg11_13',
             'cg11_14',
             'cg11_15',
             'cg11_16',
             'cg11_17',
             'cg11open',
             'cg11_19',
             'cg11_20',
             'cg11_21',
             'cg12Open',
             'cg13',
             'vol1',
             'vol2_1',
             'vol2_2',
             'vol2_3',
             'vol2_4',
             'vol2_5',
             'vol2_6',
             'vol2_7',
             'vol2_8',
             'vol2open',
             'vol2_9',
             'vol2_10',
             'vol3',
             'vol4',
             'vol5',
             'vol6a',
             'vol6b',
             'vol6c',
             'vol6d',
             'vol6e',
             'vol6f',
             'vol6g',
             'vol6h',
             'vol6i',
             'vol6j',
             'vol6k',
             'vol6l',
             'vol7open',
             'vol8a',
             'vol8b',
             'vol8c',
             'vol8d',
             'vol8e',
             'vol8f',
             'vol8g',
             'vol8h',
             'vol8i',
             'vol8j',
             'vol8k',
             'vol8l',
             'vol9',
             'vol10open',
             'vol11',
             'serv1',
             'serv2',
             'serv3',
             'serv4',
             'serv5',
             'serv6',
             'serv7',
             'serv8',
             'serv9',
             'serv10',
             'serv11',
             'serv11a',
             'serv12',
             'serv13',
             'serv14open',
             'serv15',
             'relig1',
             'relig1open',
             'relig2',
             'relig3',
             'net1',
             'net2',
             'net3',
             'net4',
             'sex',
             'age',
             'age65',
             'marstat',
             'boomer',
             'hisp',
             'race_1',
             'race_2',
             'race_3',
             'race_4',
             'race_5',
             'race_6',
             'race_7',
             'educ',
             'income',
             'vet',
             'PCS8',
             'MCS8',
             'PHQ8',
             'PHQ2',
             'confidence',
             'tractce10',
             'geoid10',
             'namelsad10',
             'zipcodeOpen',
             'phoneType',
             'stratum',
             'finalweight'],
 'row-count': 1000,
 'time': 6.365,
 'valid': True}