Closed duttonw closed 6 months ago
10:57:08,871 INFO [86b1fa53-ffb9-4d52-9042-afb9b66ed3d6] Saving chunk 64
10:57:09,717 ERROR [ckanext.xloader.jobs] xloader error: list index out of range, Traceback (most recent call last):
File "/mnt/local_data/ckan_venv/src/ckanext-xloader/ckanext/xloader/loader.py", line 293, in load_csv
f)
psycopg2.errors.BadCopyFileFormat: extra data after last expected column
CONTEXT: COPY 5d1e8368-7ec3-435a-92d0-280ad1e3db0d, line 16306: "2023-05-14 12:10:00,129534,,,,,,,,,,,-20.169,,148.464,,,,0,,,,,,264.4,,,,,,3.187,,,,,,,,,,,,,,,,,,,,..."
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/mnt/local_data/ckan_venv/src/ckanext-xloader/ckanext/xloader/jobs.py", line 222, in xloader_data_into_datastore_
direct_load()
File "/mnt/local_data/ckan_venv/src/ckanext-xloader/ckanext/xloader/jobs.py", line 173, in direct_load
logger=logger)
File "/mnt/local_data/ckan_venv/src/ckanext-xloader/ckanext/xloader/loader.py", line 301, in load_csv
' {}'.format(error_str))
ckanext.xloader.job_exceptions.LoaderError: Error during the load into PostgreSQL: extra data after last expected column
CONTEXT: COPY 5d1e8368-7ec3-435a-92d0-280ad1e3db0d, line 16306: "2023-05-14 12:10:00,129534,,,,,,,,,,,-20.169,,148.464,,,,0,,,,,,264.4,,,,,,3.187,,,,,,,,,,,,,,,,,,,,..."
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/mnt/local_data/ckan_venv/src/ckanext-xloader/ckanext/xloader/jobs.py", line 80, in xloader_data_into_datastore
xloader_data_into_datastore_(input, job_dict)
File "/mnt/local_data/ckan_venv/src/ckanext-xloader/ckanext/xloader/jobs.py", line 226, in xloader_data_into_datastore_
tabulator_load()
File "/mnt/local_data/ckan_venv/src/ckanext-xloader/ckanext/xloader/jobs.py", line 195, in tabulator_load
logger=logger)
File "/mnt/local_data/ckan_venv/src/ckanext-xloader/ckanext/xloader/loader.py", line 454, in load_table
for i, records in enumerate(chunky(result, 250)):
File "/mnt/local_data/ckan_venv/src/ckanext-xloader/ckanext/xloader/loader.py", line 513, in chunky
item = list(itertools.islice(it, n))
File "/mnt/local_data/ckan_venv/src/ckanext-xloader/ckanext/xloader/loader.py", line 413, in row_iterator
data_row[headers[index]] = cell
IndexError: list index out of range
Another error blocking the fast load:
11:13:22,309 WARNI [2b90e4ed-21ff-4d12-85f8-a78da6e47e09] Load using COPY failed: Validation error when creating the database table: None - {'constraints': ['Cannot insert records or create indexbecause of uniqueness constraint'], 'info': {'orig': 'duplicate key value violates unique constraint "pg_type_typname_nsp_index"\nDETAIL: Key (typname, typnamespace)=(1dbae506-d73c-4c19-b727-e8654b8be95a__id_seq, 17092) already exists.\n', 'pgcode': '23505'}}
11:13:22,317 INFO [2b90e4ed-21ff-4d12-85f8-a78da6e47e09] Trying again with tabulator
11:13:22,327 INFO [2b90e4ed-21ff-4d12-85f8-a78da6e47e09] Determining column names and types
load_table: Decoded encoding: {'encoding': 'UTF-8-SIG', 'confidence': 1.0, 'language': ''}
@duttonw I'm not sure how feasible it is to handle rows with the wrong number of commas, but completely blank rows are simple enough. Tabulator has built-in functionality to let us skip them.
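For illustration only, here is a minimal sketch of the two cases in plain Python. It uses the standard `csv` module rather than Tabulator, and `iter_rows` is a hypothetical helper, not the xloader code or the change made in the pull request. It assumes that blank rows should be skipped silently and that a row with more cells than headers should fail with a readable message instead of the `IndexError` seen in the traceback.

```python
import csv
import io


def iter_rows(text, skip_blank=True):
    """Yield one dict per data row, keyed by header name.

    Hypothetical helper for illustration; not part of ckanext-xloader.
    """
    reader = csv.reader(io.StringIO(text))
    headers = next(reader)
    for cells in reader:
        # A completely blank row is either an empty list or a list of
        # empty strings (for example a line containing only commas).
        if skip_blank and not any(cell.strip() for cell in cells):
            continue
        if len(cells) > len(headers):
            # Extra commas: report the source line instead of letting
            # headers[index] raise IndexError further down.
            raise ValueError(
                "line {}: expected {} columns, found {}".format(
                    reader.line_num, len(headers), len(cells)))
        # zip stops at the shorter sequence, so a short row simply
        # omits its trailing columns.
        yield dict(zip(headers, cells))


if __name__ == "__main__":
    sample = "a,b\n1,2\n,\n\n3,4\n"
    for row in iter_rows(sample):
        print(row)
```

Whether to reject an over-long row, truncate it, or invent extra column names is a policy choice. The sketch rejects it so the bad line number is reported up front.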
Once https://github.com/qld-gov-au/ckanext-xloader/pull/90 reaches ckan/ckanext-xloader, this can be closed.
Resolved in https://github.com/qld-gov-au/ckanext-xloader/pull/90; it will reach the ckan org version in due time.
Example logs from importing a 40 MB CSV file with 400,000+ rows:
https://www.data.qld.gov.au/dataset/5efaa096-4480-4540-88be-a10ababd9f49/resource/a14317b7-2fca-41b7-8294-9a1f7a085b0f