evidence-dev / evidence

Business intelligence as code: build fast, interactive data visualizations in pure SQL and markdown
https://evidence.dev
MIT License

Surface errors from CSV connector to sources output #2554

Open archiewood opened 1 month ago

archiewood commented 1 month ago

Background

CSV files are notoriously hard to parse. Evidence uses DuckDB to read them, which is very good, but it often fails without extra configuration.

For example, a failure may look like this:

npm run sources

> my-evidence-project@0.0.1 sources
> evidence sources

✔ Loading plugins & sources
-----
  [Processing] cdc
  deaths ✔ Finished, wrote 0 rows.

However, this is not easy to debug: the output reports 0 rows written and shows no error. If you drop into the DuckDB CLI and run from 'deaths.csv', you get much more helpful, verbose output.

$ from 'deaths.csv';

Conversion Error: CSV Error on Line: 24473
Original Line: LA,2022,November,12 month-ending,Percent with drugs specified,68.9328389,99.5+,0.020997175,Louisiana,Numbers may differ from published reports using final data. See Technical Notes.,**,
Error when converting column "Percent Complete". Could not convert string "99.5+" to 'BIGINT'

Column Percent Complete is being converted as type BIGINT
This type was auto-detected from the CSV file.
Possible solutions:
* Override the type for this column manually by setting the type explicitly, e.g. types={'Percent Complete': 'VARCHAR'}
* Set the sample size to a larger value to enable the auto-detection to scan more values, e.g. sample_size=-1
* Use a COPY statement to automatically derive types from an existing table.
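To make the first two suggestions concrete, here is a minimal sketch of a query that passes them to DuckDB's read_csv. The helper names sqlString and buildReadCsvQuery are hypothetical and are not part of Evidence or DuckDB; only the types and sample_size parameters come from the error message above.

```typescript
// Escape a value for use inside a single-quoted SQL string literal.
function sqlString(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

// Build a DuckDB query that reads a CSV with explicit column types.
// `types` and `sample_size` are the read_csv parameters named in the
// error message; this helper itself is hypothetical.
function buildReadCsvQuery(
  path: string,
  types: Record<string, string>,
  sampleSize?: number
): string {
  const typeEntries = Object.entries(types)
    .map(([column, type]) => `${sqlString(column)}: ${sqlString(type)}`)
    .join(", ");
  const args = [sqlString(path)];
  if (typeEntries.length > 0) args.push(`types={${typeEntries}}`);
  if (sampleSize !== undefined) args.push(`sample_size=${sampleSize}`);
  return `SELECT * FROM read_csv(${args.join(", ")})`;
}

// Override the auto-detected BIGINT type for the failing column.
const query = buildReadCsvQuery("deaths.csv", { "Percent Complete": "VARCHAR" });
console.log(query);
```

Reading the column as VARCHAR keeps values such as "99.5+" intact, so any cleanup can happen later in SQL.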

Solution

This error message should be surfaced to the user in the sources output.
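As a rough illustration of the idea, the sketch below captures an error thrown while loading a table and prints it under the table name instead of reporting 0 rows. The SourceResult type and the runSource and formatResult functions are hypothetical and do not reflect Evidence's actual internals; a real connector would likely be asynchronous, and the ✗ marker is an assumed output style.

```typescript
// Hypothetical result shape: either a row count or the captured error text.
type SourceResult =
  | { ok: true; rows: number }
  | { ok: false; error: string };

// Run a loader and capture any thrown error instead of swallowing it.
function runSource(load: () => unknown[]): SourceResult {
  try {
    const rows = load();
    return { ok: true, rows: rows.length };
  } catch (e) {
    const message = e instanceof Error ? e.message : String(e);
    return { ok: false, error: message };
  }
}

// Format the sources output for one table.
function formatResult(table: string, result: SourceResult): string {
  if (result.ok) {
    return `  ${table} ✔ Finished, wrote ${result.rows} rows.`;
  }
  // Indent each line of the (possibly multi-line) message under the table name.
  const detail = result.error
    .split("\n")
    .map((line) => `    ${line}`)
    .join("\n");
  return `  ${table} ✗ Error\n${detail}`;
}

// Simulate the failure from this issue with a loader that throws.
const failingLoad = () => {
  throw new Error(
    "Conversion Error: CSV Error on Line: 24473\n" +
      "Could not convert string \"99.5+\" to 'BIGINT'"
  );
};
console.log(formatResult("deaths", runSource(failingLoad)));
```

Keeping the full multi-line message matters here, because DuckDB's suggested fixes appear after the first line of the error.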

archiewood commented 1 month ago

It may also be helpful to surface errors from other connectors, though I am unsure about this.