droher / boxball

Prebuilt Docker images with Retrosheet's complete baseball history data for many analytical frameworks. Includes Postgres, cstore_fdw, MySQL, SQLite, Clickhouse, Drill, Parquet, and CSV.
Apache License 2.0

MySQL Container fails while loading data #65

Closed: mattbbernstein closed this issue 1 year ago

mattbbernstein commented 1 year ago

Describe the bug: Error when loading the MySQL image tag.

To Reproduce: Ran the suggested MySQL start command:

docker run --name mysql -d -p 3306:3306 -v ~/boxball/mysql:/var/lib/mysql doublewick/boxball:mysql-latest

Specs: Ubuntu through WSL2 on Windows 10

Error Message: 2022-11-23 17:18:07 ERROR 1262 (01000) at line 2063: Row 112998 was truncated; it contained more data than there were input columns

This appears to happen while loading the Retrosheet Event CSV file.

Additional context: Full container logs: https://pastebin.com/bD53UHXE

droher commented 1 year ago

Issue looks to be in the Retrosheet schedule table, with some improperly escaped strings in this file:

https://github.com/chadwickbureau/retrosheet/blob/master/schedule/2020REV.TXT#L148-L155

I'll patch my fork of it and send a note over to the mailing list. Should have a fix in the next couple days.
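
For anyone who wants to check a delimited file for this kind of problem before loading it, here is a minimal sketch in Python. It flags rows whose field count differs from the most common field count in the file, which is the condition behind the "more data than there were input columns" error. The function name `find_ragged_rows` and the sample rows are hypothetical and made up for illustration. They are not the actual contents of 2020REV.TXT, and this is not the check that boxball itself runs.

```python
import csv
import io
from collections import Counter


def find_ragged_rows(lines, delimiter=","):
    """Return (line_number, field_count) for rows with an unexpected field count.

    The "expected" count is assumed to be the most common field count in the
    input, so the check works without knowing the table schema in advance.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    # reader.line_num is the physical line on which each parsed row ended.
    rows = [(reader.line_num, row) for row in reader]
    counts = Counter(len(row) for _, row in rows if row)
    if not counts:
        return []
    expected = counts.most_common(1)[0][0]
    return [(num, len(row)) for num, row in rows if row and len(row) != expected]


# Hypothetical sample data: the third row has two extra fields.
sample = io.StringIO(
    "20200326,0,Thu,DET,CLE\n"
    "20200326,0,Thu,NYA,BAL\n"
    "20200327,0,Fri,SEA,TEX,extra, fields\n"
)
print(find_ragged_rows(sample))
```

To scan a real file, pass an open file handle instead of the `StringIO` object, for example `open(path, newline="")`. Rows reported by the function are candidates for manual inspection. A differing field count does not by itself say whether the cause is a stray quote, an unescaped delimiter, or something else.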

droher commented 1 year ago

@mattbbernstein I ended up putting a temporary patch in the mysql-latest image that removes the schedule table from the DB. Feel free to re-open the ticket if that doesn't work for you or you're still getting errors.