freelawproject / courtlistener

A fully-searchable and accessible archive of court data including growing repositories of opinions, oral arguments, judges, judicial financial records, and federal filings.
https://www.courtlistener.com

Bulk data CSVs contain "" in columns that are declared `NOT NULL` #4244

Open rbpasker opened 1 month ago

rbpasker commented 1 month ago

These are the first two lines of https://storage.courtlistener.com/bulk-data/dockets-2024-05-06.csv.bz2:

id,date_created,date_modified,source,appeal_from_str,assigned_to_str,referred_to_str,panel_str,date_last_index,date_cert_granted,date_cert_denied,date_argued,date_reargued,date_reargument_denied,date_filed,date_terminated,date_last_filing,case_name_short,case_name,case_name_full,slug,docket_number,docket_number_core,pacer_case_id,cause,nature_of_suit,jury_demand,jurisdiction_type,appellate_fee_status,appellate_case_type_information,mdl_status,filepath_local,filepath_ia,filepath_ia_json,ia_upload_failure_count,ia_needs_upload,ia_date_first_change,view_count,date_blocked,blocked,appeal_from_id,assigned_to_id,court_id,idb_data_id,originating_court_information_id,referred_to_id
"10838944","2019-01-21 09:17:15.707272+00","2022-02-04 21:50:01.416445+00","9","","Susan Illston","","","2021-01-21 17:13:19.363539+00",,,,,,"1998-09-03","1999-09-14","1999-09-14","","Advent Software Inc. v. Stratum Business","","advent-software-inc-v-stratum-business","3:98-cv-03398","9803398","119636","","840 Trademark","","Federal question","","","","","","",,"t","2021-01-21 17:13:19.309558+00","0",,"f",,"1588","cand","19579836",,

The 5th column, appeal_from_str, has "" as its value, but the DDL specifies `appeal_from_str text NOT NULL`, and the bulk load fails.

rbpasker commented 1 month ago

It's the "" between "9" and "Susan Illston".
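
For anyone reproducing this, the difference between the two kinds of empty field in the sample row can be made visible with a small parser. The sketch below is only an illustration, not CourtListener code: `split_csv_record` is a hypothetical helper that returns `None` for an unquoted empty field and `''` for a quoted one. It assumes the export follows the convention (the default for PostgreSQL's COPY in CSV mode) that a bare empty field means NULL and `""` means an empty string, and it handles a single record with no embedded newlines.

```python
def split_csv_record(line):
    """Split one CSV record; unquoted empty -> None, quoted empty -> ''."""
    fields = []
    i, n = 0, len(line)
    while True:
        if i < n and line[i] == '"':
            # Quoted field: read until the closing quote, unescaping "" to ".
            i += 1
            buf = []
            while i < n:
                if line[i] == '"':
                    if i + 1 < n and line[i + 1] == '"':
                        buf.append('"')
                        i += 2
                    else:
                        i += 1  # closing quote
                        break
                else:
                    buf.append(line[i])
                    i += 1
            fields.append("".join(buf))
        else:
            # Unquoted field: read up to the next comma or end of line.
            j = line.find(",", i)
            if j == -1:
                j = n
            raw = line[i:j]
            fields.append(raw if raw else None)
            i = j
        if i >= n:
            break
        i += 1  # skip the comma separating this field from the next
    return fields


# A shortened excerpt in the style of the row quoted above.
sample = '"9","","Susan Illston","",,"t"'
print(split_csv_record(sample))
# ['9', '', 'Susan Illston', '', None, 't']
```

By contrast, Python's standard `csv.reader` with default settings returns `''` for both cases, which is one way the null versus "" distinction gets lost during import.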

mlissner commented 1 month ago

I think @quevon24 dealt with this. CSVs have no standard way to distinguish null from "", so the two are ambiguous. I don't know what his solution was, though.

rbpasker commented 1 month ago

COALESCE can be used to export/import CSVs with NOT NULL columns properly:

https://www.postgresql.org/docs/current/functions-conditional.html#FUNCTIONS-COALESCE-NVL-IFNULL
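
One way the COALESCE idea could work on the import side is to load the CSV into a staging table whose columns are all nullable, then copy into the real table while mapping NULL to ''. The sketch below uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs anywhere; `COALESCE` behaves the same way for this purpose. The table names `docket_staging` and `docket`, the two-column layout, and the sample values are all assumptions for illustration, not the actual CourtListener schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Staging table: every column nullable, so any CSV row can be loaded.
    CREATE TABLE docket_staging (id INTEGER, appeal_from_str TEXT);
    -- Target table mirrors the NOT NULL constraint described in the issue.
    CREATE TABLE docket (id INTEGER PRIMARY KEY, appeal_from_str TEXT NOT NULL);
""")

# Illustrative rows: one with a NULL that would violate NOT NULL, one without.
conn.executemany(
    "INSERT INTO docket_staging VALUES (?, ?)",
    [(1, None), (2, "Some Court")],
)

# COALESCE returns its first non-NULL argument, so NULL becomes ''.
conn.execute("""
    INSERT INTO docket (id, appeal_from_str)
    SELECT id, COALESCE(appeal_from_str, '') FROM docket_staging
""")

rows = conn.execute("SELECT id, appeal_from_str FROM docket ORDER BY id").fetchall()
print(rows)
# [(1, ''), (2, 'Some Court')]
```

The same `COALESCE(column, '')` expression could instead be applied in the export query, so that NOT NULL text columns never emit a NULL in the first place.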

mlissner commented 1 month ago

Not a bad idea. If you wanted to provide a PR for that, we'd certainly welcome it. I guess we can just use the string NULL to represent null.
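
The sentinel approach can be sketched as a round trip: write `NULL` wherever a value is null, and map it back on read. This is a minimal illustration using Python's `csv` module, not the project's export code; `NULL_SENTINEL`, `write_rows`, and `read_rows` are hypothetical names.

```python
import csv
import io

NULL_SENTINEL = "NULL"  # assumption: the literal string suggested in the thread


def write_rows(rows):
    """Serialize rows to CSV text, writing the sentinel in place of None."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in rows:
        writer.writerow([NULL_SENTINEL if v is None else v for v in row])
    return buf.getvalue()


def read_rows(text):
    """Parse CSV text, turning the sentinel back into None."""
    reader = csv.reader(io.StringIO(text))
    return [[None if v == NULL_SENTINEL else v for v in row] for row in reader]


original = [["9", "", "Susan Illston", None]]
text = write_rows(original)
print(text, end="")
# "9","","Susan Illston","NULL"
assert read_rows(text) == original
```

The trade-off is that a genuine field value equal to the sentinel (a case name that is literally `NULL`, say) would be read back as null in this sketch, so the sentinel has to be a string that cannot occur in the data, or quoting has to be used to tell the two apart. PostgreSQL's COPY has a `NULL` option for choosing such a string.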