ets / tap-spreadsheets-anywhere

GNU Affero General Public License v3.0

Header row being skipped in CSV reading over FTPS but not with FILE #88

Open cwilson6163 opened 1 month ago

cwilson6163 commented 1 month ago

I have been getting an error using tap-spreadsheets-anywhere in conjunction with target-oracle:

IndexError: string index out of range cmd_type=elb consumer=True job_name=prod:tap-spreadsheets-anywhere-to-target-oracle name=target-oracle producer=False run_id=7d04cedc-ba51-4a32-ad3d-fb5457809ed4 stdio=stderr string_id=target-oracle

What seems to be happening is that when the CSV file is read from Box over FTPS, tap-spreadsheets-anywhere ignores the header row and uses the first data row as the header, so the schema ends up with properties keyed by the values of the first data row. If I edit the file in the Box folder so that the header row appears twice, everything works properly. I do not have skip_initial set (I have also tried explicitly setting it to 0).

The header row is skipped only when I use FTP. It is not skipped when the path points to a local file (FILE://...).
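
For anyone trying to reproduce the symptom in isolation, here is a minimal sketch using only Python's standard csv module. It is not the tap's code and is not a diagnosis of the FTPS path. It only shows that if the first line of a stream is consumed before the reader sees it, the first data row becomes the field names. The sample data, column names, and the `read_rows` helper are made up for illustration.

```python
import csv
import io

# Hypothetical sample data; the column names are made up for illustration.
SAMPLE = "id,name\n1,alice\n2,bob\n"


def read_rows(text, drop_first_line=False):
    """Parse CSV text with csv.DictReader.

    When drop_first_line is True, one line is read from the stream before
    parsing, which mimics a header that was consumed upstream.
    """
    stream = io.StringIO(text)
    if drop_first_line:
        stream.readline()  # the header line is gone before the reader sees it
    reader = csv.DictReader(stream)
    rows = list(reader)
    return reader.fieldnames, rows


good_fields, good_rows = read_rows(SAMPLE)
bad_fields, bad_rows = read_rows(SAMPLE, drop_first_line=True)

print(good_fields, len(good_rows))
print(bad_fields, len(bad_rows))
```

In the second call the field names come from the first data row, which matches the schema described above, where the property keys are the values of the first data row.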

cwilson6163 commented 1 month ago

This seems to be fixed by specifying field_names, but I still wonder why the header row is skipped and row 2 is treated as the header.
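
As a rough illustration of why explicit field names could mask the problem, the sketch below again uses only Python's standard csv module. Whether tap-spreadsheets-anywhere handles field_names the same way internally is an assumption. The data, the names in `FIELD_NAMES`, and the `parse_with_names` helper are hypothetical.

```python
import csv
import io

# Hypothetical data as a parser might see it if the header line were already gone.
HEADERLESS = "1,alice\n2,bob\n"

# Made-up column names, standing in for a field_names setting.
FIELD_NAMES = ["id", "name"]


def parse_with_names(text, names):
    """Parse CSV text using explicit field names instead of the first row."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=names)
    return list(reader)


rows = parse_with_names(HEADERLESS, FIELD_NAMES)
for row in rows:
    print(row)
```

With plain csv.DictReader, passing fieldnames means every line is treated as data. If the header line had still been present in the stream, it would show up as a data row, so checking the first emitted record is one way to tell whether the header is really being dropped.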