astronomy-commons / hipscat-import

HiPSCat import - generate HiPSCat-partitioned catalogs
https://hipscat-import.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Different input column ordering #239

Open delucchi-cmu opened 6 months ago

delucchi-cmu commented 6 months ago

Bug report

From a Slack thread report by Julia Gschwend:

... the DP0.2 files we downloaded have the columns in a different order (the indexes are inconsistent) and it is causing the code to crash.

in pyarrow._parquet.FileMetaData.append_row_groups
RuntimeError: AppendRowGroups requires equal schemas.
The two columns with index 1 differ.
column descriptor = {
  name: xy_flag,
  path: xy_flag,
  physical_type: BOOLEAN,
  converted_type: NONE,
  logical_type: None,
  max_definition_level: 1,
  max_repetition_level: 0,
}
column descriptor = {
  name: coord_dec,
  path: coord_dec,
  physical_type: DOUBLE,
  converted_type: NONE,
  logical_type: None,
  max_definition_level: 1,
  max_repetition_level: 0,
}

We should handle this case gracefully by enforcing the same column order across all inputs when writing the files.
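One possible shape for this, as a rough sketch only and not what hipscat-import currently does: take the first file's column order as the reference and reorder every later table to match before writing. The helper names `ordered_columns` and `write_with_reference_order` are hypothetical; the pyarrow calls used are `pyarrow.parquet.read_table`, `Table.select`, `pyarrow.concat_tables`, and `pyarrow.parquet.write_table`. It assumes the files contain the same set of columns with the same types, and that only the order differs.

```python
from typing import List, Sequence


def ordered_columns(reference: Sequence[str], found: Sequence[str]) -> List[str]:
    """Return the reference order, or raise if `found` is not a reordering of it."""
    missing = [name for name in reference if name not in found]
    extra = [name for name in found if name not in reference]
    if missing or extra:
        raise ValueError(f"column mismatch: missing={missing}, extra={extra}")
    return list(reference)


def write_with_reference_order(input_paths: Sequence[str], output_path: str) -> None:
    """Hypothetical helper: merge parquet files using the first file's column order."""
    # Imported lazily so ordered_columns stays usable without pyarrow installed.
    import pyarrow as pa
    import pyarrow.parquet as pq

    reference = None
    tables = []
    for path in input_paths:
        table = pq.read_table(path)
        if reference is None:
            # The first file defines the column order for everything that follows.
            reference = table.column_names
        # Table.select accepts a list of column names and returns them in that order.
        tables.append(table.select(ordered_columns(reference, table.column_names)))

    if not tables:
        raise ValueError("no input files given")
    pq.write_table(pa.concat_tables(tables), output_path)
```

Failing loudly on missing or extra columns keeps this limited to the reordering case reported here; files whose schemas genuinely differ would need a separate decision.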

Before submitting

Please check the following: