astronomy-commons / hipscat-import

HiPSCat import - generate HiPSCat-partitioned catalogs
https://hipscat-import.readthedocs.io
BSD 3-Clause "New" or "Revised" License

FitsReader fails when skip_column_names is not None #335

Closed hombit closed 1 month ago

hombit commented 1 month ago

Bug report

While working on the masking incubator with Claudio, we found a bug in the FitsReader.read() implementation. When a FitsReader is constructed with column_names or skip_column_names and is used in the mapping stage with read_columns != None, it filters columns twice, which raises an exception complaining about unknown columns.

Pseudo-code:

args = ImportArguments(file_reader=FitsReader(skip_column_names=['bad_column']), ...)
pipeline_with_client(args, client)  # fails with: column "bad_column" does not exist
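To make the failure mode concrete, here is a minimal sketch of how filtering twice can produce this kind of error. It is not the hipscat-import code: `ToyReader`, `read_buggy`, and `read_once` are hypothetical names, a plain dict of lists stands in for the FITS table, and the precedence used in `read_once` (read_columns first, then column_names, then skip_column_names) is only one possible fix, not necessarily the one the maintainers chose.

```python
class ToyReader:
    """Hypothetical stand-in for a column-filtering reader (not the hipscat-import API)."""

    def __init__(self, column_names=None, skip_column_names=None):
        self.column_names = column_names
        self.skip_column_names = skip_column_names

    def read_buggy(self, table, read_columns=None):
        """Apply the caller's filter, then the reader's own filter on top of it."""
        table = dict(table)
        # First filter: keep only what the caller asked for.
        if read_columns is not None:
            table = {name: table[name] for name in read_columns}
        # Second filter: the reader's own settings, applied to the already
        # reduced table. A skipped column may no longer be present here.
        if self.column_names is not None:
            table = {name: table[name] for name in self.column_names}
        elif self.skip_column_names is not None:
            for name in self.skip_column_names:
                if name not in table:
                    raise KeyError(f"column {name!r} does not exist")
                del table[name]
        return table

    def read_once(self, table, read_columns=None):
        """Pick exactly one filter, with read_columns taking precedence (assumed)."""
        if read_columns is not None:
            keep = list(read_columns)
        elif self.column_names is not None:
            keep = list(self.column_names)
        elif self.skip_column_names is not None:
            skip = set(self.skip_column_names)
            keep = [name for name in table if name not in skip]
        else:
            keep = list(table)
        return {name: table[name] for name in keep}


# Toy data standing in for a FITS table.
table = {"ra": [1.0], "dec": [2.0], "bad_column": [0]}
reader = ToyReader(skip_column_names=["bad_column"])

try:
    reader.read_buggy(table, read_columns=["ra", "dec"])
except KeyError as err:
    print("buggy read failed:", err)

print(reader.read_once(table, read_columns=["ra", "dec"]))
```

In this sketch the buggy path fails because "bad_column" has already been dropped by the read_columns filter by the time the skip list is applied, while the single-filter path never touches a column that is no longer in the table.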

Before submitting

Please check the following: