Closed by camposandro 2 months ago
Has this been addressed (or a little bit improved) by the pyarrow dtype changes?
@delucchi-cmu yes, supporting None values by default using pyarrow should fix the column types. We're holding off on the merge of #271 this week but I might try to build some end-to-end tests in the meantime to make sure the output columns of the crossmatch indeed remain the same!
This has been addressed by the recent switch to pyarrow types and by holding on to the pyarrow schema throughout operations.
We should make sure the Dask DataFrame meta and the pyarrow schema are consistent whenever we address https://github.com/astronomy-commons/lsdb/issues/390.
Bug report
If we decide to keep the non-matches, it's possible to get NaN values in our crossmatch dataframe. For every point in the left partitions we will have a row with the left point information and the information of the respective match on the right (which, being nonexistent, will be set to NaN).
When assigning a row with NaN values to a dataframe, Pandas seems to automatically cast the whole column type to "float". Columns such as Norder_{}_xmatch, Dir_{}_xmatch and Npix_{}_xmatch therefore have an incorrect type.

We should create an end-to-end test to verify that the column data types of the original catalogs remain unchanged.