Add "dataset" subdirectory for all parquet files

Feature request

Request

Rename ancillary files like "catalog_info.json" to start with "_" so they will be ignored by default.

Goal

Allow users to make a simple call to pandas.read_parquet (or other standard Python parquet readers) without having to specify the ignore_prefixes keyword argument.

Details

Currently, the simplest call that works seems to be:

import pandas as pd

# assuming we're in the hipscat-import root directory
small_sky_object_catalog = "tests/hipscat_import/data/small_sky_object_catalog"

pd.read_parquet(
    small_sky_object_catalog,
    partitioning=None,  # see issue #367 for why this is necessary
    ignore_prefixes=[
        ".",
        "_",
        "catalog_info.json",
        "partition_info.csv",
        "point_map.fits",
        "provenance_info.json",
    ],
)

It's cumbersome to have to specify the ignore_prefixes kwarg every time, but without it that call throws the error:

ArrowInvalid: Could not open Parquet input source 'tests/hipscat_import/data/small_sky_object_catalog/partition_info.csv': Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.

Filenames that start with "." or "_" are ignored by default, so renaming the ancillary files to start with "_" would allow the user to skip the ignore_prefixes kwarg.
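To illustrate the effect of the rename, here is a minimal standard-library sketch of the prefix rule described above. It is not pyarrow's actual discovery code: the helper names (is_ignored, discovered) are hypothetical, the match is assumed to be against the file's basename, and the parquet path and the "_"-prefixed filenames are only illustrative.

```python
from pathlib import PurePosixPath

# Default prefixes named in this issue; assumed to match on the basename.
DEFAULT_IGNORE_PREFIXES = (".", "_")


def is_ignored(path, ignore_prefixes=DEFAULT_IGNORE_PREFIXES):
    """Return True if the file's basename starts with any ignored prefix."""
    return PurePosixPath(path).name.startswith(tuple(ignore_prefixes))


def discovered(paths, ignore_prefixes=DEFAULT_IGNORE_PREFIXES):
    """Return the paths a reader would try to open as parquet files."""
    return [p for p in paths if not is_ignored(p, ignore_prefixes)]


# Illustrative catalog layout: ancillary files plus one parquet partition.
current = [
    "catalog_info.json",
    "partition_info.csv",
    "point_map.fits",
    "provenance_info.json",
    "Norder=0/Dir=0/Npix=11.parquet",
]

# Hypothetical layout after the proposed rename: ancillary files gain "_".
renamed = [p if p.endswith(".parquet") else "_" + p for p in current]

print(discovered(current))  # ancillary files are still picked up
print(discovered(renamed))  # only the parquet file remains
```

With the current names, every ancillary file is handed to the parquet reader, which is what triggers the ArrowInvalid error. With the renamed files, only the parquet partition is discovered, so the call would no longer need the ignore_prefixes kwarg.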

Before submitting

Please check the following:

[x] I have described the purpose of the suggested change, specifying what I need the enhancement to accomplish, i.e. what problem it solves.
[x] I have included any relevant links, screenshots, environment information, and data relevant to implementing the requested feature, as well as pseudocode for how I want to access the new functionality.
[x] If I have ideas for how the new feature could be implemented, I have provided explanations and/or pseudocode and/or task lists for the steps.

astronomy-commons / hats