astronomy-commons / hipscat-import

HiPSCat import - generate HiPSCat-partitioned catalogs
https://hipscat-import.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Add public catalog import tips for Gaia #192

Closed. delucchi-cmu closed this issue 5 months ago

delucchi-cmu commented 9 months ago

We have dedicated pages for several public catalogs, along with guides to the pain points of importing them (https://hipscat-import.readthedocs.io/en/latest/catalogs/public/index.html). Add a page just for Gaia.

Original CSV files: http://cdn.gea.esac.esa.int/Gaia/gdr3/gaia_source/

The first 1000 lines are header metadata rather than data rows, and should be skipped on read.
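
Before hard-coding the number of lines to skip, it may be worth confirming it against a downloaded file. The sketch below counts the leading comment lines in a gzipped text file using only the standard library. It assumes the header lines start with `#`, which is not stated above and should be checked against the real files. The helper name `count_leading_comment_lines` and the tiny demo file are made up for illustration and are not part of hipscat-import.

```python
import gzip
import os
import tempfile


def count_leading_comment_lines(path, comment_prefix="#"):
    """Count consecutive lines at the top of a gzipped text file
    that start with comment_prefix (ASSUMED header marker)."""
    count = 0
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if not line.startswith(comment_prefix):
                break
            count += 1
    return count


# Build a tiny stand-in file (NOT real Gaia data):
# three comment lines, one column-name line, one data row.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv.gz")
with gzip.open(demo_path, "wt", encoding="utf-8") as handle:
    handle.write("# comment 1\n# comment 2\n# comment 3\n")
    handle.write("source_id,ra,dec\n")
    handle.write("1,10.5,-3.2\n")

n_skip = count_leading_comment_lines(demo_path)
print(n_skip)
```

Run against one of the real files, the returned count could be compared with the 1000 used for `skiprows`.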

We might not handle boolean values well on import. This needs to be checked.
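
One way to start that check is to find which columns look boolean in the raw CSV, so their dtypes can be inspected after import. The sketch below uses only the standard library. The token spellings (`True`/`False`, with `null` or an empty string for missing values) are assumptions and should be adjusted to what the files actually contain. The helper name and the column names `flag_a` and `flag_b` are hypothetical.

```python
import csv
import io

# ASSUMED spellings; adjust after inspecting the real files.
TRUE_TOKENS = {"True", "true"}
FALSE_TOKENS = {"False", "false"}
MISSING_TOKENS = {"", "null"}


def find_boolean_like_columns(text_stream):
    """Return the columns whose values are all true/false/missing tokens
    and that contain at least one actual true/false value."""
    reader = csv.DictReader(text_stream)
    allowed = TRUE_TOKENS | FALSE_TOKENS | MISSING_TOKENS
    # Maps column name -> whether a real true/false token was seen.
    candidates = {name: False for name in (reader.fieldnames or [])}
    for row in reader:
        for name in list(candidates):
            value = row[name]
            if value not in allowed:
                del candidates[name]  # not boolean-like
            elif value not in MISSING_TOKENS:
                candidates[name] = True
    return [name for name, seen in candidates.items() if seen]


# Stand-in rows with hypothetical column names (NOT real Gaia data).
sample = io.StringIO(
    "source_id,ra,flag_a,flag_b\n"
    "1,10.5,True,null\n"
    "2,11.0,False,True\n"
)
bool_columns = find_boolean_like_columns(sample)
print(bool_columns)
```

The columns this reports are the ones to check in the imported catalog, especially those that mix boolean values with missing ones.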

Arguments will likely be something like:

from hipscat_import.catalog.arguments import ImportArguments
from hipscat_import.catalog.file_readers import CsvReader

# source_directory and output_path are placeholders to be set by the user.
args = ImportArguments(
    output_catalog_name="gaia_csv_test",
    input_path=source_directory,
    # Skip the 1000 header lines at the top of the file.
    file_reader=CsvReader(skiprows=1000),
    input_format="csv.gz",
    ra_column="ra",
    dec_column="dec",
    id_column="source_id",
    output_path=output_path,
    dask_tmp="/epyc/users/ncaplar/",
    pixel_threshold=500_000,
    highest_healpix_order=6,
    overwrite=True,
)

hombit commented 5 months ago

We can point to these notebooks in the LSDB docs:
https://github.com/astronomy-commons/lsdb/blob/main/docs/notebooks/des-gaia.ipynb
https://github.com/astronomy-commons/lsdb/blob/main/docs/notebooks/ztf_bts-ngc.ipynb