
esa / tetra3

A fast lost-in-space plate solver for star trackers.

https://tetra3.readthedocs.io/en/latest/

Apache License 2.0


Add gaiadr3 catalog for generate_database #19

Status: Open. matwey opened this 9 months ago.

cgobat commented 9 months ago

This would be the only catalog to operate through a remote query rather than a local file. This difference may not be desirable, especially if it is not clearly documented what selecting 'gaiadr3' will do. It also introduces a new dependency on astropy/astroquery, which probably warrants some discussion as well.

Do you know of anywhere one could obtain a downloadable "dump" of the relevant Gaia DR3 columns, so that the workflow can be made to match the other three catalogs?

matwey commented 9 months ago

Hi,

The Gaia DR3 source catalog is available as 700 GB of compressed CSV files at: https://cdn.gea.esac.esa.int/Gaia/gdr3/gaia_source/

However, I don't think using the whole catalog would be useful: 700 GB of compressed CSV would yield a Tetra3 database of comparable size, which could hardly be loaded into RAM. This means you need to filter the source catalog by magnitude or by coordinates, and that is easier to do on the server side than by downloading the whole dataset.
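As a rough illustration of filtering on the server side, the sketch below only builds an ADQL query string with a magnitude cut. The column selection, the helper name `build_gaia_query`, and the cut value are assumptions for illustration, not anything tetra3 defines. Actually submitting the query would need astroquery (for example `Gaia.launch_job_async(query)` from `astroquery.gaia`), which is exactly the extra dependency being discussed.

```python
from typing import Optional

# Assumed columns for a Tetra3-style database: ID, position, proper motion
# and G-band magnitude (these are columns of gaiadr3.gaia_source).
GAIA_COLUMNS = ("source_id", "ra", "dec", "pmra", "pmdec", "phot_g_mean_mag")


def build_gaia_query(max_g_mag: float, top: Optional[int] = None) -> str:
    """Hypothetical helper: ADQL query for stars brighter than max_g_mag.

    The magnitude cut is evaluated by the archive server, so only the
    filtered rows would ever be downloaded.
    """
    if top is not None and top <= 0:
        raise ValueError("top must be a positive integer")
    select = "SELECT " if top is None else f"SELECT TOP {int(top)} "
    return (
        select
        + ", ".join(GAIA_COLUMNS)
        + " FROM gaiadr3.gaia_source"
        + f" WHERE phot_g_mean_mag < {float(max_g_mag)}"
        + " ORDER BY phot_g_mean_mag ASC"
    )


print(build_gaia_query(7.0))
```

Keeping the query as a plain string means the magnitude limit can be chosen per database (e.g. tied to the field of view) without downloading anything locally.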

That said, introducing an additional dependency can be an issue. So how about adding a variant of the generate_database function that takes a plain NumPy array as the catalog? It would allow the user to provide [ra, dec, mag] directly.
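A minimal sketch of what the input handling for such an array-based variant might look like is below. The function name `catalog_from_array`, the use of degrees for RA/Dec, and the magnitude-limit parameter are all hypothetical; this is not an existing tetra3 API.

```python
import numpy as np


def catalog_from_array(catalog, max_mag: float) -> np.ndarray:
    """Hypothetical helper: validate and filter an (N, 3) [ra, dec, mag] array.

    RA and Dec are assumed to be in degrees. Returns the stars with
    mag <= max_mag, sorted brightest first (smallest magnitude first).
    """
    arr = np.asarray(catalog, dtype=np.float64)
    if arr.ndim != 2 or arr.shape[1] != 3:
        raise ValueError("catalog must have shape (N, 3): [ra, dec, mag]")
    ra, dec, mag = arr[:, 0], arr[:, 1], arr[:, 2]
    if np.any((ra < 0) | (ra >= 360)) or np.any(np.abs(dec) > 90):
        raise ValueError("ra must be in [0, 360) and dec in [-90, 90] degrees")
    # Drop rows with missing values and stars fainter than the limit.
    keep = np.isfinite(arr).all(axis=1) & (mag <= max_mag)
    kept = arr[keep]
    # Stable sort so equally bright stars keep their input order.
    order = np.argsort(kept[:, 2], kind="stable")
    return kept[order]
```

Accepting a bare array keeps the core independent of any catalog format or remote service; whoever produces the array (a Gaia query, a local file, a simulation) handles that part.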

cgobat commented 9 months ago

Right. I definitely agree that it doesn't make sense to have to download hundreds of GB of data just to perform the filtering locally.

Here is a Gist where I show how to generate a Tetra3-style database from a FITS binary table (e.g., as downloaded from CDS/VizieR; see the ADQL query in the Gist). This could easily be adapted to the case where a NumPy array of RAs, Decs, and magnitudes is already available.
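To illustrate how such arrays might be turned into a star table, here is a minimal sketch. The [ra, dec, x, y, z, mag] column layout in radians and the helper names are assumptions for illustration; they are not taken from the Gist, and tetra3's actual database format should be checked in its source before relying on this.

```python
import numpy as np


def radec_to_unit_vectors(ra_deg, dec_deg) -> np.ndarray:
    """Convert RA/Dec in degrees to an (N, 3) array of unit vectors."""
    ra = np.radians(np.asarray(ra_deg, dtype=np.float64))
    dec = np.radians(np.asarray(dec_deg, dtype=np.float64))
    cos_dec = np.cos(dec)
    return np.column_stack((cos_dec * np.cos(ra), cos_dec * np.sin(ra), np.sin(dec)))


def build_star_table(ra_deg, dec_deg, mag) -> np.ndarray:
    """Hypothetical layout: columns [ra, dec, x, y, z, mag], angles in radians."""
    vectors = radec_to_unit_vectors(ra_deg, dec_deg)
    table = np.column_stack((np.radians(ra_deg), np.radians(dec_deg), vectors, mag))
    # Order rows brightest first (smallest magnitude first).
    return table[np.argsort(table[:, 5], kind="stable")]
```

Storing unit vectors alongside RA/Dec lets angular distances between stars be computed with dot products instead of spherical trigonometry.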

gustavmpettersson commented 9 months ago

Hi,

Gaia has the finest star catalogue yet (and will probably remain so for decades), so I agree with the concept of including it as an option. We should think about its implications and implementation.

First of all, how complete is the catalogue, especially for bright stars? Right now I think we need six fields: (ra, dec, pm_ra, pm_dec, vis_mag, ID). I don't think the exact magnitude scale is important; what matters is the ordering. If we leave the magnitude unmodified from the given catalogue, return_matches has more useful data. The Tycho database has the issue that many bright stars are missing or lack proper motions, leading to poor performance in wide-FOV applications.
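Since pm_ra and pm_dec are among the required fields, and missing proper motions are part of the Tycho problem, a minimal sketch of a linear proper-motion correction might look like this. It assumes Gaia-style units (mas/yr, with pmra already multiplied by cos(dec)) and an elapsed time measured from the catalogue epoch (J2016.0 for Gaia DR3). Treating missing proper motions as zero is a hypothetical policy choice. The linear update is inaccurate near the celestial poles and over long time spans; a rigorous alternative is astropy's `SkyCoord.apply_space_motion`.

```python
import numpy as np

MAS_PER_DEG = 3.6e6  # milliarcseconds per degree


def propagate_positions(ra_deg, dec_deg, pmra_mas_yr, pmdec_mas_yr, years_since_epoch):
    """Hypothetical helper: first-order proper-motion correction.

    pmra is assumed to include the cos(dec) factor (as Gaia's pmra does),
    so it is divided by cos(dec) before being added to RA. This blows up
    near the poles. Missing proper motions (NaN) are treated as zero.
    """
    ra = np.asarray(ra_deg, dtype=np.float64)
    dec = np.asarray(dec_deg, dtype=np.float64)
    pmra = np.nan_to_num(np.asarray(pmra_mas_yr, dtype=np.float64))
    pmdec = np.nan_to_num(np.asarray(pmdec_mas_yr, dtype=np.float64))
    d_dec = pmdec * years_since_epoch / MAS_PER_DEG
    d_ra = pmra * years_since_epoch / MAS_PER_DEG / np.cos(np.radians(dec))
    new_ra = np.mod(ra + d_ra, 360.0)  # keep RA in [0, 360)
    new_dec = np.clip(dec + d_dec, -90.0, 90.0)
    return new_ra, new_dec
```

For example, a star at Dec 0 with pmra = 3600 mas/yr moves 1 degree in RA over 1000 years.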

We can lazily import extra dependencies for Gaia-based database building; I don't think that's a problem. I also don't mind supporting building from your own NumPy array; maybe that's something people would like to experiment with.
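One common way to include an optional dependency lazily is to import it inside the only code path that needs it, with a clear error if it is missing. The sketch below assumes that pattern; `require_optional` and `fetch_gaia_stars` are hypothetical names, not part of tetra3.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, purpose: str) -> ModuleType:
    """Import an optional dependency only when a feature actually needs it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for {purpose}; install it to use this feature"
        ) from exc


def fetch_gaia_stars(query: str):
    """Hypothetical Gaia-only code path: astroquery is imported here, so
    users who build from local catalogs never need it installed."""
    gaia_module = require_optional("astroquery.gaia", "Gaia-based database building")
    job = gaia_module.Gaia.launch_job_async(query)
    return job.get_results()
```

With this pattern, importing the package itself never touches astroquery; only calling the Gaia path does.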

Please let me know your thoughts.