Open matwey opened 9 months ago
Hi,
The Gaia DR3 source catalog is available as 700GB of compressed CSV files at: https://cdn.gea.esac.esa.int/Gaia/gdr3/gaia_source/
However, I don't think using the whole catalog would be useful, because 700GB of compressed CSV yields a Tetra3 database of comparable size, which can hardly be loaded into RAM. This means you need to filter the source catalog either by magnitude or by coordinates, and that is easier to do on the server side, without downloading all of the data.
However, introducing an additional dependency can be an issue. So what about introducing a generate_database function variant that takes a plain NumPy array as the catalog? It would allow the user to provide [ra, dec, mag].
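To make the array-based idea concrete, here is a minimal sketch of how a user-supplied [ra, dec, mag] array could be validated and prepared before database generation. The helper name prepare_star_table, its max_mag parameter, and the assumption that angles are in degrees are all hypothetical and are not part of Tetra3's API.

```python
import numpy as np


def prepare_star_table(catalog, max_mag=None):
    """Hypothetical helper: validate and order an (N, 3) [ra, dec, mag] array.

    Assumes ra and dec are in degrees; this is an illustration only,
    not an existing Tetra3 function.
    """
    catalog = np.asarray(catalog, dtype=np.float64)
    if catalog.ndim != 2 or catalog.shape[1] != 3:
        raise ValueError("expected an (N, 3) array of [ra, dec, mag]")

    # Optional brightness cut: keep stars at or brighter than max_mag.
    if max_mag is not None:
        catalog = catalog[catalog[:, 2] <= max_mag]

    # Sort brightest first (smaller magnitude means brighter star).
    catalog = catalog[np.argsort(catalog[:, 2], kind="stable")]

    # Convert sky coordinates to unit vectors on the celestial sphere.
    ra = np.deg2rad(catalog[:, 0])
    dec = np.deg2rad(catalog[:, 1])
    vectors = np.column_stack(
        (np.cos(ra) * np.cos(dec), np.sin(ra) * np.cos(dec), np.sin(dec))
    )
    return catalog, vectors


# Three example rows: approximate Sirius, approximate Vega, and a made-up faint star.
stars = np.array(
    [
        [279.235, 38.784, 0.03],
        [101.287, -16.716, -1.46],
        [10.0, 20.0, 9.5],
    ]
)
table, vectors = prepare_star_table(stars, max_mag=7.0)
```

With an interface along these lines, where the catalog comes from (a local file, a remote query, or a synthetic test set) would be the caller's concern rather than the library's.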
Right. I definitely agree that it doesn't make sense to have to download hundreds of GB of data just to perform the filtering locally.
Here is a Gist where I show how to generate a Tetra3-style database from a FITS binary table (e.g., as downloaded from CDS/VizieR; see the ADQL query in the Gist). This could easily be adapted to the case where a NumPy array of RAs, Decs, and magnitudes is already available.
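For readers unfamiliar with server-side filtering, the following is a rough sketch of what a magnitude-limited ADQL query against the Gaia DR3 source table could look like. It is not the query from the Gist; the function build_gaia_query and its parameters are hypothetical, while the table and column names are those of the Gaia DR3 gaia_source table.

```python
def build_gaia_query(max_g_mag, limit=None):
    """Hypothetical helper: build an ADQL query selecting bright Gaia DR3 sources.

    Only the query string is constructed here; nothing is sent to a server.
    """
    # TOP is the ADQL way to cap the number of returned rows.
    top = f"TOP {int(limit)} " if limit is not None else ""
    return (
        f"SELECT {top}source_id, ra, dec, pmra, pmdec, phot_g_mean_mag "
        "FROM gaiadr3.gaia_source "
        f"WHERE phot_g_mean_mag <= {float(max_g_mag)} "
        "ORDER BY phot_g_mean_mag ASC"
    )


# Example: the 1000 brightest sources with G magnitude <= 10.
query = build_gaia_query(10.0, limit=1000)
print(query)
```

A string like this could then be submitted to a TAP service, for example through astroquery's Gaia module, so that only the filtered rows are ever downloaded.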
Hi,
Gaia has the finest star databases yet (and will probably remain so for decades), so I agree with the concept of including it as an option. We should think about its implications and implementation.
First of all, how complete is the catalogue, especially for bright stars? Right now I think we need six fields: (ra, dec, pm_ra, pm_dec, vis_mag, ID). I don't think the exact magnitude scale is important; what matters is the ordering, and if we leave it as the unmodified magnitude for the given catalogue, return_matches has more useful data. The Tycho database has the issue that tons of bright stars are missing or don't have proper motions, leading to poor performance in wide-FOV applications.
We can lazily include extra dependencies for Gaia-based database building; I don't think that's a problem. I also don't mind being able to pass in your own NumPy array to build from; maybe that's something people would like to experiment with.
Please let me know your thoughts.
This would be the only catalog to operate through a remote query, rather than using a local file. This difference may not be desirable, especially if it is not clear/well documented what selecting 'gaiadr3' will do. This also introduces a new dependency/requirement in astropy/astroquery, which probably warrants some discussion as well.
Do you know of anywhere one could obtain a downloadable "dump" of the relevant Gaia DR3 columns, so that the workflow can be made to match the other three catalogs?