astro-datalab / notebooks-latest

Default set of Data Lab notebooks, by DL team and contributed by users
BSD 3-Clause "New" or "Revised" License
57 stars 48 forks source link

Reconsider use of tbl_stat in notebooks. #5

Open weaverba137 opened 5 years ago

weaverba137 commented 5 years ago

The tbl_stat table is meant for internal operations, but several references to it exist in these notebooks:

The table does not appear to be updated automatically, and until very recently, was not installed at all on gp04. Furthermore, the row counts are approximations. There are probably better and more "official" ways to get table sizes from pg_catalog. Additional metadata would be available in the tap_schema tables, e.g. the list of available schemas.

weaverba137 commented 1 year ago

@jacquesalice, this very old ticket was was ultimately based on the idea that the tbl_stat table was not being updated regularly. Do you know if that is now updated routinely?

jacquesalice commented 1 year ago

Hi @weaverba137 it does look like tbl_stat is being regularly updated, but it doesn't look like it properly stores the row count (nrows) for Views. So for a View like sdss_dr17.specobj, it says nrows=0. Example:

query = "SELECT nrows FROM tbl_stat WHERE schema='sdss_dr17' and tbl_name='specobj'"
response = qc.query(sql=query)
print(response)
--------
nrows
0

I'm not sure if this is something @mjfitzpatrick could easily implement, or if we should just add a note in the notebooks saying that tbl_stat cannot be used to obtain information about tables stored as Views.

mjfitzpatrick commented 1 year ago

The row counts in tbl_stat are only approximations from one of the postgres system tables, that approximation cannot be used for either views or FDW tables and so values are reported as zero. To get a full set of information for a table, the stat would need to do a full "select count(*)...." on each view, or execute on the remote host of an FDW and be merged back in on the production db.

The tbl_stat was initially an ad hoc tool when the system was first built. Since then it's been used in various places despite never being deemed part of the production system. As such, updates are sporadic, backups are coincidental and work to fill in the view/fdw issues has never really come up.

On Wed, Aug 23, 2023 at 4:00 PM Alice Jacques @.***> wrote:

Hi @weaverba137 https://github.com/weaverba137 it does look like tbl_stat is being regularly updated, but it doesn't look like it properly stores the row count (nrows) for Views. So for a View like sdss_dr17.specobj, it says nrows=0. Example:

query = "SELECT nrows FROM tbl_stat WHERE schema='sdss_dr17' and tbl_name='specobj'" response = qc.query(sql=query) print(response)

nrows 0

I'm not sure if this is something @mjfitzpatrick https://github.com/mjfitzpatrick could easily implement, or if we should just add a note in the notebooks saying that tbl_stat cannot be used to obtain information about tables stored as Views.

— Reply to this email directly, view it on GitHub https://github.com/astro-datalab/notebooks-latest/issues/5#issuecomment-1690752194, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABP7I6YFMUQP74EZNRW763DXW2DQHANCNFSM4HJQ6XXA . You are receiving this because you were mentioned.Message ID: @.***>