apache / gravitino

World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
https://gravitino.apache.org
Apache License 2.0

[Bug report] Confusion with hr schemas in playground setup #1096

Closed: justinmclean closed this 3 months ago

justinmclean commented 11 months ago

Describe what's wrong

If you connect to the local playground database with DBeaver (as the postgres user), you see only the public schema. You can create schemas via SQL, including an hr schema.

However, there seems to be a hidden hr schema. It doesn't show up in Trino, yet this query returns two rows: select * from "metalake_demo.catalog_pg1".hr.employees

Its contents differ from those of the user-created hr schema in Postgres, which contains 100 rows.

The catalog and the other metadata look correct:

curl http://localhost:8090/api/metalakes/metalake_demo/catalogs/catalog_pg1       
{"code":0,"catalog":{"name":"catalog_pg1","type":"relational","provider":"jdbc-postgresql","comment":"comment","properties":{"jdbc-url":"jdbc:postgresql://postgresql/db","jdbc-user":"postgres","jdbc-password":"postgres","jdbc-database":"db","gravitino.bypass.driverClassName":"org.postgresql.Driver"},"audit":{"creator":"gravitino","createTime":"2023-12-12T03:56:13.300Z"}}}

curl http://localhost:8090/api/metalakes/metalake_demo/catalogs/catalog_pg1/schemas/hr
{"code":0,"schema":{"name":"hr","properties":{},"audit":{}}}

My guess is that the two hr schemas are in different databases (db and postgres); however, only the postgres database is shown in DBeaver. Since the default database is usually called postgres, it may be best to use that in the playground.
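To check which database the catalog actually targets, the curl response above can be inspected directly. The sketch below is illustrative only: it parses the JSON body copied from that response and pulls the database name out of both the `jdbc-url` and the `jdbc-database` property. The helper name `target_databases` is hypothetical and not part of Gravitino.

```python
import json
from urllib.parse import urlparse

# Catalog response body copied verbatim from the curl call above.
CATALOG_JSON = '''{"code":0,"catalog":{"name":"catalog_pg1","type":"relational","provider":"jdbc-postgresql","comment":"comment","properties":{"jdbc-url":"jdbc:postgresql://postgresql/db","jdbc-user":"postgres","jdbc-password":"postgres","jdbc-database":"db","gravitino.bypass.driverClassName":"org.postgresql.Driver"},"audit":{"creator":"gravitino","createTime":"2023-12-12T03:56:13.300Z"}}}'''


def target_databases(catalog_response: str) -> tuple[str, str]:
    """Hypothetical helper: return (database in jdbc-url, jdbc-database property)."""
    props = json.loads(catalog_response)["catalog"]["properties"]
    # Drop the "jdbc:" prefix so urlparse sees "postgresql://host/db".
    url = urlparse(props["jdbc-url"].removeprefix("jdbc:"))
    url_db = url.path.lstrip("/")
    return url_db, props["jdbc-database"]


url_db, prop_db = target_databases(CATALOG_JSON)
print(url_db, prop_db)  # db db
```

Both values are db, not postgres, which fits the guess that a DBeaver session opened on the postgres database would not see the catalog's hr schema.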

Error message and/or stacktrace

No error messages.

How to reproduce

  1. Start up playground.
  2. Connect to localhost in DBeaver with a Postgres connection. Notice that only the postgres database is shown, with a public schema and no tables.
  3. Create a schema called hr, and create tables with data in the public schema.
  4. Connect to localhost in DBeaver with a Trino connection. Notice that no hr schema is shown.
  5. Run select * from "metalake_demo.catalog_pg1".hr.employees and notice two rows are returned.

Additional context

No response

qqqttt123 commented 11 months ago

In the playground, we already created a pg catalog. And the pg catalog has the hr schema.

jerryshao commented 11 months ago

Do we still need to improve this code? @qqqttt123

qqqttt123 commented 11 months ago

Justin suggests that we use postgres as the name of the default database. We currently use db. I think it's fine to leave it unchanged. WDYT?

justinmclean commented 11 months ago

Most Postgres servers have three databases defined by default: template0, template1 and postgres. Using a different one is one more thing that can go wrong when someone is trying to use Gravitino for the first time. Having to define it in two places when creating the catalogue further complicates this.
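The "two places" are the database path in `jdbc-url` and the separate `jdbc-database` property, as seen in the catalog response earlier in this issue. One way a playground setup script could avoid drift is to derive both from a single name. The sketch below is a hypothetical helper (not a Gravitino API); the property keys are taken from that catalog response, and the host and credentials mirror the playground values shown there.

```python
def pg_catalog_properties(host: str, database: str, user: str, password: str) -> dict[str, str]:
    """Hypothetical helper: build JDBC catalog properties, naming the database only once."""
    return {
        # The same database name feeds both the URL path and the explicit property.
        "jdbc-url": f"jdbc:postgresql://{host}/{database}",
        "jdbc-database": database,
        "jdbc-user": user,
        "jdbc-password": password,
        "gravitino.bypass.driverClassName": "org.postgresql.Driver",
    }


# Using the default postgres database, as suggested above.
props = pg_catalog_properties("postgresql", "postgres", "postgres", "postgres")
print(props["jdbc-url"])  # jdbc:postgresql://postgresql/postgres
```

With a single source for the name, the URL and the property cannot disagree, whichever database the playground ends up using.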

jerryshao commented 3 months ago

@qqqttt123 please check this issue.

qqqttt123 commented 3 months ago

I don't think we need to fix this issue; we can close it.