apache / polaris

Apache Polaris, the interoperable, open source catalog for Apache Iceberg
https://polaris.apache.org/
Apache License 2.0
1.13k stars 124 forks source link

[FEATURE REQUEST] Add the doc on how to configure Postgres as the backend database #230

Closed aihuaxu closed 1 month ago

aihuaxu commented 2 months ago

Is your feature request related to a problem? Please describe.

Currently we don't have instruction on how to configure Postgres right now.

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

loicalleyne commented 2 months ago

Slight modification of https://github.com/apache/polaris/blob/main/extension/persistence/eclipselink/src/main/resources/META-INF/persistence.xml

<?xml version="1.0" encoding="UTF-8"?>
<persistence version="2.0" xmlns="http://java.sun.com/xml/ns/persistence"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://java.sun.com/xml/ns/persistence http://java.sun.com/xml/ns/persistence/persistence_2_0.xsd">

  <persistence-unit name="polaris" transaction-type="RESOURCE_LOCAL">
    <provider>org.eclipse.persistence.jpa.PersistenceProvider</provider>
    <class>org.apache.polaris.core.persistence.models.ModelEntity</class>
    <class>org.apache.polaris.core.persistence.models.ModelEntityActive</class>
    <class>org.apache.polaris.core.persistence.models.ModelEntityChangeTracking</class>
    <class>org.apache.polaris.core.persistence.models.ModelEntityDropped</class>
    <class>org.apache.polaris.core.persistence.models.ModelGrantRecord</class>
    <class>org.apache.polaris.core.persistence.models.ModelPrincipalSecrets</class>
    <class>org.apache.polaris.core.persistence.models.ModelSequenceId</class>
    <shared-cache-mode>NONE</shared-cache-mode>
    <properties>
      <property name="jakarta.persistence.jdbc.url"
        value="jdbc:postgres://enter_your_db_host_here:5432/{realm}"/>
      <property name="jakarta.persistence.jdbc.user" value="enter_your_postgres_user"/>
      <property name="jakarta.persistence.jdbc.password" value="enter_your_postgres_user_password"/>
      <property name="jakarta.persistence.schema-generation.database.action" value="create"/>
      <property name="eclipselink.logging.level.sql" value="FINE"/>
      <property name="eclipselink.logging.parameters" value="true"/>
      <property name="eclipselink.persistence-context.flush-mode" value="auto"/>
    </properties>
  </persistence-unit>
</persistence>

Change the following in polaris-server.yml

metaStoreManager:
  # type: in-memory
  type: eclipse-link # uncomment to use eclipse-link as metastore
  persistence-unit: polaris

Add the following to polaris\extension\persistence\eclipselink\build.gradle.kts under dependencies:

implementation("org.postgresql:postgresql:42.7.4")

Build with ECLIPSELINK=true (I changed this in the Dockerfile but there may be a better way)

Remember to run the bootstrap after to create the required tables in the persistence store java -jar /path/to/jar/polaris-service-all.jar bootstrap polaris-server.yml

Afterwards, Polaris can be launched normally: java -jar /path/to/jar/polaris-service-all.jar server polaris-server.yml

MonkeyCanCode commented 2 months ago

@aihuaxu @loicalleyne so by default, eclipselink is using 32 max connections. From the testing I done local, the min/init/max connection settings seems to be not working (it will set to max and not scale down as connection sitting idles for long time). I am using following sample settings:

<property name="eclipselink.connection-pool.default.initial" value="1"/>
<property name="eclipselink.connection-pool.default.min" value="10"/>
<property name="eclipselink.connection-pool.default.max" value="64"/>

I am not sure if that is bug with this version of eclipselink or not. As your guys are adding this example for support, can we check above as well?

Also, psql driver jar is not part of eclipselink extension, we may want to add a note for this as well. Even better, what if we add support for allow people to load the jars they want dynamically via argument during build? E.g.

if (project.properties.get("additionalDependencies")) {
  dependencies { implementation(project(project.properties.get("additionalDependencies"))) }
}

then we can do following:

./gradlew --no-daemon --info -PeclipseLink=$ECLIPSELINK -PadditionalDependencies=org.postgresql:postgresql:42.7.4 clean shadowJar startScripts
flyrain commented 2 months ago

related slack thread: https://apache-iceberg.slack.com/archives/C07HADW86HZ/p1724965902423649

aihuaxu commented 2 months ago

@MonkeyCanCode Thanks for the detailed information. I haven't tried those connection-pool configuration myself yet. I will do some checks around that.

loicalleyne commented 1 month ago

I've written a short script to make the changes easier to apply after pulling the repository requires yq be installed to edit XML and YML files in-place

POLARIS_DEFAULT_REALM=polaris
PGSERVER="jdbc:postgresql://postgres-host:5432/{realm}"
POLARIS_PG_USER=polarisadmin
POLARIS_PG_PASS=polarispassword

yq -i '.persistence.persistence-unit.properties.property[0].+@value = strenv(PGSERVER) | .persistence.persistence-unit.properties.property[1].+@value = strenv(POLARIS_PG_USER) | .persistence.persistence-unit.properties.property[2].+@value = strenv(POLARIS_PG_PASS) | .persistence.persistence-unit.properties.property += { "+@name":"jakarta.persistence.jdbc.driver","+@value":"org.postgresql.Driver"}' extension/persistence/eclipselink/src/main/resources/META-INF/persistence.xml

yq -i '.metaStoreManager.type = "eclipse-link" | .metaStoreManager += {"persistence-unit": "polaris"} | .defaultRealms[0] = strenv(POLARIS_DEFAULT_REALM)' polaris-server.yml

./gradlew --no-daemon --info -PeclipseLink=true -PeclipseLinkDeps=org.postgresql:postgresql:42.7.4 clean prepareDockerDist