Closed · unknowntpo closed this 2 weeks ago
I failed to install it in a new namespace named `test`.
ubuntu@ip-10-0-4-171:~$ helm upgrade --install gravitino-playground ./gravitino-playground --create-namespace --namespace test
Release "gravitino-playground" does not exist. Installing it now.
Error: Unable to continue with install: Deployment "gravitino-playground-gravitino" in namespace "gravitino-playground" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: key "meta.helm.sh/release-namespace" must equal "test": current value is "gravitino-playground"
I didn't see the service dependency logic from docker-compose.yaml implemented in the Helm chart YAML. Do you think we need to add it to the Helm chart?
After I tried to access the Gravitino UI, the Gravitino pod kept restarting. I think the resources for the Gravitino pod were not enough. After I changed to the following configuration, it doesn't restart anymore.
gravitino:
  serviceName: &gravitino_host_ip gravitino
  image:
    repository: datastrato/gravitino
    tag: 0.5.1
    pullPolicy: IfNotPresent
  resources:
    limits:
      cpu: 500m
      memory: 1Gi
    requests:
      cpu: 300m
      memory: 500Mi
Can you help add some steps to README.md like the following examples? Then these commands align with the Docker container CLI section above, so users can use the Trino CLI or Spark SQL in a pod.
kubectl exec trino-5f6b6f996c-cshfv -n gravitino-playground -it -- /bin/bash
kubectl exec spark-74fd98c69-slp8m -n gravitino-playground -it -- /bin/bash
kubectl expose deployment gravitino -n gravitino-playground --name gravitino-ui --type=NodePort --port=8090
minikube service gravitino-ui -n gravitino-playground --url
There's still an issue when creating a namespace like `test`. ConfigMap resources are created in the `test` namespace, but pods are created in the `gravitino-playground` namespace.
helm upgrade --install gravitino-playground helm-chart --create-namespace --namespace test
kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
gravitino-playground gravitino-7ddcfbbc67-shc5d 0/1 ContainerCreating 0 9s
gravitino-playground hive-8fbd694dd-p7hn5 0/1 ContainerCreating 0 9s
gravitino-playground jupyternotebook-6cddbcbc8b-2stmz 0/1 ContainerCreating 0 9s
gravitino-playground mysql-7db5d7cd68-9fpzj 0/1 ContainerCreating 0 9s
gravitino-playground postgresql-6845fd85d5-lzzr5 0/1 ContainerCreating 0 9s
gravitino-playground spark-74fd98c69-8kt76 0/1 ContainerCreating 0 9s
gravitino-playground trino-5f6b6f996c-twd7m 0/1 ContainerCreating 0 9s
helm list -A
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
gravitino-playground test 1 2024-08-21 11:16:15.242265 +0800 CST deployed gravitino-playground-0.1.0 1.0.0
kubectl get cm -n test
NAME DATA AGE
gravitino-healthcheck-script 1 34s
gravitino-init-script 2 34s
hive-init-script 1 34s
jupyter-init-script 3 34s
kube-root-ca.crt 1 33m
mysql-init-script 1 34s
postgresql-init-script 1 34s
spark-init-script 2 34s
trino-healthcheck-script 1 34s
trino-init-script 2 34s
@unknowntpo Just reminder: some important changes are merged to main branch, https://github.com/apache/gravitino-playground/pull/68.
Thanks for your reminder, I'll handle this.
Can you help add some steps to README.md like the following examples? Then these commands align with the Docker container CLI section above, so users can use the Trino CLI or Spark SQL in a pod.
- Log in to the Gravitino playground Trino pod using the following command:
kubectl exec trino-5f6b6f996c-cshfv -n gravitino-playground -it -- /bin/bash
- Log in to the Gravitino playground Spark pod using the following command:
kubectl exec spark-74fd98c69-slp8m -n gravitino-playground -it -- /bin/bash
- In local minikube, access the Gravitino UI using the following commands:
kubectl expose deployment gravitino -n gravitino-playground --name gravitino-ui --type=NodePort --port=8090
minikube service gravitino-ui -n gravitino-playground --url
I think we cannot support using minikube; we should use Docker Desktop or Orbstack instead.
PDF files for the Jupyter notebooks are too large to be mounted via ConfigMaps in Kubernetes pods, so I use a hostPath PV to mount these files.
But on M1/M2/M3 Macs, the files disappear due to a minikube bug.
To verify it, start a minikube cluster and deploy a helm release:
helm upgrade --install gravitino-playground ./helm-chart/ --create-namespace --namespace gravitino-playground --debug --set projectRoot=$(pwd)
You will see this in the Jupyter notebook pod's log:
/bin/bash: /tmp/gravitino/init.sh: No such file or directory
I didn't see the service dependency logic from docker-compose.yaml implemented in the Helm chart YAML. Do you think we need to add it to the Helm chart?
I use an initContainer to implement this dependency logic.
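For readers unfamiliar with the pattern, a minimal sketch of such an initContainer is below. The `busybox` image, the service name `gravitino`, and port `8090` are illustrative assumptions, not necessarily what this chart uses:

```yaml
# Sketch: block pod startup until the Gravitino service is reachable.
# Image, service name, and port are assumptions for illustration only.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: trino
spec:
  template:
    spec:
      initContainers:
        - name: wait-for-gravitino
          image: busybox:1.36
          command:
            - sh
            - -c
            # Poll the Gravitino port until it answers, then let the main container start.
            - "until nc -z gravitino 8090; do echo waiting for gravitino; sleep 2; done"
```

Unlike docker-compose's `depends_on`, Kubernetes has no native startup ordering, so this polling initContainer is the usual substitute.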
There's still an issue when creating a namespace like `test`. ConfigMap resources are created in the `test` namespace, but pods are created in the `gravitino-playground` namespace.
Helm seems to use global.namespace only to fill templates; it does not use this value to create a new namespace.
So we need these steps to create and use a namespace:
- set the namespace value in values.yaml
- pass the --namespace <namespace name> option
@danhuawang Would you like to review this PR again?
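For reference, one common way to make templated resources follow the `--namespace` flag is to use Helm's built-in `.Release.Namespace` instead of a custom value. This is a sketch, not necessarily how this chart's templates are written:

```yaml
# Sketch: resources land in the namespace passed via --namespace,
# because .Release.Namespace is set by Helm from that flag.
apiVersion: v1
kind: ConfigMap
metadata:
  name: gravitino-init-script
  namespace: {{ .Release.Namespace }}
```

With this, `helm upgrade --install ... --create-namespace --namespace test` puts every resource in `test` without any value in values.yaml.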
Can you help add some steps to README.md like the following examples? Then these commands align with the Docker container CLI section above, so users can use the Trino CLI or Spark SQL in a pod.
- Log in to the Gravitino playground Trino pod using the following command:
kubectl exec trino-5f6b6f996c-cshfv -n gravitino-playground -it -- /bin/bash
- Log in to the Gravitino playground Spark pod using the following command:
kubectl exec spark-74fd98c69-slp8m -n gravitino-playground -it -- /bin/bash
- In local minikube, access the Gravitino UI using the following commands:
kubectl expose deployment gravitino -n gravitino-playground --name gravitino-ui --type=NodePort --port=8090
minikube service gravitino-ui -n gravitino-playground --url
I think we cannot support using minikube; we should use Docker Desktop or Orbstack instead.
PDF files for the Jupyter notebooks are too large to be mounted via ConfigMaps in Kubernetes pods, so I use a hostPath PV to mount these files.
But on M1/M2/M3 Macs, the files disappear due to a minikube bug.
To verify it, start a minikube cluster and deploy a helm release:
helm upgrade --install gravitino-playground ./helm-chart/ --create-namespace --namespace gravitino-playground --debug --set projectRoot=$(pwd)
You will see this in the Jupyter notebook pod's log:
/bin/bash: /tmp/gravitino/init.sh: No such file or directory
After I mounted the macOS path into the minikube VM, the Jupyter pod is running. The following commands are for reference:
minikube ssh
docker@minikube:~$ mkdir -p /Users/wangdanhua/Workspace/test/gravitino-playground-feat-helm/init/jupyter
nohup minikube mount /Users/wangdanhua/Workspace/test/gravitino-playground-feat-helm/init/jupyter:/Users/wangdanhua/Workspace/test/gravitino-playground-feat-helm/init/jupyter &
minikube ssh
docker@minikube:~$ ls /Users/wangdanhua/Workspace/test/gravitino-playground-feat-helm/init/jupyter
data gravitino-fileset-example.ipynb gravitino-trino-example.ipynb gravitino_llamaIndex_demo.ipynb init.sh
@jerqi Would you like to take a look at this PR? Any more comments?
@unknowntpo Thanks for your work! I installed this chart on Docker Desktop and it works. Docker Desktop (with Kubernetes enabled) seems easier to use than minikube and more suitable for the playground.
Would you like to help change the following:
@danhuawang Done, please take a look.
LGTM. @xunliu Would you like to take a look?
@unknowntpo I saw docker-compose.yaml and the health check scripts changed. These changes cause the docker compose playground launch to fail.
@unknowntpo I am going to add a CI check for this repository. But there's an issue when I lint this helm-chart. Can you help check this issue? Thanks!
helm lint helm-chart
walk.go:74: found symbolic link in path: /Users/wangdanhua/Workspace/test/test2/gravitino-playground-feat-helm/helm-chart/healthcheck resolves to /Users/wangdanhua/Workspace/test/test2/gravitino-playground-feat-helm/healthcheck. Contents of linked file included and used
walk.go:74: found symbolic link in path: /Users/wangdanhua/Workspace/test/test2/gravitino-playground-feat-helm/helm-chart/init resolves to /Users/wangdanhua/Workspace/test/test2/gravitino-playground-feat-helm/init. Contents of linked file included and used
walk.go:74: found symbolic link in path: /Users/wangdanhua/Workspace/test/test2/gravitino-playground-feat-helm/helm-chart/healthcheck resolves to /Users/wangdanhua/Workspace/test/test2/gravitino-playground-feat-helm/healthcheck. Contents of linked file included and used
walk.go:74: found symbolic link in path: /Users/wangdanhua/Workspace/test/test2/gravitino-playground-feat-helm/helm-chart/init resolves to /Users/wangdanhua/Workspace/test/test2/gravitino-playground-feat-helm/init. Contents of linked file included and used
==> Linting helm-chart
[INFO] Chart.yaml: icon is recommended
[ERROR] templates/jupyter.yaml: unable to parse YAML: error converting YAML to JSON: yaml: line 98: found character that cannot start any token
Error: 1 chart(s) linted, 1 chart(s) failed
This is because we don't provide the projectRoot variable, which is used to locate the hostPath.
Helm seems to render templates before the actual linting, so if we don't provide it when running helm lint, the rendered result of hostPath will be:
$ helm template --debug ./helm-chart/
# ... omit other result
- name: artifacts
  hostPath:
    path: %!s(<nil>)/init/jupyter/
    type: DirectoryOrCreate
The rendering of path is broken.
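One way to fail fast with a clear message instead of rendering a broken path is Helm's `required` template function. This is a sketch; the chart's actual template may differ:

```yaml
# Sketch: abort template rendering with an explicit error when projectRoot is unset,
# so helm lint / helm template reports the missing value instead of emitting a nil path.
- name: artifacts
  hostPath:
    path: {{ required "projectRoot must be set, e.g. --set projectRoot=$(pwd)" .Values.projectRoot }}/init/jupyter/
    type: DirectoryOrCreate
```

With this, a bare `helm lint helm-chart` would fail with the "projectRoot must be set" message rather than the cryptic YAML parse error.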
If we provide the projectRoot variable:
helm lint ./helm-chart/ --set projectRoot=$(pwd)
The linter result will be:
$ helm lint ./helm-chart/ --set projectRoot=$(pwd)
walk.go:74: found symbolic link in path: /Users/unknowntpo/repo/unknowntpo/gravitino-playground/feat-helm/helm-chart/healthcheck resolves to /Users/unknowntpo/repo/unknowntpo/gravitino-playground/feat-helm/healthcheck. Contents of linked file included and used
walk.go:74: found symbolic link in path: /Users/unknowntpo/repo/unknowntpo/gravitino-playground/feat-helm/helm-chart/init resolves to /Users/unknowntpo/repo/unknowntpo/gravitino-playground/feat-helm/init. Contents of linked file included and used
walk.go:74: found symbolic link in path: /Users/unknowntpo/repo/unknowntpo/gravitino-playground/feat-helm/helm-chart/healthcheck resolves to /Users/unknowntpo/repo/unknowntpo/gravitino-playground/feat-helm/healthcheck. Contents of linked file included and used
walk.go:74: found symbolic link in path: /Users/unknowntpo/repo/unknowntpo/gravitino-playground/feat-helm/helm-chart/init resolves to /Users/unknowntpo/repo/unknowntpo/gravitino-playground/feat-helm/init. Contents of linked file included and used
==> Linting ./helm-chart/
[INFO] Chart.yaml: icon is recommended
1 chart(s) linted, 0 chart(s) failed
And the rendered result will be:
$ helm template --debug ./helm-chart/ --set projectRoot=$(pwd)
# ... omit other result
- name: artifacts
  hostPath:
    path: /Users/unknowntpo/repo/unknowntpo/gravitino-playground/feat-helm/init/jupyter/
    type: DirectoryOrCreate
@danhuawang
I've run the Trino and fileset Jupyter notebooks without any problem.
But in the llamaindex Jupyter notebook example, at the step of creating the fileset `cities`, it throws an InternalError. This happened frequently with both docker-compose and helm-chart, and the error log doesn't provide anything useful. Could you help me?
fileset_cities = None
try:
    fileset_cities = demo_catalog.as_fileset_catalog().load_fileset(ident=fileset_ident)
except Exception as e:
    fileset_cities = demo_catalog.as_fileset_catalog().create_fileset(ident=fileset_ident,
                                                                      fileset_type=Fileset.Type.EXTERNAL,
                                                                      comment="cities",
                                                                      storage_location="/tmp/gravitino/data/pdfs",
                                                                      properties={})
print(fileset_cities)
---------------------------------------------------------------------------
NoSuchFilesetException Traceback (most recent call last)
Cell In[10], line 26
25 try:
---> 26 fileset_cities = demo_catalog.as_fileset_catalog().load_fileset(ident=fileset_ident)
27 except Exception as e:
File /opt/conda/lib/python3.11/site-packages/gravitino/catalog/fileset_catalog.py:120, in FilesetCatalog.load_fileset(self, ident)
118 full_namespace = self._get_fileset_full_namespace(ident.namespace())
--> 120 resp = self.rest_client.get(
121 f"{self.format_fileset_request_path(full_namespace)}/{encode_string(ident.name())}",
122 error_handler=FILESET_ERROR_HANDLER,
123 )
124 fileset_resp = FilesetResponse.from_json(resp.body, infer_missing=True)
File /opt/conda/lib/python3.11/site-packages/gravitino/utils/http_client.py:221, in HTTPClient.get(self, endpoint, params, error_handler, **kwargs)
220 def get(self, endpoint, params=None, error_handler=None, **kwargs):
--> 221 return self._request(
222 "get", endpoint, params=params, error_handler=error_handler, **kwargs
223 )
File /opt/conda/lib/python3.11/site-packages/gravitino/utils/http_client.py:212, in HTTPClient._request(self, method, endpoint, params, json, data, headers, timeout, error_handler)
208 raise UnknownError(
209 f"Unknown error handler {type(error_handler).__name__}, error response body: {resp}"
210 ) from None
--> 212 error_handler.handle(resp)
214 # This code generally will not be run because the error handler should define the default behavior,
215 # and raise appropriate
File /opt/conda/lib/python3.11/site-packages/gravitino/exceptions/handlers/fileset_error_handler.py:38, in FilesetErrorHandler.handle(self, error_response)
37 if exception_type == NoSuchFilesetException.__name__:
---> 38 raise NoSuchFilesetException(error_message)
40 super().handle(error_response)
NoSuchFilesetException: Failed to operate fileset(s) [cities] operation [LOAD] under schema [countries], reason [Fileset metalake_demo.catalog_fileset.countries.cities does not exist]
org.apache.gravitino.exceptions.NoSuchFilesetException: Fileset metalake_demo.catalog_fileset.countries.cities does not exist
at org.apache.gravitino.catalog.hadoop.HadoopCatalogOperations.loadFileset(HadoopCatalogOperations.java:176)
at org.apache.gravitino.catalog.hadoop.SecureHadoopCatalogOperations.loadFileset(SecureHadoopCatalogOperations.java:218)
at org.apache.gravitino.catalog.FilesetOperationDispatcher.lambda$loadFileset$2(FilesetOperationDispatcher.java:79)
at org.apache.gravitino.catalog.CatalogManager$CatalogWrapper.lambda$doWithFilesetOps$2(CatalogManager.java:143)
at org.apache.gravitino.utils.IsolatedClassLoader.withClassLoader(IsolatedClassLoader.java:86)
at org.apache.gravitino.catalog.CatalogManager$CatalogWrapper.doWithFilesetOps(CatalogManager.java:137)
at org.apache.gravitino.catalog.FilesetOperationDispatcher.lambda$loadFileset$3(FilesetOperationDispatcher.java:79)
at org.apache.gravitino.catalog.OperationDispatcher.doWithCatalog(OperationDispatcher.java:97)
at org.apache.gravitino.catalog.FilesetOperationDispatcher.loadFileset(FilesetOperationDispatcher.java:77)
at org.apache.gravitino.hook.FilesetHookDispatcher.loadFileset(FilesetHookDispatcher.java:56)
at org.apache.gravitino.catalog.FilesetNormalizeDispatcher.loadFileset(FilesetNormalizeDispatcher.java:58)
at org.apache.gravitino.listener.FilesetEventDispatcher.loadFileset(FilesetEventDispatcher.java:75)
at org.apache.gravitino.server.web.rest.FilesetOperations.lambda$loadFileset$4(FilesetOperations.java:167)
at org.apache.gravitino.lock.TreeLockUtils.doWithTreeLock(TreeLockUtils.java:49)
at org.apache.gravitino.server.web.rest.FilesetOperations.lambda$loadFileset$5(FilesetOperations.java:166)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
at org.apache.gravitino.utils.PrincipalUtils.doAs(PrincipalUtils.java:39)
at org.apache.gravitino.server.web.Utils.doAs(Utils.java:149)
at org.apache.gravitino.server.web.rest.FilesetOperations.loadFileset(FilesetOperations.java:161)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:146)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:189)
at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:176)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:93)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:478)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:400)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:81)
at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:256)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265)
at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:235)
at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:684)
at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:394)
at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:358)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:311)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
at org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656)
at org.apache.gravitino.server.authentication.AuthenticationFilter.doFilter(AuthenticationFilter.java:86)
at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
at org.apache.gravitino.server.web.VersioningFilter.doFilter(VersioningFilter.java:111)
at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:552)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:600)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.eclipse.jetty.server.Server.handle(Server.java:516)
at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)
at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: org.apache.gravitino.exceptions.NoSuchEntityException: No such fileset entity: cities
at org.apache.gravitino.storage.relational.service.FilesetMetaService.getFilesetPOBySchemaIdAndName(FilesetMetaService.java:70)
at org.apache.gravitino.storage.relational.service.FilesetMetaService.getFilesetByIdentifier(FilesetMetaService.java:107)
at org.apache.gravitino.storage.relational.JDBCBackend.get(JDBCBackend.java:195)
at org.apache.gravitino.storage.relational.RelationalEntityStore.get(RelationalEntityStore.java:117)
at org.apache.gravitino.catalog.hadoop.HadoopCatalogOperations.loadFileset(HadoopCatalogOperations.java:164)
... 85 more
During handling of the above exception, another exception occurred:
InternalError Traceback (most recent call last)
Cell In[10], line 28
26 fileset_cities = demo_catalog.as_fileset_catalog().load_fileset(ident=fileset_ident)
27 except Exception as e:
---> 28 fileset_cities = demo_catalog.as_fileset_catalog().create_fileset(ident=fileset_ident,
29 fileset_type=Fileset.Type.EXTERNAL,
30 comment="cities",
31 storage_location="/tmp/gravitino/data/pdfs",
32 properties={})
33 print(fileset_cities)
File /opt/conda/lib/python3.11/site-packages/gravitino/catalog/fileset_catalog.py:170, in FilesetCatalog.create_fileset(self, ident, comment, fileset_type, storage_location, properties)
160 full_namespace = self._get_fileset_full_namespace(ident.namespace())
162 req = FilesetCreateRequest(
163 name=encode_string(ident.name()),
164 comment=comment,
(...)
167 properties=properties,
168 )
--> 170 resp = self.rest_client.post(
171 self.format_fileset_request_path(full_namespace),
172 req,
173 error_handler=FILESET_ERROR_HANDLER,
174 )
175 fileset_resp = FilesetResponse.from_json(resp.body, infer_missing=True)
176 fileset_resp.validate()
File /opt/conda/lib/python3.11/site-packages/gravitino/utils/http_client.py:229, in HTTPClient.post(self, endpoint, json, error_handler, **kwargs)
228 def post(self, endpoint, json=None, error_handler=None, **kwargs):
--> 229 return self._request(
230 "post", endpoint, json=json, error_handler=error_handler, **kwargs
231 )
File /opt/conda/lib/python3.11/site-packages/gravitino/utils/http_client.py:212, in HTTPClient._request(self, method, endpoint, params, json, data, headers, timeout, error_handler)
207 if not isinstance(error_handler, ErrorHandler):
208 raise UnknownError(
209 f"Unknown error handler {type(error_handler).__name__}, error response body: {resp}"
210 ) from None
--> 212 error_handler.handle(resp)
214 # This code generally will not be run because the error handler should define the default behavior,
215 # and raise appropriate
216 raise UnknownError(
217 f"Error handler {type(error_handler).__name__} can't handle this response, error response body: {resp}"
218 ) from None
File /opt/conda/lib/python3.11/site-packages/gravitino/exceptions/handlers/fileset_error_handler.py:40, in FilesetErrorHandler.handle(self, error_response)
37 if exception_type == NoSuchFilesetException.__name__:
38 raise NoSuchFilesetException(error_message)
---> 40 super().handle(error_response)
File /opt/conda/lib/python3.11/site-packages/gravitino/exceptions/handlers/rest_error_handler.py:33, in RestErrorHandler.handle(self, error_response)
30 code = error_response.code()
32 if code in ERROR_CODE_MAPPING:
---> 33 raise ERROR_CODE_MAPPING[code](error_message)
35 raise RESTException(
36 f"Unable to process: {error_message}",
37 )
InternalError: Failed to operate object [cities] operation [CREATE] under [countries], reason [Failed to create fileset metalake_demo.catalog_fileset.countries.cities location file:/tmp/gravitino/data/pdfs]
java.lang.RuntimeException: Failed to create fileset metalake_demo.catalog_fileset.countries.cities location file:/tmp/gravitino/data/pdfs
at org.apache.gravitino.catalog.hadoop.HadoopCatalogOperations.createFileset(HadoopCatalogOperations.java:236)
at org.apache.gravitino.catalog.hadoop.SecureHadoopCatalogOperations.lambda$createFileset$0(SecureHadoopCatalogOperations.java:98)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
at org.apache.gravitino.catalog.hadoop.authentication.UserContext.doAs(UserContext.java:162)
at org.apache.gravitino.catalog.hadoop.SecureHadoopCatalogOperations.createFileset(SecureHadoopCatalogOperations.java:95)
at org.apache.gravitino.catalog.FilesetOperationDispatcher.lambda$createFileset$6(FilesetOperationDispatcher.java:136)
at org.apache.gravitino.catalog.CatalogManager$CatalogWrapper.lambda$doWithFilesetOps$2(CatalogManager.java:143)
at org.apache.gravitino.utils.IsolatedClassLoader.withClassLoader(IsolatedClassLoader.java:86)
at org.apache.gravitino.catalog.CatalogManager$CatalogWrapper.doWithFilesetOps(CatalogManager.java:137)
at org.apache.gravitino.catalog.FilesetOperationDispatcher.lambda$createFileset$7(FilesetOperationDispatcher.java:135)
at org.apache.gravitino.catalog.OperationDispatcher.doWithCatalog(OperationDispatcher.java:117)
at org.apache.gravitino.catalog.FilesetOperationDispatcher.createFileset(FilesetOperationDispatcher.java:132)
at org.apache.gravitino.hook.FilesetHookDispatcher.createFileset(FilesetHookDispatcher.java:67)
at org.apache.gravitino.catalog.FilesetNormalizeDispatcher.createFileset(FilesetNormalizeDispatcher.java:76)
at org.apache.gravitino.listener.FilesetEventDispatcher.createFileset(FilesetEventDispatcher.java:96)
at org.apache.gravitino.server.web.rest.FilesetOperations.lambda$createFileset$2(FilesetOperations.java:132)
at org.apache.gravitino.lock.TreeLockUtils.doWithTreeLock(TreeLockUtils.java:49)
at org.apache.gravitino.server.web.rest.FilesetOperations.lambda$createFileset$3(FilesetOperations.java:128)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
at org.apache.gravitino.utils.PrincipalUtils.doAs(PrincipalUtils.java:39)
at org.apache.gravitino.server.web.Utils.doAs(Utils.java:149)
at org.apache.gravitino.server.web.rest.FilesetOperations.createFileset(FilesetOperations.java:120)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:146)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:189)
at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:176)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:93)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:478)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:400)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:81)
at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:256)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265)
at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:235)
at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:684)
at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:394)
at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:358)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:311)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
at org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656)
at org.apache.gravitino.server.authentication.AuthenticationFilter.doFilter(AuthenticationFilter.java:86)
at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
at org.apache.gravitino.server.web.VersioningFilter.doFilter(VersioningFilter.java:111)
at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:552)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:600)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.eclipse.jetty.server.Server.handle(Server.java:516)
at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)
at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
at java.base/java.lang.Thread.run(Thread.java:833)
@jerqi @qqqttt123 Can you help Eric figure out this problem?
Could he create a fileset using location file:/tmp/gravitino/data/pdfs?
@jerqi sorry for the late reply, I set the storage_location to:
storage_location="file:/tmp/gravitino/data/pdfs"
but still got the same result.
@jerqi Can you help take a look again?
Does the directory exist in the Docker image?
Sorry for the late reply, this directory does exist in the jupyter notebook container:
But it doesn't exist in the Gravitino image, does it? You should guarantee the Gravitino container has the permission and ability to create the path.
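The permission check suggested above can be sketched as a small pre-flight script run inside the Gravitino container. The function name is hypothetical, not part of the playground; the path mirrors the storage_location discussed in this thread:

```python
import os

# Sketch only: confirm the fileset storage path exists (or can be created)
# and is writable from inside the container before creating the fileset.
def check_storage_location(path: str) -> bool:
    try:
        os.makedirs(path, exist_ok=True)  # create the directory if missing
    except OSError:
        return False  # e.g. read-only filesystem or missing permission
    return os.access(path, os.W_OK)

print(check_storage_location("/tmp/gravitino/data/pdfs"))
```

If this prints False inside the Gravitino container but True in the jupyter container, the two pods don't share the path and the fileset creation will fail.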
It has nothing to do with the image. It's because I forgot to mount ./init/gravitino/data/pdfs to the gravitino pod.
Now the llamaindex example has no problem; I still need to fix init/jupyter/gravitino-spark-trino-example.ipynb and resolve conflicts.
@jerqi @danhuawang I found a weird thing and still can't solve it:
in init/jupyter/gravitino-spark-trino-example.ipynb, we create a table with the spark.sql.warehouse.dir config:
spark = SparkSession.builder \
.appName("PySpark SQL Example") \
.config("spark.plugins", "org.apache.gravitino.spark.connector.plugin.GravitinoSparkPlugin") \
.config("spark.jars", f"{spark_home}/jars/iceberg-spark-runtime-3.4_2.12-1.5.2.jar,{spark_home}/jars/gravitino-spark-connector-runtime-3.4_2.12-0.6.0-incubating.jar") \
.config("spark.sql.gravitino.uri", f"http://{gravitino_host_ip}:8090") \
.config("spark.sql.gravitino.metalake", "metalake_demo") \
.config("spark.sql.gravitino.enableIcebergSupport", "true") \
.config("spark.sql.catalog.catalog_rest", "org.apache.iceberg.spark.SparkCatalog") \
.config("spark.sql.catalog.catalog_rest.type", "rest") \
.config("spark.sql.catalog.catalog_rest.uri", f"http://{gravitino_host_ip}:9001/iceberg/") \
.config("spark.locality.wait.node", "0") \
.config("spark.sql.warehouse.dir", f"hdfs://{hive_host_ip}:9000/user/hive/warehouse") \
.enableHiveSupport() \
.getOrCreate()
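For reference, the gravitino_host_ip and hive_host_ip variables used above could be resolved from environment variables so the same notebook runs under both docker-compose and k8s. This is a sketch; the GRAVITINO_HOST_IP / HIVE_HOST_IP variable names and their defaults are my assumptions, not part of the playground:

```python
import os

# Hypothetical env-var names and defaults: under k8s these would typically
# resolve to the Service names, under docker-compose to the container names.
gravitino_host_ip = os.environ.get("GRAVITINO_HOST_IP", "gravitino")
hive_host_ip = os.environ.get("HIVE_HOST_IP", "hive")

gravitino_uri = f"http://{gravitino_host_ip}:8090"
warehouse_dir = f"hdfs://{hive_host_ip}:9000/user/hive/warehouse"
print(gravitino_uri, warehouse_dir)
```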
But if we describe this table, the domain name is changed to the pod name hive-c6544769c-gnzsl, which means that if we execute this SQL in the next step,
spark.sql("INSERT OVERWRITE TABLE employees PARTITION(department='Engineering') VALUES (1, 'John Doe', 30), (2, 'Jane Smith', 28);")
spark.sql("INSERT OVERWRITE TABLE employees PARTITION(department='Marketing') VALUES (3, 'Mike Brown', 32);")
spark.sql("SELECT * from employees").show()
we will get this error:
---------------------------------------------------------------------------
IllegalArgumentException Traceback (most recent call last)
Cell In[4], line 1
----> 1 spark.sql("INSERT OVERWRITE TABLE employees PARTITION(department='Engineering') VALUES (1, 'John Doe', 30), (2, 'Jane Smith', 28);")
2 spark.sql("INSERT OVERWRITE TABLE employees PARTITION(department='Marketing') VALUES (3, 'Mike Brown', 32);")
3 spark.sql("SELECT * from employees").show()
File /usr/local/spark/python/pyspark/sql/session.py:1440, in SparkSession.sql(self, sqlQuery, args, **kwargs)
1438 try:
1439 litArgs = {k: _to_java_column(lit(v)) for k, v in (args or {}).items()}
-> 1440 return DataFrame(self._jsparkSession.sql(sqlQuery, litArgs), self)
1441 finally:
1442 if len(kwargs) > 0:
File /usr/local/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
1316 command = proto.CALL_COMMAND_NAME +\
1317 self.command_header +\
1318 args_command +\
1319 proto.END_COMMAND_PART
1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
1323 answer, self.gateway_client, self.target_id, self.name)
1325 for temp_arg in temp_args:
1326 if hasattr(temp_arg, "_detach"):
File /usr/local/spark/python/pyspark/errors/exceptions/captured.py:175, in capture_sql_exception.<locals>.deco(*a, **kw)
171 converted = convert_exception(e.java_exception)
172 if not isinstance(converted, UnknownException):
173 # Hide where the exception came from that shows a non-Pythonic
174 # JVM exception message.
--> 175 raise converted from None
176 else:
177 raise
IllegalArgumentException: java.net.UnknownHostException: hive-c6544769c-gnzsl
The main branch seems to have lots of changes; for example, it uses gravitino-dependency.sh to download jars. I'll rebase and apply that change to the helm chart first.
Actually, some people have the same issue (link), but there's no great solution.
I manually set hostname to hive as a workaround, and it worked.
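The workaround above pins the pod's hostname. The same idea can be illustrated as a URI rewrite, replacing the ephemeral pod hostname stored in a table location with the stable Service name (a sketch only, not playground code; the example location is assumed from the describe output discussed earlier):

```python
from urllib.parse import urlsplit, urlunsplit

def rewrite_host(location: str, stable_host: str) -> str:
    """Replace the host in an hdfs:// location with a stable service name,
    keeping scheme, port, and path intact."""
    parts = urlsplit(location)
    port = f":{parts.port}" if parts.port else ""
    return urlunsplit((parts.scheme, f"{stable_host}{port}", parts.path,
                       parts.query, parts.fragment))

loc = "hdfs://hive-c6544769c-gnzsl:9000/user/hive/warehouse/employees"
print(rewrite_host(loc, "hive"))
# hdfs://hive:9000/user/hive/warehouse/employees
```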
@danhuawang Now I can successfully run all the jupyter notebook examples, and I also modified ./playground.sh to support k8s; please take a look.
@unknowntpo Some tiny issues:
./playground.sh k8s status
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 10d <none>
I expect pod status in the gravitino-playground namespace, like:
NAME READY STATUS RESTARTS AGE
gravitino-7b5fcfd648-28nvt 1/1 Running 0 13m
hive-79bf985ccd-hktzh 1/1 Running 0 13m
jupyternotebook-7bbbb97574-cblff 1/1 Running 0 13m
mysql-69f9f6f8c9-hrmgt 1/1 Running 0 13m
postgresql-6f6f69b989-prl6w 1/1 Running 0 13m
spark-7fdc9846cb-7dl6v 1/1 Running 0 13m
trino-f8c474b44-wwtw5 1/1 Running 0 13m
./playground.sh k8s stop
INFO: Stopping the playground...
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 10d <none>
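The status output above suggests the script queries services in the default namespace instead of pods in gravitino-playground. A minimal sketch of building the namespaced pod query (the helper name is hypothetical, not the actual playground.sh logic):

```python
# Hypothetical helper: build the kubectl argument list for a namespaced pod
# status check, rather than listing services in the default namespace.
def status_args(namespace: str = "gravitino-playground") -> list:
    return ["kubectl", "get", "pods", "-n", namespace]

print(" ".join(status_args()))  # kubectl get pods -n gravitino-playground
```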
@danhuawang Note that prometheus and grafana services came in during this rebase; could we implement them in another PR? This PR is already too big, and meanwhile lots of features keep coming in, which makes it really hard to rebase and reimplement in the helm chart.
Sure, other changes can be applied in another PR.
@xunliu This PR is ok. Can you help merge to main branch?
This PR allows us to deploy gravitino-playground with a helm chart. c.c. @xunliu