Closed — DuttaAnik closed this issue 2 months ago
Can you show your configuration for the ingress section in values.yaml? By default, it's configured to run under the /galaxy prefix, but looking at your ingress, it looks like it's running under /. If so, you could try configuring the ingress values to match, as follows:
ingress:
  path: /
  hosts:
    - host: ~
      paths:
        - path: "/"
        - path: "/training-material"
If that's not configured, Galaxy's internal nginx server could be picking up the wrong prefix when serving datasets, which would explain why only datasets are affected.
I do not have any ingress section in the values.yaml file. I have pasted the whole values.yaml file above. Should I copy the ingress configuration you just shared into the values.yaml file, or should it go somewhere else?
Aah, that probably explains it. I take it you created the ingress shown above by hand? You can get the Helm chart to create the ingress for you by setting the values appropriately. And yes, you can use the values shown above, but you will probably also need to include a tls section and set the host value and annotations to match your environment. Take a look at the chart's values.yaml for sample values.
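For instance, a sketch of letting the chart manage the ingress (the release name my-galaxy and the galaxy namespace are assumptions here; the repo URL is the one quoted in the Chart.yaml later in this thread):

```shell
# Add the chart repo (URL taken from the Chart.yaml quoted later in this thread):
helm repo add cloudve https://github.com/CloudVE/helm-charts/raw/master
helm repo update

# Render the chart locally first to inspect the Ingress it would create
# from your values file, before touching the cluster:
helm template my-galaxy cloudve/galaxy -n galaxy -f values.yaml | grep -B 2 -A 20 'kind: Ingress'

# Then install/upgrade so the chart owns the ingress instead of a hand-written one:
helm upgrade --install my-galaxy cloudve/galaxy -n galaxy -f values.yaml
```

Rendering with helm template first lets you confirm the generated Ingress matches what your hand-written one was doing before you switch over.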
Hi @nuwang, thanks for the helpful tips. Yes, I created the ingress file by hand. How can I get the Helm chart to create the ingress file? Could you please explain? Sorry, I am new to this field, so maybe I am not understanding the simple things. I got the following section from the original values-org.yaml file from the Galaxy GitHub repository.
ingress:
  enabled: true
  ingressClassName: nginx
  annotations:
    nginx.ingress.kubernetes.io/proxy-request-buffering: "off"
    nginx.ingress.kubernetes.io/proxy-buffering: "off"
    nginx.ingress.kubernetes.io/proxy-http-version: "1.1"
    nginx.ingress.kubernetes.io/connection-proxy-header: "Upgrade"
    nginx.ingress.kubernetes.io/proxy-body-size: "0"
  hosts:
    - host: ~
      paths:
        - path: "/galaxy/api/upload/resumable_upload"
  tls: []
I think you've extracted the tusd section. The main section you need to modify is here: https://github.com/galaxyproject/galaxy-helm/blob/0afe341dcae427832ac232c8d87c842436daf971/galaxy/values.yaml#L270
Afterwards, you can update the tusd section to match.
Something like:
ingress:
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-production
    kubernetes.io/tls-acme: "true"
    nginx.ingress.kubernetes.io/ssl-passthrough: "false"
    nginx.ingress.kubernetes.io/backend-protocol: HTTP
    nginx.ingress.kubernetes.io/proxy-body-size: "0"
  path: /
  hosts:
    - host: galaxy.XX.XX.cloud
      paths:
        - path: "/"
        - path: "/training-material"
  tls:
    - hosts:
        - galaxy.XX.XX.cloud
      secretName: galaxy.XX.XX.cloud
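To match, the tusd section could then look something like this (a sketch based on the tusd snippet quoted earlier in the thread, with the /galaxy prefix dropped to match the / root path; host and secret names follow the example above):

```yaml
tusd:
  ingress:
    enabled: true
    hosts:
      - host: galaxy.XX.XX.cloud
        paths:
          - path: "/api/upload/resumable_upload"
    tls:
      - hosts:
          - galaxy.XX.XX.cloud
        secretName: galaxy.XX.XX.cloud
```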
Hello @nuwang Thanks for the suggestions. I have modified the values.yaml file like below:
galaxy:
  fullnameOverride: galaxy
  nameOverride: galaxy
  revisionHistoryLimit: 3
  images:
    galaxy:
      repository: quay.io/galaxyproject/galaxy-min
      tag: "23.1" # Value must be quoted
      pullPolicy: IfNotPresent
  refdata:
    enabled: false
    type: cvmfs
    pvc:
      size: 10Gi
  cvmfs:
    deploy: false
    storageClassName: "{{ $.Release.Name }}-cvmfs"
  persistence:
    enabled: true
    name: galaxy-pvc
    annotations: {}
    storageClassName: freenas-nfs-csi
    existingClaim: galaxy-k3s-rdloc-galaxy-pvc
    accessMode: ReadWriteMany
    size: 200Gi
    mountPath: /galaxy/server/database
  rabbitmq:
    enabled: true
    deploy: true
    persistence:
      storageClassName: freenas-iscsi-csi
  celery:
    concurrency: 1
  postgresql:
    enabled: true
    deploy: true
    galaxyDatabaseUser: postgres
    galaxyDatabasePassword: password
  configs:
    galaxy.yml:
      galaxy:
        admin_users: f@ss.com
  # The suggested ingress configuration added to the values.yaml:
  ingress:
    enabled: true
    ingressClassName: nginx
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-production
      kubernetes.io/tls-acme: "true"
      nginx.ingress.kubernetes.io/ssl-passthrough: "false"
      nginx.ingress.kubernetes.io/backend-protocol: HTTP
      nginx.ingress.kubernetes.io/proxy-body-size: "0"
    hosts:
      - host: galaxy.XX.XX.cloud
        paths:
          - path: /
            pathType: Prefix
          - path: /
            pathType: Prefix
    tls:
      - hosts:
          - galaxy.XX.XX.cloud
        secretName: galaxy.XX.XX.cloud
  tusd:
    enabled: true
    ingress:
      enabled: true
      annotations:
      hosts:
        - host: galaxy.XX.XX.cloud
          paths:
            - path: /
              pathType: Prefix
But it did not solve the issue with downloading output files; I am still getting the same error message that I showed. Besides, there is another issue: after making these changes, I tried to upload a data file in Galaxy, and that also fails with an error message, so the data file cannot be uploaded.
The error message is:
/bin/bash: line 1: /galaxy/server/database/jobs_directory/000/42/galaxy_42.sh: No such file or directory
This is the description of the Galaxy job pod from Kubernetes:
Name:                 gxy-galaxy-k3s-rdloc-ttmkc-4xdkq
Namespace:            galaxy
Priority:             -1000
Priority Class Name:  galaxy-job-priority
Service Account:      default
Node:                 xxx/xxxx
Start Time:           Mon, 08 Apr 2024 15:43:48 +0200
Labels:               app.galaxyproject.org/destination=k8s
                      app.galaxyproject.org/handler=job_handler_0
                      app.galaxyproject.org/job_id=41
                      app.kubernetes.io/component=tool
                      app.kubernetes.io/instance=gxy-galaxy-k3s-rdloc
                      app.kubernetes.io/managed-by=galaxy
                      app.kubernetes.io/name=x__DATA_FETCH__x
                      app.kubernetes.io/part-of=galaxy
                      app.kubernetes.io/version=0.1.0
                      batch.kubernetes.io/controller-uid=xxxx
                      batch.kubernetes.io/job-name=gxy-galaxy-k3s-rdloc-ttmkc
                      controller-uid=44343ed2-1e80-45fc-9b43-d474daea53d2
                      job-name=gxy-galaxy-k3s-rdloc-ttmkc
Annotations:          app.galaxyproject.org/tool_id: __DATA_FETCH__
Status:               Failed
IP:                   XX
IPs:
  IP:  XX
Controlled By:  Job/gxy-galaxy-k3s-rdloc-ttmkc
Containers:
  k8s:
    Container ID:  containerd://XX
    Image:         quay.io/galaxyproject/galaxy-min:23.1
    Image ID:      quay.io/galaxyproject/galaxy-min@sha256:XXX
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
    Args:
      -c
      /galaxy/server/database/jobs_directory/000/41/galaxy_41.sh
    State:          Terminated
      Reason:       Error
      Exit Code:    127
      Started:      Mon, 08 Apr 2024 15:43:50 +0200
      Finished:     Mon, 08 Apr 2024 15:43:50 +0200
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  4080218931200m
    Requests:
      cpu:     1
      memory:  4080218931200m
    Environment:
      GALAXY_SLOTS:               1
      GALAXY_MEMORY_MB:           4080
      GALAXY_MEMORY_MB_PER_SLOT:  4080
    Mounts:
      /cvmfs/cloud.galaxyproject.org from galaxy-k3s-rdloc-galaxy-pvc (rw,path="cvmfsclone")
      /galaxy/server/database from galaxy-k3s-rdloc-galaxy-pvc (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-rfcvn (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  galaxy-k3s-rdloc-galaxy-pvc:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  galaxy-k3s-rdloc-galaxy-pvc
    ReadOnly:   false
  kube-api-access-rfcvn:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 20s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 20s
Events:          <none>
Can you please suggest how to overcome these issues? I am a bit lost as to where this error is originating from. Thank you very much.
Any update @nuwang?
Hey @DuttaAnik. In your latest values, you seem to have changed the paths under ingress.hosts[0].paths, but not ingress.path, as shown in Nuwan's example. Also, you have two identical paths for /. There might be other issues that are hard to diagnose without access to your cluster, but something to try that might help is replacing:
ingress:
  enabled: true
  ingressClassName: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-production
    kubernetes.io/tls-acme: "true"
    nginx.ingress.kubernetes.io/ssl-passthrough: "false"
    nginx.ingress.kubernetes.io/backend-protocol: HTTP
    nginx.ingress.kubernetes.io/proxy-body-size: "0"
  hosts:
    - host: galaxy.XX.XX.cloud
      paths:
        - path: /
          pathType: Prefix
        - path: /
          pathType: Prefix
with
ingress:
  path: /
  enabled: true
  ingressClassName: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-production
    kubernetes.io/tls-acme: "true"
    nginx.ingress.kubernetes.io/ssl-passthrough: "false"
    nginx.ingress.kubernetes.io/backend-protocol: HTTP
    nginx.ingress.kubernetes.io/proxy-body-size: "0"
  hosts:
    - host: galaxy.XX.XX.cloud
      paths:
        - path: /
          pathType: Prefix
and let us know if that changes anything.
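After syncing, it may also help to confirm what ingress objects actually exist in the cluster (a sketch; the galaxy namespace is assumed from this thread):

```shell
# List the ingresses the chart (or a hand-written template) created:
kubectl get ingress -n galaxy

# Inspect which hosts and paths they actually route:
kubectl describe ingress -n galaxy | grep -iE 'host|path'
```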
Hi @almahmoud and @nuwang
Please let me clear up some confusion here. I have two values YAML files: the original values-org.yaml file from Galaxy, and a values.rdloc.k3s.yaml file that I am using for the specific Kubernetes cluster where Galaxy is deployed. I have a GitLab repo containing these two values files, and there are galaxy_ingress.yaml, pvc_galaxy.yaml, and secret_postgress.yaml files in the templates folder of the repo. Everything is then deployed through ArgoCD. So, I have updated the values.rdloc.k3s.yaml file like this:
galaxy:
  fullnameOverride: galaxy
  nameOverride: galaxy
  revisionHistoryLimit: 3
  images:
    galaxy:
      repository: quay.io/galaxyproject/galaxy-min
      tag: "23.1" # Value must be quoted
      pullPolicy: IfNotPresent
  refdata:
    enabled: false
    type: cvmfs
    pvc:
      size: 10Gi
  cvmfs:
    deploy: false
    storageClassName: "{{ $.Release.Name }}-cvmfs"
  persistence:
    enabled: true
    name: galaxy-pvc
    annotations: {}
    storageClassName: freenas-nfs-csi
    existingClaim: galaxy-k3s-rdloc-galaxy-pvc
    accessMode: ReadWriteMany
    size: 200Gi
    mountPath: /galaxy/server/database
  rabbitmq:
    enabled: true
    deploy: true
    persistence:
      storageClassName: freenas-iscsi-csi
  celery:
    concurrency: 1
  postgresql:
    enabled: true
    deploy: true
    galaxyDatabaseUser: postgres
    galaxyDatabasePassword: xxxxx
  configs:
    galaxy.yml:
      galaxy:
        admin_users: aa@xxx.com
  ingress:
    path: /
    enabled: true
    ingressClassName: nginx
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-production
      kubernetes.io/tls-acme: "true"
      nginx.ingress.kubernetes.io/ssl-passthrough: "false"
      nginx.ingress.kubernetes.io/backend-protocol: HTTP
      nginx.ingress.kubernetes.io/proxy-body-size: "0"
    hosts:
      - host: galaxy.XXX.cloud
        paths:
          - path: /
            pathType: Prefix
    tls:
      - hosts:
          - galaxy.XXX.cloud
        secretName: galaxy.XXX.cloud
  tusd:
    enabled: true
    ingress:
      enabled: true
      annotations:
      hosts:
        - host: galaxy.XXX.cloud
          paths:
            - path: /
              pathType: Prefix
Then I synced it through ArgoCD. But I still could not download any existing output file, and I now get the following error:
{"err_msg":"Could not get display data for dataset: [Errno 2] No such file or directory: ''","err_code":500001}
As I mentioned, I also have a separate galaxy_ingress.yaml file, which I pasted in the first message. So, instead of making changes in the values.rdloc.k3s.yaml file, I copied the ingress and tusd sections into that ingress.yaml file. But still there was no change in the error message, and I still cannot download any output files.
Could you please provide any solution to this issue?
Also, I cannot upload any data files, which I could do before. Now I get the error /bin/bash: line 1: /galaxy/server/database/jobs_directory/000/42/galaxy_42.sh: No such file or directory
or the following error:
Traceback (most recent call last):
  File "/galaxy/server/lib/galaxy/jobs/runners/__init__.py", line 197, in put
    queue_job = job_wrapper.enqueue()
  File "/galaxy/server/lib/galaxy/jobs/__init__.py", line 1594, in enqueue
    self._set_object_store_ids(job)
  File "/galaxy/server/lib/galaxy/jobs/__init__.py", line 1612, in _set_object_store_ids
    self._set_object_store_ids_full(job)
  File "/galaxy/server/lib/galaxy/jobs/__init__.py", line 1702, in _set_object_store_ids_full
    self._setup_working_directory(job=job)
  File "/galaxy/server/lib/galaxy/jobs/__init__.py", line 1281, in _setup_working_directory
    working_directory = self._create_working_directory(job)
  File "/galaxy/server/lib/galaxy/jobs/__init__.py", line 1328, in _create_working_directory
    return create_working_directory_for_job(self.object_store, job)
  File "/galaxy/server/lib/galaxy/job_execution/setup.py", line 278, in create_working_directory_for_job
    object_store.create(job, base_dir="job_work", dir_only=True, obj_dir=True)
  File "/galaxy/server/lib/galaxy/objectstore/__init__.py", line 422, in create
    return self._invoke("create", obj, **kwargs)
  File "/galaxy/server/lib/galaxy/objectstore/__init__.py", line 413, in _invoke
    return self.__getattribute__(f"_{delegate}")(obj=obj, **kwargs)
  File "/galaxy/server/lib/galaxy/objectstore/__init__.py", line 783, in _create
    safe_makedirs(dir)
  File "/galaxy/server/lib/galaxy/util/path/__init__.py", line 138, in safe_makedirs
    makedirs(path)
  File "/usr/local/lib/python3.10/os.py", line 225, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/galaxy/server/database/jobs_directory/000/54'
What chart are you using to install Galaxy? The values you provided do not coincide with the values for the galaxy-helm chart: for example, the root-level galaxy: element, and the galaxy-helm chart expects persistence.storageClass but you have persistence.storageClassName. So you may want to check the PVC and make sure it is using the storage class it is supposed to be using. You mention having a galaxy_ingress.yaml file in the templates folder; if you set ingress.enabled = true in the values file, that extra template may cause conflicts when the Helm chart tries to set up an ingress. You should likely use one or the other.
Also, what version of Kubernetes are you using? I saw similar errors with Galaxy 23.x on later versions of Kubernetes due to an incompatibility with the version of pykube used by Galaxy. That was fixed in 24.0, so you may want to try that Galaxy version.
Finally, I don't think the path in your tusd section is correct. It should likely be something like:
tusd:
  hosts:
    - host: galaxy.XXX.cloud
      paths:
        - path: /api/upload/resumable_upload
          pathType: Prefix
If that doesn't solve your problem, can you kubectl exec into the job pod and check the owner and permissions of the files in /galaxy/server/database/jobs_directory? Those should be owned by the galaxy user, and Galaxy should be running as the galaxy user.
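Concretely, the check could look like this (a sketch: the label selector is an assumption, and since a terminated job pod cannot be exec'd into, it targets a running Galaxy pod that mounts the same PVC; verify pod names with kubectl get pods -n galaxy first):

```shell
NS=galaxy
# Pick a running Galaxy pod (label selector is an assumption; adjust if needed):
POD=$(kubectl get pods -n "$NS" -l app.kubernetes.io/name=galaxy \
      -o jsonpath='{.items[0].metadata.name}')

# Which user is the container actually running as?
kubectl exec -n "$NS" "$POD" -- id

# Numeric owner/permissions of the jobs directory on the shared PVC
# (these should match the galaxy user's uid/gid):
kubectl exec -n "$NS" "$POD" -- ls -ln /galaxy/server/database/jobs_directory

# If the directory turned out root-owned and the container can act as root,
# ownership could be reassigned like this (verify uid/gid with `id galaxy` first):
kubectl exec -n "$NS" "$POD" -- chown -R galaxy:galaxy /galaxy/server/database/jobs_directory
```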
Hi @ksuderman, thank you for the reply and sorry for the delayed response. The Chart.yaml I am using to install Galaxy looks like this:
apiVersion: v2
name: galaxy
type: application
version: 1.0.0
dependencies:
  - name: galaxy
    repository: https://github.com/CloudVE/helm-charts/raw/master
    version: 5.9.0
The Kubernetes version is: v1.27.7+k3s2
The YAML for the PVC that I created (pvc-galaxy-k3s-rdloc-galaxy-pvc.yaml) looks like this:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  labels:
    app.kubernetes.io/instance: galaxy-k3s-rdloc
    app.kubernetes.io/name: galaxy
  name: galaxy-k3s-rdloc-galaxy-pvc
  namespace: galaxy
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 200Gi
  storageClassName: freenas-nfs-csi
With the values.yaml file that I pasted in the first message, everything regarding file upload was working smoothly, except for downloading output files. Then I increased the PVC size from 100Gi to 400Gi, but later wanted to reduce it to 200Gi. Once increased to 400Gi, the PVC could not be shrunk, so I deleted it and created a new PVC with 200Gi. I checked with kubectl that the new PVC exists under the galaxy namespace and the old one was removed. Then I deployed again through ArgoCD, and since then the upload does not work anymore.
I have also removed ingress.enabled=true from the values file, and it is still not working.
I exec'd into the job pod through kubectl and listed the directories. It looks like the database/ directory is owned by the root user of the PVC:
-rwxr-xr-x 1 galaxy galaxy 158 Oct 18 15:23 check_model.sh
-rw-r--r-- 1 galaxy galaxy 871 Oct 18 15:23 CITATION
drwxr-xr-x 7 galaxy galaxy 4096 Oct 18 15:23 client
-rw-r--r-- 1 galaxy galaxy 261 Oct 18 15:23 CODE_OF_CONDUCT.md
drwxr-x--- 1 galaxy galaxy 4096 Apr 15 12:17 config
drwxr-xr-x 2 galaxy galaxy 4096 Oct 18 15:23 contrib
-rw-r--r-- 1 galaxy galaxy 8997 Oct 18 15:23 CONTRIBUTING.md
-rw-r--r-- 1 galaxy galaxy 8341 Oct 18 15:23 CONTRIBUTORS.md
drwxr-xr-x 2 galaxy galaxy 4096 Oct 18 15:23 cron
drwxrwxrwx 13 root root 14 Apr 9 09:19 database
Is this the reason the upload and download are not working, then?
I see you are still using 23.1; could you try with 24.0?
galaxy:
  image:
    tag: "24.0"
You should be able to helm upgrade an existing installation:
helm upgrade galaxy -n galaxy <your chart> --set galaxy.image.tag="24.0"
Thank you very much for your replies. The problem has been resolved.
Out of curiosity and to help us better assist users in the future, what exactly resolved your problem?
So, I deleted mountPath: /galaxy/server/database from the values.yaml file. Then I created a new PVC, kept only storageClass: "freenas-iscsi-csi", and deleted storageClassName.
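For reference, the relevant persistence section presumably ended up looking something like this (a sketch reconstructed from the description above; the claim name is the one used earlier in the thread, so verify against your own files):

```yaml
galaxy:
  persistence:
    enabled: true
    storageClass: freenas-iscsi-csi  # the chart expects storageClass, not storageClassName
    existingClaim: galaxy-k3s-rdloc-galaxy-pvc
    accessMode: ReadWriteMany
    size: 200Gi
    # no mountPath override — the chart's default mount path is used
```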
Thanks for the follow up.
Hello,
I have successfully installed Galaxy and deployed it in the Kubernetes cluster. I ran some analysis using Prokka and it completed the analysis and produced some output files. But I cannot download the output files. If I click on the download button, I receive an error message. I have attached the error message. Could anybody please suggest how to solve this issue?
This is the values.yaml file:
This is the ingress.yaml file:
It would be great if you could provide me with any guidance. TIA