fluid-cloudnative / fluid

Fluid, elastic data abstraction and acceleration for BigData/AI applications in cloud. (Project under CNCF)
https://fluid-cloudnative.github.io/
Apache License 2.0

[BUG] fluid PV will introduce top directory to pod, how to have only files under the top directory mounted by Pod without introducing top directory to pod? #4180

Closed. moting9 closed this issue 1 week ago.

moting9 commented 1 week ago

What is your environment (Kubernetes version, Fluid version, etc.)

k8s: 1.30.2
fluid: v1.0.0-31f5433

Describe the bug

Define a Dataset to mount an S3 bucket:

apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: codegen
spec:
  mounts:

The HF image checks whether the model files exist under the /data directory:

image: "ghcr.io/huggingface/text-generation-inference:1.4"
imagePullPolicy: IfNotPresent
volumeMounts:

What you expect to happen:

When the Hugging Face pod uses the PV generated by Fluid, an extra directory layer is introduced, so the files appear under /data/data. The Hugging Face pod looks for models directly under /data, and I cannot mount the Fluid PV at the root directory of a pod.

If I use a hostPath volume to provide the PV for the pod, no top directory is introduced.

How to reproduce it

Follow this sample; the hbase directory is introduced into the pod: https://github.com/fluid-cloudnative/fluid/blob/master/docs/en/samples/accelerate_data_accessing.md

Additional Information

moting9 commented 1 week ago

The issue can be resolved by adding the line below to the Dataset definition. Thanks!

path: /
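As a rough illustration of where that line goes, here is a minimal Dataset sketch. The `mountPoint` value is a placeholder and the mount name `data` is an assumption (the original mount entry is not shown in this thread); any S3 credential options are omitted.

```yaml
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: codegen
spec:
  mounts:
    - mountPoint: s3://<bucket-name>/   # placeholder, not from the original issue
      name: data                        # assumed mount name
      path: /                           # mount at the dataset root instead of /data
```

Without `path`, the mount is exposed under a directory named after the mount, which is where the extra top directory comes from.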

moting9 commented 1 week ago

I added "path: /" to the Dataset definition. The top directory no longer appears after a pod mounts the PV, but the dataset cache does not work: "kubectl get dataset" shows that the dataset cannot be cached.

I entered the pod that mounts the Fluid PV, with "path: /" set in the Dataset definition:

kubectl exec -it demo-app -- sh

There, "cp -af /data /tmp" hangs. If I remove "path: /" from the Dataset definition, "cp -f /data/fluiddir /tmp" works.

TrafalgarZZZ commented 1 week ago

> I added "path: /" to the Dataset definition. The top directory no longer appears after a pod mounts the PV, but the dataset cache does not work: "kubectl get dataset" shows that the dataset cannot be cached.
>
> I entered the pod that mounts the Fluid PV, with "path: /" set in the Dataset definition (kubectl exec -it demo-app -- sh). There, "cp -af /data /tmp" hangs. If I remove "path: /" from the Dataset definition, "cp -f /data/fluiddir /tmp" works.

This is weird. From my understanding, adding path: / is no different from the default setting (i.e. path: /{mountPoint.name}). A workaround might be to use subPath: data in your TGI container. For example:

volumeMounts:
- mountPath: /data
  name: codegen
  subPath: data
moting9 commented 1 week ago

@TrafalgarZZZ Thanks very much! This workaround resolves the issue.
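For readers landing on this thread, the subPath workaround can be sketched as a complete Pod manifest. This is only a sketch under assumptions: the pod name and container name are hypothetical, and it assumes the Dataset mount is named `data` and that the PVC Fluid creates carries the Dataset's name (`codegen`).

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tgi-demo            # hypothetical pod name
spec:
  containers:
    - name: tgi             # hypothetical container name
      image: "ghcr.io/huggingface/text-generation-inference:1.4"
      imagePullPolicy: IfNotPresent
      volumeMounts:
        - mountPath: /data
          name: codegen
          subPath: data     # mount only the "data" subdirectory of the volume at /data
  volumes:
    - name: codegen
      persistentVolumeClaim:
        claimName: codegen  # assumed: PVC named after the Dataset
```

With `subPath`, the container sees the contents of the volume's `data` directory directly under /data, so the Dataset itself can keep its default `path`.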