GoogleCloudPlatform / gcs-fuse-csi-driver

The Google Cloud Storage FUSE Container Storage Interface (CSI) Plugin.
Apache License 2.0

Synchronization Issue between gcsfuse and Kubernetes Pod, When Application running on pods writes to GCS Bucket. #320

Open raviprakash007 opened 3 months ago

raviprakash007 commented 3 months ago

I have a pod running a Python application with a uwsgi service. The uwsgi service writes its logs to the pod's /tmp/logs folder.

I have mounted the GCS bucket at /tmp/logs so that every log file goes to the GCS bucket.

Everything works as expected, except that the log files are not visible on the GCS bucket's web page. However, if I create something manually (like touch a.txt), it becomes visible on the bucket's web page instantly.

I entered the pod and updated a log file by writing something to it manually. The file then showed up on the GCS bucket page with the logs written up to that time, but newer lines in the log are again not visible on the bucket page, even after refreshing it.

Can someone assist?

The configs are as follows:

PV

apiVersion: v1
kind: PersistentVolume
metadata:
  name: gcs-data-pv
  namespace: mynamespace
spec:
  storageClassName: cloud-data
  claimRef:
    namespace: mynamespace
    name: gcs-data-claim
  mountOptions:
    - implicit-dirs
    - dir-mode=777
    - file-mode=777
    - only-dir=my_subfolder_for_log_storage  # mounted this one with gcs-fuse
  capacity:
    storage: 5Gi 
  accessModes:
    - ReadWriteMany
  csi:
    driver: gcsfuse.csi.storage.gke.io
    volumeHandle: app-application-logs
    volumeAttributes:
      gcsfuseLoggingSeverity: warning

PVC

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gcs-data-claim
  namespace: mynamespace
spec:
  volumeName: gcs-data-pv
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 3Gi
  storageClassName: cloud-data

Deployment Snippet

---
volumeMounts:
        - mountPath: /api_service_mount
          name: uwsgi-service-claim
        - mountPath: /tmp/logs/  # uwsgi application writes here
          name: logger
          subPath: logs
--
--
volumes:
      - name: uwsgi-service-claim
        persistentVolumeClaim:
          claimName:  gcs-data-another-bucket-claim
      - name: logger
        persistentVolumeClaim:
          claimName: gcs-logger-claim  # claim for the logger volume

Expected result: log files created by the application should be reflected in the GCS bucket automatically, just as files created manually are reflected on the GCS bucket's web page.

ankitaluthra1 commented 3 months ago

It appears that the application is not flushing the log file: the file stays open while log lines are being appended to it. Data in the log file is uploaded to the GCS bucket only when flushFile or syncFile is called by the kernel or by the application writing to the file. A flush/sync happens when the file is closed, or when the application explicitly triggers a sync/flush call (for example, os.fsync() in Python or File.Sync() in Go).
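As a rough illustration of an explicit sync from the application side, here is a minimal Python sketch. It assumes the log lines are written by the Python code itself through the standard logging module; logs written by the uwsgi process itself would not be affected by it. The names FsyncFileHandler, build_logger, and fsync_demo are hypothetical, and the demo writes to a local temporary directory rather than the /tmp/logs mount.

```python
import logging
import os
import tempfile


class FsyncFileHandler(logging.FileHandler):
    """Hypothetical handler: flush and fsync the log file after every record."""

    def emit(self, record):
        # FileHandler.emit writes the record and flushes Python's buffer.
        super().emit(record)
        if self.stream is not None:
            # Ask the kernel to sync the file. On a gcsfuse mount this is the
            # kind of call that should trigger an upload (assumption based on
            # the explanation above).
            os.fsync(self.stream.fileno())


def build_logger(path):
    """Return a logger that writes to `path` through FsyncFileHandler."""
    logger = logging.getLogger("fsync_demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(FsyncFileHandler(path))
    return logger


# Demo against a local temporary directory; on the pod the path would live
# under the mounted /tmp/logs folder instead.
log_path = os.path.join(tempfile.mkdtemp(), "app.log")
logger = build_logger(log_path)
logger.info("request handled")
```

Syncing after every record is likely to be expensive on a bucket-backed mount, since each sync may upload the object again. Syncing periodically, or rotating and closing log files, may be a more practical trade-off.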

It can be verified by analyzing the gcsfuse logs. Follow these two steps to obtain and analyze the gcsfuse logs:

raviprakash007 commented 3 months ago

I am checking the logs after enabling debug logging and will report back on this thread.

ashmeenkaur commented 3 months ago

@raviprakash007 Just wanted to check in on this issue. I know you mentioned you were looking into the logs after enabling debug. Any updates on that? Are you still running into the same problem?

Let me know if you need any help!