NVIDIA / aistore

AIStore: scalable storage for AI applications
https://aistore.nvidia.com
MIT License

Many objects lost in minimal production-ready standalone Docker deployment #174

Closed Yablon closed 3 months ago

Yablon commented 3 months ago

Is there an existing issue for this?

Describe the bug

  1. I can build deploy/prod/docker/single/Dockerfile successfully, but the resulting container fails to run.
  2. I used the official prebuilt image aistorage/cluster-minimal:latest, and it works. I pushed 1 million objects to AIS, but when I read them from several machines, the machine I used to push the objects always succeeds in getting all of them, while the other machines fail from time to time. The log is: STATUS:404, MESSAGE:t[mlicJsEw]: ais://abc/... does not exist

What should I do? Is this caused by the single-target, single-proxy setup? I have one machine with 20 SSDs, so a k8s cluster seems unnecessary, which is why I chose the minimal production-ready deployment.

Expected Behavior

AIS on my machine with 20 SSDs performs well, especially when local machines access it.

Current Behavior

About 8% of requests failed.

Steps To Reproduce

Launch the official image aistorage/cluster-minimal:latest on one machine. First, put 1 million objects into it, each 140 KB in size. Then request them at a rate of 128 objects/second.
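To make the failure rate reproducible from any client machine, the read step above can be sketched as a small rate-limited probe that tallies HTTP status codes. This is only a sketch under assumptions: the helper names (`http_status`, `probe`, `failure_ratio`) are hypothetical, and the example URL in the comments (proxy port, REST path, object names) is a guess at the deployment, not something taken from this report, so adjust it to your setup.

```python
import time
import urllib.error
import urllib.request
from collections import Counter
from typing import Callable, Iterable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Issue a GET and return the HTTP status code (404 included)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # drain the body so the full object is transferred
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses such as 404 arrive here; keep the code for the tally.
        return err.code


def probe(
    urls: Iterable[str],
    fetch: Callable[[str], int] = http_status,
    rate: float = 128.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Counter:
    """Fetch each URL sequentially at no more than `rate` requests/second."""
    interval = 1.0 / rate
    counts: Counter = Counter()
    next_at = clock()
    for url in urls:
        delay = next_at - clock()
        if delay > 0:
            sleep(delay)  # wait for the next slot in the schedule
        counts[fetch(url)] += 1
        next_at += interval
    return counts


def failure_ratio(counts: Counter) -> float:
    """Fraction of requests that did not return 200."""
    total = sum(counts.values())
    return 0.0 if total == 0 else 1.0 - counts[200] / total


# Example usage (assumed endpoint and object names, both hypothetical):
#   base = "http://<proxy-host>:51080/v1/objects/abc"
#   counts = probe(f"{base}/obj-{i}" for i in range(10_000))
#   print(counts, failure_ratio(counts))
```

Running the same probe from the machine that uploaded the objects and from one of the failing machines, over the same object names, shows whether the 404s depend on the client. Connection errors (for example, a refused connection) are not caught and will stop the run.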

Possible Solution

No response

Additional Information/Context

No response

AIStore build/version

aistorage/cluster-minimal:latest

Environment details (OS name and version, etc.)

aistorage/cluster-minimal:latest

Yablon commented 3 months ago

Solved by changing to MinIO