0chain / gosdk

A client SDK in Go for interfacing with the blockchain, the storage platform, and other smart contracts
MIT License

Blobbers can become unresponsive when there are large number of files/directories. #117

Open lpoli opened 3 years ago

lpoli commented 3 years ago

I came across list-all subcommand in zboxcli which uses getRemoteFilesAndDirs function: https://github.com/0chain/gosdk/blob/master/zboxcore/sdk/sync.go#L44

It works by requesting the list of files and directories for a path from the blobbers, then recursing into each child directory, and so on until the whole tree has been traversed. So if there are, say, 100 subdirectories, it makes at least 100 such requests.
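To make the cost concrete, here is a minimal sketch of that recursion. The tree data and the `listRecursive` helper are hypothetical stand-ins for `getRemoteFilesAndDirs`, not the actual SDK API; the point is that every directory costs one full round trip to the blobbers.

```go
package main

import "fmt"

// children maps a directory path to its subdirectories; a hypothetical
// stand-in for the remote tree held by the blobbers.
var children = map[string][]string{
	"/":    {"/a", "/b"},
	"/a":   {"/a/x"},
	"/b":   {},
	"/a/x": {},
}

// listRecursive mimics getRemoteFilesAndDirs: one blobber request per
// directory, then recursion into every child directory.
func listRecursive(path string, requests *int) {
	*requests++ // each directory costs a round trip to the blobbers
	for _, child := range children[path] {
		listRecursive(child, requests)
	}
}

func main() {
	var requests int
	listRecursive("/", &requests)
	fmt.Println(requests) // 4 directories => 4 requests
}
```

With 100 subdirectories the same walk issues 100+ requests, which is exactly the scaling problem described above.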

There is another option: request the ObjectTree from the blobbers via an HTTP request, as described in the docs: https://api.0chain.net/#402a1367-2f35-430b-9eaa-42917ead886b For instance, requesting the ObjectTree for the root path returns a JSON response containing the whole file hierarchy of the respective allocation.

The above call is fine when there are few files, but we need to consider that an allocation can contain thousands of them. For about 5 directories and 5 files the response size was about 60 KB, so for a large number of files the response becomes huge and keeps the blobber busy serving a single request for a significant amount of time; metadata of, say, 20 MB would obviously stall the blobber.

And that is just the single-allocation scenario. Blobbers are not confined to a single allocation, and there can be a multitude of client requests at once.

So the solution could be to provide a paginated response, or a partial tree response (say, only a few levels of tree depth per response). There are two other options, the ObjectPath and ReferencePath requests, but both can also grow large in response size and have the same issue as ObjectTree requests.

Kishan-Dhakan commented 3 years ago

list-all and list both call the function NewListRequest in gosdk/zboxcore/zboxutil/http.go. There, an HTTP GET request is made to the endpoint /v1/file/list/, whose response is the list (docs here).

Therefore, one way to go is for the list and list-all commands to accept offset and limit params (e.g. show 100 items starting from the 51st). We would then extract and show only the entries the client requested. This is not efficient, as it still means fetching the entire list, but the current API doesn't have pagination implemented (as per the docs).
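The client-side windowing this describes can be sketched as follows; `page` and the `offset`/`limit` parameters are the proposal above, not an existing SDK function, and the full list still has to be fetched first.

```go
package main

import "fmt"

// page slices a fully fetched listing client-side. offset/limit mirror the
// proposed CLI params; this is only the windowing step, the expensive full
// fetch from /v1/file/list/ still happens before it.
func page(entries []string, offset, limit int) []string {
	if offset >= len(entries) {
		return nil // past the end: empty page
	}
	end := offset + limit
	if end > len(entries) {
		end = len(entries)
	}
	return entries[offset:end]
}

func main() {
	all := []string{"a", "b", "c", "d", "e"}
	fmt.Println(page(all, 3, 3)) // [d e]
}
```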

The other way is to update the 0chain API to support pagination as well, i.e., accept param(s) at the endpoint /v1/file/list/ that provide context for a paginated response.

Cc: @iamrz1

guruhubb commented 3 years ago

Just need to make the change at the blobber end

lpoli commented 3 years ago

Hello Andrei, I am working on 0fs, which lets users mount their allocation to a directory and access files with regular system commands, just like local files.

It would be a good user experience if they could "cd" into a directory and "ls" its files quickly. Calling the blobbers for each such request is slow, and making frequent requests for every operation is also costly for the blobbers.

So to minimize this issue I need a paginated view of the ObjectTree. Currently, a request for the ObjectTree of a path returns the entire file tree under that path. It would be better to have a paginated response in both directions (paginated breadth and depth). For example, a user can have 1000 files in the same directory, so returning all of their info in one response is infeasible; going all the way to the bottom of the tree has the same problem.

So I think we should paginate on both axes: breadth and depth.
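A sketch of what breadth-and-depth pagination could look like on the blobber side, assuming a simplified `Ref` node (the real blobber reference record has many more fields) and hypothetical `maxDepth`/`maxChildren` cut-offs:

```go
package main

import "fmt"

// Ref is a simplified file/directory node; the real blobber reference
// carries hashes, sizes, and other metadata as well.
type Ref struct {
	Path     string
	Children []*Ref
}

// prune returns a copy of the tree limited to maxDepth levels and at most
// maxChildren entries per directory -- the two-axis pagination proposed
// above (a hypothetical sketch, not an existing blobber API).
func prune(r *Ref, maxDepth, maxChildren int) *Ref {
	out := &Ref{Path: r.Path}
	if maxDepth == 0 {
		return out // depth cut: children omitted, fetched later on demand
	}
	for i, c := range r.Children {
		if i == maxChildren {
			break // breadth cut: remaining siblings go to the next page
		}
		out.Children = append(out.Children, prune(c, maxDepth-1, maxChildren))
	}
	return out
}

func main() {
	root := &Ref{Path: "/", Children: []*Ref{
		{Path: "/a", Children: []*Ref{{Path: "/a/x"}}},
		{Path: "/b"},
		{Path: "/c"},
	}}
	p := prune(root, 1, 2)
	fmt.Println(len(p.Children), len(p.Children[0].Children)) // 2 0
}
```

A follow-up request with a larger depth (or an offset into the third child) would then fetch the parts that were cut off.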

0fs would construct the full tree and update it only when the allocation root changes (which is a hash of the combined path and file changes). That way 0fs can also cache already-read files on disk temporarily, providing fast access.
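The invalidation rule described here, keyed on the allocation root hash, can be sketched like so. `treeCache` and its fields are hypothetical 0fs internals, not real 0fs code; the tree is represented as a plain string slice for brevity.

```go
package main

import "fmt"

// treeCache keeps the fully constructed file tree keyed by the allocation
// root hash; when the root changes, the cached tree is rebuilt.
// (Hypothetical sketch of the 0fs caching idea described above.)
type treeCache struct {
	rootHash string
	tree     []string // placeholder for the real tree structure
}

// get returns the cached tree, calling fetch only when the current
// allocation root differs from the one the cache was built against.
func (c *treeCache) get(currentRoot string, fetch func() []string) []string {
	if c.rootHash != currentRoot {
		c.tree = fetch() // allocation changed: rebuild the tree once
		c.rootHash = currentRoot
	}
	return c.tree
}

func main() {
	fetches := 0
	c := &treeCache{}
	fetch := func() []string { fetches++; return []string{"/a", "/b"} }
	c.get("h1", fetch)
	c.get("h1", fetch) // same root: cache hit, no blobber round trip
	c.get("h2", fetch) // root changed: refetch
	fmt.Println(fetches) // 2
}
```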

Kishan-Dhakan commented 3 years ago

Just need to make the change at the blobber end

Right. I only mentioned the other way because this issue was opened in gosdk instead of blobber.

moldis commented 2 years ago

Need to add this ticket to clients @lpoli

lpoli commented 2 years ago

What do you mean by clients?

lpoli commented 2 years ago

With 64GB, a blobber can handle even such large files. There is a GetRefs endpoint on the blobber which should be used, and the other methods of getting metadata should be replaced by it. With GetRefs, consensus is also calculated over the common fields.
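For illustration, a paginated GetRefs call might be assembled like this. The endpoint path and the `path`/`pageLimit`/`offsetPath` parameter names are assumptions based on this thread, not a confirmed blobber API; check the blobber source for the actual contract.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildRefsURL assembles a hypothetical paginated GetRefs request.
// Endpoint shape and query parameter names are assumptions, not the
// verified blobber API.
func buildRefsURL(blobber, allocationID, path string, pageLimit int, offsetPath string) string {
	q := url.Values{}
	q.Set("path", path)
	q.Set("pageLimit", fmt.Sprint(pageLimit))
	if offsetPath != "" {
		// resume the listing after the last ref of the previous page
		q.Set("offsetPath", offsetPath)
	}
	return fmt.Sprintf("%s/v1/file/refs/%s?%s", blobber, allocationID, q.Encode())
}

func main() {
	fmt.Println(buildRefsURL("https://blobber.example.com", "alloc1", "/", 100, ""))
}
```

Each page would be requested from every blobber in the allocation, and consensus computed over the fields the blobbers agree on, as the comment above describes.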