dragonflyoss / Dragonfly

This repository has been archived and moved to the new repository https://github.com/dragonflyoss/Dragonfly2.
https://d7y.io

Persist image cache on supernode for a longer time? #1096

Open olehvic opened 5 years ago

olehvic commented 5 years ago

Question

Hello everybody, I'm writing my bachelor's thesis, analyzing Docker image distribution with P2P versus a "classic" registry. I have 70 VMs in the cloud, which pull different images several times (50 MB, 300 MB, and 2 GB each). I see that pulling through Dragonfly is much slower (or am I doing something wrong :D): the supernode immediately deletes the images and has to reload them on the next pull. Would it be possible to avoid this? I would like to configure the supernode so that images are persisted there for a longer time; otherwise there is no point in treating the supernode as a CDN. I think Dragonfly is a very interesting project and would like to hear from you.

starnop commented 5 years ago

What do you mean by "the supernode immediately deletes the images"? Could you please elaborate on that? In addition, it would help if you shared your Dragonfly version, your config file, and how you use it.

And I have a guess about what you're describing: the supernode starts GC'ing cached files once free disk space drops below 100 GB, and a full GC is executed once it falls below 5 GB.

FYI, more GC configuration options are described in the supernode properties.
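For reference, the two thresholds mentioned above would look roughly like this in the supernode config. This is a minimal sketch, assuming the usual supernode.yml layout of v1.0.0: the field names youngGCThreshold and fullGCThreshold come from this thread, while the file path and the base: nesting are assumptions that may differ for your install.

```yaml
# /etc/dragonfly/supernode.yml (path is an assumption; adjust to your install)
base:
  # GC of cached files begins once free disk space falls below this value.
  youngGCThreshold: 100G   # default
  # A full GC is executed once free disk space falls below this value.
  fullGCThreshold: 5G      # default
```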

olehvic commented 5 years ago

Hey @Starnop, thank you for answering.

it would help if you shared your Dragonfly version

Supernode & clients: v1.0.0

your config file

I'm using the default settings for the supernode.

how you use it

Landscape:

- 1 master host for the supernode: 8 GB RAM, 4 vCPUs, 10 GB disk (ubuntu-bionic-18.04)
- 10/20/30/70 worker hosts for dfclients: 2 GB RAM, 1 vCPU, 10 GB disk (ubuntu-bionic-18.04) each

Dragonfly runs on all hosts in the Docker runtime. All VMs are in the cloud.

Testing: I pull & remove the images 10 times on all clients simultaneously, using a script to measure the pull time depending on the number of clients and the type of workload (small: ubuntu, mid: clojure, large: latex images).
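For concreteness, a minimal sketch of such a pull-and-remove measurement loop (my own illustration, not the script used in the thesis; the image names and repetition count are assumptions):

```python
import subprocess
import time

# Hypothetical test images; substitute the images/mirror address of your setup.
IMAGES = ["ubuntu:18.04", "clojure:latest", "blang/latex:latest"]
REPEATS = 10

def timed_pull(image: str) -> float:
    """Pull an image and return the wall-clock time in seconds."""
    start = time.monotonic()
    subprocess.run(["docker", "pull", image], check=True)
    return time.monotonic() - start

for image in IMAGES:
    for run in range(1, REPEATS + 1):
        elapsed = timed_pull(image)
        print(f"{image} run {run}: {elapsed:.1f}s")
        # Remove the local copy so the next pull is served again by the
        # supernode / P2P network rather than by the local Docker cache.
        subprocess.run(["docker", "rmi", image], check=True)
```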

And I have a guess about what you're describing: the supernode starts GC'ing cached files once free disk space drops below 100 GB, and a full GC is executed once it falls below 5 GB.

Exactly, this is what I needed to know 👍

I will then adjust the youngGCThreshold and fullGCThreshold parameters for my master VM, since I only have 10 GB of disk space, to make sure the supernode keeps its cache (see the sketch below). One question: are there further improvements for my use case regarding the supernode configuration that I should adjust for better results? I've also seen that with larger layers (over 1 GB), many clients get blacklisted because the error count is high. How should failureCountLimit be set to ensure good cluster performance?
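For a 10 GB disk, that adjustment could look roughly like the following. This is a sketch only: the concrete values are assumptions chosen to leave headroom for the largest ~2 GB image, and the file layout follows the earlier snippet.

```yaml
base:
  # Start young GC only once free space drops below 6 GB ...
  youngGCThreshold: 6G
  # ... and run a full GC once it drops below 1 GB.
  fullGCThreshold: 1G
```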

starnop commented 5 years ago

One question: are there further improvements for my use case regarding the supernode configuration that I should adjust for better results?

I think so. Would you like to submit an issue to record that?

I've also seen that with larger layers (over 1 GB), many clients get blacklisted because the error count is high. How should failureCountLimit be set to ensure good cluster performance?

I may not be able to give you a good answer to this question yet; more testing and validation are needed. Recording it in an issue would be better. 😄