steven-zou opened 6 years ago
Really looking forward to the integration of Dragonfly and Harbor.
In order to complete the task of publishing images to the SuperNode, Dragonfly's internal workflow is:
Image Distribution Driver
Harbor Registry
SuperNodes of Dragonfly to pre-download the image's layers, and periodically record the task status for query
SuperNode
So some features need to be completed in Dragonfly:
SuperNode of Dragonfly
DFDaemon (the proxy deployed on every machine) supports pulling private container images
And some things need to be confirmed:
Image Distribution Driver is the connector between the container image registry and the file distribution system. IMO, it's better to use a pub-sub pattern and make it an independent component. In this way, we can easily achieve very loose coupling.
I drew a new architecture design graph that adds the new Dragonfly components that need to be implemented.
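A minimal sketch of the pub-sub idea in Go: a hypothetical in-memory broker delivers image events, and a separately running driver subscribes to them and triggers preheating, so the registry never calls Dragonfly directly. ImageEvent, Broker, RunDriver and the preheat callback are illustrative names, not Harbor or Dragonfly APIs; in practice the publishing side would more likely be Harbor webhooks or a message queue.

```go
package main

import "sync"

// ImageEvent is a hypothetical event published when an image changes in the
// registry. The fields are illustrative, not Harbor's webhook schema.
type ImageEvent struct {
	Type       string // e.g. "push" or "delete"
	Repository string // e.g. "library/redis"
	Tag        string // e.g. "latest"
}

// Broker is a minimal in-memory pub-sub hub used only for this sketch.
type Broker struct {
	mu   sync.Mutex
	subs []chan ImageEvent
}

// Subscribe registers a subscriber channel with a small buffer.
func (b *Broker) Subscribe() <-chan ImageEvent {
	ch := make(chan ImageEvent, 16)
	b.mu.Lock()
	b.subs = append(b.subs, ch)
	b.mu.Unlock()
	return ch
}

// Publish delivers an event to every subscriber. It blocks if a
// subscriber's buffer is full.
func (b *Broker) Publish(ev ImageEvent) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, ch := range b.subs {
		ch <- ev
	}
}

// Close closes all subscriber channels so their consumers can finish.
func (b *Broker) Close() {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, ch := range b.subs {
		close(ch)
	}
	b.subs = nil
}

// RunDriver is a hypothetical Image Distribution Driver: it consumes events
// until the channel is closed and calls preheat for every pushed image.
// It returns the errors reported by preheat.
func RunDriver(events <-chan ImageEvent, preheat func(image string) error) []error {
	var errs []error
	for ev := range events {
		if ev.Type != "push" {
			continue // only newly pushed images are preheated in this sketch
		}
		if err := preheat(ev.Repository + ":" + ev.Tag); err != nil {
			errs = append(errs, err)
		}
	}
	return errs
}
```

Because the driver depends only on the event stream and a preheat function, either side can be replaced without touching the other, which is the loose coupling argued for above.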
The new components are the image heater and the file heater, which provide the ability to warm up container images and files. The image heater analyzes the layers of an image and downloads them from the Harbor registry to the SuperNode's local disk.
I think the following diagram is also a good reference for image distribution:
dfdaemon supports downloading private container images
08.15: the specification of the API between Dragonfly and Harbor
document: https://github.com/alibaba/Dragonfly/blob/master/docs/en/preheat.md
@lowzj
About the API /api/preheat, could we make it compatible with the registry API? Harbor is a registry, which means a client can use the standard docker client or the registry API to get the image content.
It seems the designed API needs a new Harbor API to do that. If I'm mistaken, please correct me.
@steven-zou
There is no need to create a new Harbor API. First, Dragonfly assembles the manifest URL according to the registry API spec and the url parameter of /api/preheat, and fetches the manifest to get the URLs of all the image layers. Then Dragonfly downloads all the layers from Harbor.
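To illustrate how the manifest and layer URLs can be derived from the url parameter, here is a minimal Go sketch following the Docker Registry HTTP API V2 paths (/v2/<name>/manifests/<reference> and /v2/<name>/blobs/<digest>). The function names are hypothetical, it assumes the url parameter is a full manifest URL, and it does not fetch anything.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// ManifestURL assembles GET /v2/<name>/manifests/<reference>.
func ManifestURL(registry, name, reference string) string {
	return fmt.Sprintf("https://%s/v2/%s/manifests/%s", registry, name, reference)
}

// BlobURL assembles GET /v2/<name>/blobs/<digest> for one layer.
func BlobURL(registry, name, digest string) string {
	return fmt.Sprintf("https://%s/v2/%s/blobs/%s", registry, name, digest)
}

// ParseManifestURL splits a manifest URL such as
// https://harbor.example.com/v2/library/redis/manifests/latest into its
// registry host, repository name and reference, so that blob URLs for the
// same repository can be built from it.
func ParseManifestURL(raw string) (registry, name, reference string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", "", err
	}
	const marker = "/manifests/"
	path := strings.TrimPrefix(u.Path, "/v2/")
	i := strings.LastIndex(path, marker)
	// Reject URLs without the /v2/ prefix, without a name, or without a reference.
	if path == u.Path || i <= 0 || i+len(marker) == len(path) {
		return "", "", "", fmt.Errorf("not a registry v2 manifest URL: %q", raw)
	}
	return u.Host, path[:i], path[i+len(marker):], nil
}
```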
@lowzj
So, I think the following example should be OK. One question: why is the header an array? Why not use a map?
{
"type": "image",
"url": "https://<harbor_hostname>/v2/library/redis/manifests/latest",
"header": ["Authorization: Bearer <TOKEN>"]
}
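The request body above can be modeled as a Go struct. In this sketch (PreheatRequest and NewImagePreheat are hypothetical names), header stays an array of "Name: value" strings exactly as in the example, and encoding/json produces the field names through struct tags.

```go
package main

import "encoding/json"

// PreheatRequest mirrors the example body for /api/preheat.
type PreheatRequest struct {
	Type   string   `json:"type"`
	URL    string   `json:"url"`
	Header []string `json:"header"`
}

// NewImagePreheat builds the JSON body for preheating one image manifest,
// passing a bearer token as a single "Authorization" header line.
func NewImagePreheat(manifestURL, token string) ([]byte, error) {
	req := PreheatRequest{
		Type:   "image",
		URL:    manifestURL,
		Header: []string{"Authorization: Bearer " + token},
	}
	return json.Marshal(req)
}
```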
@steven-zou
> I think the following example should be ok.
I think the url could be the image URL: <harbor_host>/<image_name>:<image_tag>. And the internal steps of Dragonfly could be:
1. Construct the manifest_url: https://<harbor_host>/v2/<image_name>/manifests/<image_tag>
2. Fetch the manifest from manifest_url
3. Get fsLayers from the manifest and construct the layer_url of each layer: https://<harbor_host>/v2/<name>/blobs/<digest>
4. Request the layer_urls above, handling any redirect responses to get the real download URLs
But I'm not sure whether the header is enough for authentication in these steps.
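A minimal Go sketch of the parsing in steps 1 and 3, assuming the image reference has the form <harbor_host>/<image_name>:<image_tag> and that the registry returns a schema 1 manifest (newer registries usually serve schema 2 manifests, which list digests under layers[].digest instead of fsLayers[].blobSum). SplitImageRef and LayerURLs are hypothetical helpers. For step 4, Go's net/http client follows up to 10 redirects by default, so a plain GET on each layer_url would already reach the real download URL. Relevant to the authentication question: when a redirect leaves the original domain, Go drops sensitive headers such as Authorization.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// schema1Manifest keeps only the field used here: schema 1 manifests list
// layer digests as fsLayers[].blobSum.
type schema1Manifest struct {
	FSLayers []struct {
		BlobSum string `json:"blobSum"`
	} `json:"fsLayers"`
}

// SplitImageRef splits "<harbor_host>/<image_name>:<image_tag>". It assumes
// the host is the first path segment and defaults the tag to "latest";
// digest references ("@sha256:...") are not handled in this sketch.
func SplitImageRef(ref string) (host, name, tag string, err error) {
	slash := strings.Index(ref, "/")
	if slash <= 0 {
		return "", "", "", fmt.Errorf("missing registry host in %q", ref)
	}
	host = ref[:slash]
	rest := ref[slash+1:]
	name, tag = rest, "latest"
	// A ':' after the last '/' separates the tag from the repository name.
	if colon := strings.LastIndex(rest, ":"); colon > strings.LastIndex(rest, "/") {
		name, tag = rest[:colon], rest[colon+1:]
	}
	return host, name, tag, nil
}

// LayerURLs turns each fsLayers digest into
// https://<harbor_host>/v2/<name>/blobs/<digest>. Schema 1 can list the same
// blob more than once, so duplicates are skipped.
func LayerURLs(host, name string, manifest []byte) ([]string, error) {
	var m schema1Manifest
	if err := json.Unmarshal(manifest, &m); err != nil {
		return nil, err
	}
	seen := make(map[string]bool)
	var urls []string
	for _, l := range m.FSLayers {
		if seen[l.BlobSum] {
			continue
		}
		seen[l.BlobSum] = true
		urls = append(urls, fmt.Sprintf("https://%s/v2/%s/blobs/%s", host, name, l.BlobSum))
	}
	return urls, nil
}
```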
> why is the header an array? Why not use a map?
There may be multiple message-header fields with the same field-name in HTTP headers. If we use a map, these header fields should be combined into one, with the field-values separated by commas, like this: "field-name: field-value1, field-value2, ...".
Using a map may be more convenient than using an array in practice, and multiple fields with the same field-name are not recommended. I will change the type of header to map.
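The combining rule described above can be sketched in Go: HeaderMap (a hypothetical helper) converts the array form into the map form, canonicalizing field names so that case variants merge, and joining repeated names with ", ". One caveat of the map form: Set-Cookie cannot be safely combined this way, which RFC 7230 calls out as an exception.

```go
package main

import (
	"fmt"
	"net/textproto"
	"strings"
)

// HeaderMap converts "Name: value" lines into a map. Field names are
// canonicalized (e.g. "accept" -> "Accept") and values of repeated names are
// joined with ", ".
func HeaderMap(lines []string) (map[string]string, error) {
	out := make(map[string]string)
	for _, line := range lines {
		name, value, ok := strings.Cut(line, ":")
		if !ok {
			return nil, fmt.Errorf("malformed header %q", line)
		}
		key := textproto.CanonicalMIMEHeaderKey(strings.TrimSpace(name))
		value = strings.TrimSpace(value)
		if prev, exists := out[key]; exists {
			out[key] = prev + ", " + value
		} else {
			out[key] = value
		}
	}
	return out, nil
}
```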
@lowzj
OK, got it. But please be aware that the <image_name> in Harbor has a project-name prefix, like library/redis. You need to take care of that.
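A small Go sketch of handling that prefix, assuming every Harbor repository name starts with a project segment. SplitProject is a hypothetical helper that rejects names without a project (e.g. a bare redis) instead of silently defaulting to library, as the Docker client does for Docker Hub.

```go
package main

import (
	"fmt"
	"strings"
)

// SplitProject separates a Harbor repository name such as "library/redis"
// into the project ("library") and the remaining path ("redis"). Anything
// after the first slash is kept, so multi-level names stay intact.
func SplitProject(repo string) (string, string, error) {
	project, rest, ok := strings.Cut(repo, "/")
	if !ok || project == "" || rest == "" {
		return "", "", fmt.Errorf("repository %q has no project prefix", repo)
	}
	return project, rest, nil
}
```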
Does the Dragonfly integration with Harbor support HTTPS, and does it work now? If we want to modify the source code to support HTTPS, what work needs to be done and what should we pay attention to? Many thanks.
Any update on this issue?
> Any update on this issue?
It works on the Dragonfly side, since Dragonfly already supports the preheat API. @perriea
We have worked out a demo of the Harbor and Dragonfly integration, but I am not sure whether the Harbor side plans to release the work. @steven-zou
Does the preheat API support private Harbor images? (which need docker login)
> Does the preheat API support private Harbor images? (which need docker login)
Yes, it does. You can add the login credentials to the headers. @datavisoryushuzhang
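For example, docker-login style credentials can be passed as an HTTP Basic Authorization header (RFC 7617) in the map form of header. This is a sketch under an assumption: whether the registry accepts Basic credentials directly or expects a Bearer token from its token service depends on the deployment, and the helper names are hypothetical. Basic credentials are only base64-encoded, not encrypted, so the request should go over HTTPS.

```go
package main

import "encoding/base64"

// BasicAuthHeader returns the value of an "Authorization" header for the
// given username and password, using HTTP Basic authentication.
func BasicAuthHeader(username, password string) string {
	creds := username + ":" + password
	return "Basic " + base64.StdEncoding.EncodeToString([]byte(creds))
}

// PreheatHeaders returns a header map (the map form agreed on in this
// thread) carrying those credentials.
func PreheatHeaders(username, password string) map[string]string {
	return map[string]string{"Authorization": BasicAuthHeader(username, password)}
}
```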
When will the Dragonfly + Harbor demo be released? Can't wait to use it!
STATUS: [INPROGRESS]
Integrate the trusted cloud-native registry Harbor with Dragonfly to provide a joint image management and distribution solution for containerized environments.
Background:
Harbor: Project Harbor is an open source trusted cloud-native registry project that stores, signs, and scans content. Harbor extends the open source Docker Distribution by adding the functionalities usually required by users, such as security, identity, and management. Having a registry closer to the build and run environment can improve image transfer efficiency. Harbor supports replication of images between registries and also offers advanced security features such as user management, access control, and activity auditing. For more details, please refer to its README.
Dragonfly: Dragonfly is an intelligent P2P-based file distribution system. It aims to resolve low efficiency, low success rates, and wasted network bandwidth in file transfer, especially in large-scale file distribution scenarios such as application distribution, cache distribution, log distribution, and image distribution. For more details, please refer to its README.
Motivations:
With the emergence and development of Kubernetes, it's becoming possible to run and operate large-scale containerized applications and services in enterprise environments. Meanwhile, big challenges remain that cannot be ignored: how to securely and effectively manage the many container images produced in enterprise organizations, and how to distribute them to large-scale runtimes with less time and effort when starting applications or services on demand. To address this challenge, we should build a joint solution from the open source trusted cloud-native registry Harbor and the open source intelligent P2P-based file distribution system Dragonfly.
These two open source projects have clearly complementary strengths, and the joint solution will expand the scenarios of image lifecycle management and improve security, reliability, and efficiency.
Idea:
The integration should be loosely coupled, completing the required work by calling the related APIs. The system admin of the Harbor registry can configure the related options to enable calls to the APIs on the Dragonfly side. The options may include, but are not limited to, the following:
The integration configuration can be verified, by testing or a "dry run", to make sure the connection between the two systems is not broken.
The images are produced by a CI/CD pipeline or by other means and pushed to the Harbor registry. Newly pushed images can be labeled automatically or manually. In addition, the registry admin can scan the images to make sure they are secure. Of course, the admins can do any other management work they want.
The registry admin can select any ready image and promote it to the SuperNode of the Dragonfly P2P network, so that upcoming image pull requests are served with better distribution performance. The promotion can be triggered manually by clicking a button or automatically by pre-configured rules/policies (if the image matches certain conditions, it is promoted).
Then, when the containerized environments need to pull that image, Dragonfly distributes it to the nodes layer by layer via the P2P network.
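The rule-based promotion described above ("if the image matches certain conditions, it is promoted") could look like the following Go sketch. Image, PromotionPolicy and ShouldPromote are hypothetical, and the conditions (a required label, a completed scan, a severity ceiling) are only examples drawn from the labeling and scanning steps in this proposal.

```go
package main

// Image is a hypothetical registry-side view of an image: its labels and
// the result of the latest vulnerability scan.
type Image struct {
	Repository string
	Tag        string
	Labels     []string
	Scanned    bool
	Severity   int // highest severity found; 0 means none
}

// PromotionPolicy is a hypothetical auto-promotion rule.
type PromotionPolicy struct {
	RequiredLabel string // image must carry this label; empty means any
	RequireScan   bool   // image must have been scanned
	MaxSeverity   int    // highest tolerated severity
}

// ShouldPromote reports whether the image satisfies every condition of the
// policy and can be promoted to the Dragonfly SuperNode automatically.
func ShouldPromote(img Image, p PromotionPolicy) bool {
	if p.RequireScan && !img.Scanned {
		return false
	}
	if img.Severity > p.MaxSeverity {
		return false
	}
	if p.RequiredLabel == "" {
		return true
	}
	for _, l := range img.Labels {
		if l == p.RequiredLabel {
			return true
		}
	}
	return false
}
```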
Basic Workflow:
Architecture:
An architecture design based on the above draft idea:
The components with a light blue background are the new ones that need to be implemented.
Followups: