goharbor / harbor

An open source trusted cloud native registry project that stores, signs, and scans content.
https://goharbor.io
Apache License 2.0

[feature request] Extend OCI Artifact Types in Runtime #12013

Closed (gaocegege closed this issue 3 years ago)

gaocegege commented 4 years ago

Is your feature request related to a problem? Please describe.

By default, Harbor supports three OCI artifact types: OCI Image, Helm Chart, and CNAB. When users want to use Harbor to store, publish, or share new kinds of artifacts (e.g. machine learning models), they have to fork Harbor and implement the processor logic in goharbor/harbor.

Forking works, but maintaining the fork carries huge operational costs.

Describe the solution you'd like

We expect the processor to be extensible, like the Kubernetes scheduler extender or the Harbor scanner, so that users can implement their own processor outside Harbor core.

Describe the main design/architecture of your solution

Harbor core can communicate with the remote processor via IP:Port, a Unix domain socket, or some other transport. We are going to submit a detailed proposal to the Harbor community soon.
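
To make the idea concrete, here is a minimal sketch of what a processor living outside Harbor core and reachable over IP:Port might look like. It is written in Python for brevity and is not the design from the proposal: the media type, the `/process` path, the JSON request and response shapes, and the names `ProcessorHandler`, `start_processor`, and `abstract_metadata` are all assumptions made for illustration.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical config media type for a user-defined ML model artifact.
MODEL_MEDIA_TYPE = "application/vnd.example.model.config.v1+json"


class ProcessorHandler(BaseHTTPRequestHandler):
    """Out-of-tree processor: receives a manifest, returns extracted metadata."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        manifest = json.loads(self.rfile.read(length))
        annotations = manifest.get("annotations", {})
        body = json.dumps({
            "artifact_type": "MODEL",
            "extra_attrs": {"framework": annotations.get("framework", "unknown")},
        }).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Silence the default per-request logging.
        pass


def start_processor():
    """Start the processor on an ephemeral local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), ProcessorHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def abstract_metadata(manifest, extenders):
    """Core side (assumed): route by config media type to a registered processor URL."""
    url = extenders.get(manifest["config"]["mediaType"])
    if url is None:
        # No processor registered for this type: nothing can be extracted.
        return {"artifact_type": "UNKNOWN", "extra_attrs": {}}
    request = urllib.request.Request(
        url,
        data=json.dumps(manifest).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Bypass any proxy configured in the environment for this local call.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return json.load(response)
```

In this sketch the core only needs a mapping from media type to processor URL, so adding a new artifact type would mean registering one more entry rather than changing core code. A Unix domain socket transport would follow the same request and response pattern.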

Describe the development plan you've considered

We, at Caicloud, can help implement it. And we are glad to help the community maintain the feature.

/cc @hainingzhang @steven-zou

/assign @gaocegege @hyy0322 @zhujian7

reasonerjt commented 4 years ago

I personally think it may be too heavy to create a service solely for extracting data. Creating a workflow to manage the plugins is more complicated than adding code to run in harbor-core, and I don't see many use cases beyond the learning model. But I'm certainly open to more discussion...

gaocegege commented 4 years ago

@reasonerjt Thanks for your comment.

@reasonerjt wrote: "I don't see a lot use cases in addition to the learning model. but I'm certainly open for more discussion..."

We do not have other OCI artifact types now, but I think the feature will be widely adopted in the future. As you know, there are many other artifacts that can be stored in an OCI-based registry.

Caicloud is a small start-up, but we already see brisk demand for this feature. We will use the registry to store not only ML/DL models but also datasets, our proprietary application bundles, and so on. Thus we think it should become popular in the foreseeable future.

And, from the perspective of the artifact authors, we can store user-defined artifact types now, but the information about the artifact is not self-contained. We have to fork Harbor core and implement the processor logic in the fork. If we decide to contribute the logic to the Harbor community, we have to commit it into Harbor core and follow Harbor's release process, which is an unnecessary burden for both Harbor and the artifact authors.

Harbor is claimed to be the first OCI-compliant open-source registry, and we say that:

As artifact types will undoubtedly come and go, it’s crucial that Harbor exists outside of any particular container format, and be flexible enough to onboard and discard any artifact type based on community demand and adherence to common standards.

Thus I think extensibility should be provided to the artifact authors.

@reasonerjt wrote: "I personally think it may be too heavy to create a service to solely for extracting data, create a workflow to manage the plugins are more complicated than adding code to run in harbor-core"

In our view, the three types Helm Chart, CNAB, and Image should be kept in Harbor core. When there are new non-standard types such as ML/DL models or some proprietary types, we can provide a mechanism to extend Harbor outside Harbor core.

As for the detailed design and implementation, I think we can discuss it further once the proposal is submitted. I do agree that we do not want the feature to be too heavy, and we do not want to affect the current workflow.

steven-zou commented 4 years ago

I think this is a valuable feature request. One more thing we should clarify here: this proposal does not aim to add more artifact metadata extractors to Harbor. It is trying to provide a capability that lets Harbor easily support user-defined artifact kinds with rich metadata formats. It will not negatively affect the artifact kinds Harbor supports by default (image, Helm v3, CNAB, OPA bundle). It only opens the door for Harbor to have a certain degree of extensibility. Adopters can decide whether or not they want to leverage this extensibility to support their own artifact kinds.
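
The separation described above can be sketched as a lookup in which the default kinds always win and user-defined kinds are purely additive. This is an illustrative sketch in Python, not Harbor's implementation: `UserKind`, `describe`, the kind labels, and the example model media type are hypothetical. The two built-in media types shown are the OCI image config and Helm chart config types; the CNAB and OPA bundle entries are omitted because their media types are not given in this thread.

```python
from dataclasses import dataclass

# Default kinds, keyed by the manifest's config media type (partial list).
BUILTIN_KINDS = {
    "application/vnd.oci.image.config.v1+json": "IMAGE",
    "application/vnd.cncf.helm.config.v1+json": "CHART",
}


@dataclass(frozen=True)
class UserKind:
    """Hypothetical declaration of a user-defined artifact kind."""
    name: str
    fields: tuple  # config keys to surface as metadata


def describe(config_media_type, config, user_kinds):
    """Return (kind, metadata) for an artifact's config blob."""
    # Built-in kinds are checked first, so a user registration cannot override them.
    if config_media_type in BUILTIN_KINDS:
        return BUILTIN_KINDS[config_media_type], {}
    kind = user_kinds.get(config_media_type)
    if kind is None:
        # Pushable, but no metadata can be shown for it.
        return "UNKNOWN", {}
    return kind.name, {key: config[key] for key in kind.fields if key in config}


# Example registration of a hypothetical ML model kind.
USER_KINDS = {
    "application/vnd.example.model.config.v1+json": UserKind(
        name="MODEL", fields=("framework", "format")
    ),
}
```

With this shape, an adopter who registers nothing gets exactly the default behavior, which matches the point that the extensibility is opt-in.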

xaleeks commented 4 years ago

@gaocegege Thanks for the idea, and it's great to see Harbor being used for hosting common artifacts used in machine learning projects like Kubeflow. Given that there is no dedicated registry geared towards AI/ML on Kubernetes on the market, it's awesome to see that good access control and lifecycle management capabilities, along with OCI support, make Harbor a good candidate.

I think it's a good idea to hand the ability to capture detailed metadata over to the different artifact authors. Right now you can push anything to Harbor, but none of the metadata comes through. Can you have a proposal ready for discussion by the next community meeting?

gaocegege commented 4 years ago

"it's great to see harbor being used for hosting common artifacts used in machine learning projects like Kubeflow."

Yeah, we will be glad to contribute our model specification to Kubeflow when it is mature.

"Can you have a proposal ready for discussion by the next community meeting?"

Yeah, I will submit the proposal this Friday or next Monday with technical details.

xaleeks commented 4 years ago

@gaocegege that's great, looking forward to hearing more at the community meeting next Wed :)

hyy0322 commented 4 years ago

Proposal Preview: proposals/artifact-processor-extender.md

/cc @zhujian7 @gaocegege @hainingzhang

/assign @xaleeks @steven-zou

gaocegege commented 4 years ago

Ref https://github.com/goharbor/community/pull/143

gaocegege commented 4 years ago

After the discussion in the community call yesterday, we will add technical details about the in-tree implementation to our design proposal. It's WIP.

xaleeks commented 4 years ago

Seems we might be able to deliver this in the v2.1 time frame, so I'm soft-tagging this for 2.1 to keep track. Really appreciate the help here! @gaocegege

gaocegege commented 4 years ago

Thanks to the community. Things we need to do next:

steven-zou commented 3 years ago

This feature has been delivered in the Harbor v2.1 release.

Closing this issue.