Closed Xuanwo closed 3 years ago
Maybe we can't finish this work in go-storage alone?
We need to build a whole CDC service:
@xxchan is working on this idea.
Here are some of my thoughts.
I think go-storage currently provides a unified interface to access storage services, and this feature is the beginning of supporting the configuration of more complex storage service features.
Supporting notification configuration is the first step (feature 1). We should consider how users use notifications and how to help them.
Although we may design a feature that can be used without go-storage, I think we should start from go-storage.
I think a user is willing to use go-storage for:
In the first case, notification is probably not needed(?). In the second case, let's consider how they would use notifications.
An event notification may flow in different paths:
If the user uses Lambda, I guess they may be willing to stick with the vendor and won't need us(?). If they use a queue service, I'm not sure.
If they use go-storage and set the notification destination to a custom server (does this mean the feature has limited use cases?), they will need to handle each vendor-specific notification format (e.g., oss event message, s3 event message), which defeats the purpose of being "vendor agnostic".
So we can define a unified storage event message format for users, and provide a library to convert vendor event message formats into ours (feature 2). (This is analogous to https://github.com/xo/dburl, with which users can convert a unified connection string format into vendor ones.)
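
A minimal sketch of what feature 2 could look like, assuming a hypothetical `Event` struct and a hypothetical `FromS3` converter (neither exists in go-storage). It reads only a small subset of the S3 event message (`eventName`, `eventTime`, bucket name, object key) and keeps the vendor event name so nothing is lost in normalization. Object keys in S3 event messages are URL-encoded; this sketch leaves them as received.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"time"
)

// Event is a hypothetical vendor-agnostic storage event (not a go-storage type).
type Event struct {
	Service string    // source service, e.g. "s3"
	Bucket  string    // bucket name
	Path    string    // object key as delivered by the vendor
	Type    string    // normalized type: "created", "removed", or "other"
	RawType string    // original vendor event name
	Time    time.Time // event time reported by the vendor
}

// s3Message mirrors only the fields of the S3 event message used here.
type s3Message struct {
	Records []struct {
		EventName string    `json:"eventName"`
		EventTime time.Time `json:"eventTime"`
		S3        struct {
			Bucket struct {
				Name string `json:"name"`
			} `json:"bucket"`
			Object struct {
				Key string `json:"key"`
			} `json:"object"`
		} `json:"s3"`
	} `json:"Records"`
}

// FromS3 converts an S3 event message payload into unified events.
func FromS3(payload []byte) ([]Event, error) {
	var msg s3Message
	if err := json.Unmarshal(payload, &msg); err != nil {
		return nil, fmt.Errorf("decode s3 event: %w", err)
	}
	events := make([]Event, 0, len(msg.Records))
	for _, r := range msg.Records {
		events = append(events, Event{
			Service: "s3",
			Bucket:  r.S3.Bucket.Name,
			Path:    r.S3.Object.Key,
			Type:    normalize(r.EventName),
			RawType: r.EventName,
			Time:    r.EventTime,
		})
	}
	return events, nil
}

// normalize maps a vendor event name onto the coarse unified types.
func normalize(name string) string {
	switch {
	case strings.HasPrefix(name, "ObjectCreated:"):
		return "created"
	case strings.HasPrefix(name, "ObjectRemoved:"):
		return "removed"
	default:
		return "other"
	}
}

func main() {
	payload := []byte(`{"Records":[{"eventName":"ObjectCreated:Put","eventTime":"2021-06-01T12:00:00.000Z","s3":{"bucket":{"name":"example-bucket"},"object":{"key":"dir/file.txt"}}}]}`)
	events, err := FromS3(payload)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, e := range events {
		fmt.Printf("%s %s/%s (%s)\n", e.Type, e.Bucket, e.Path, e.RawType)
	}
}
```

Other vendors would get their own converter producing the same `Event`, similar to how each service has its own package today.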
As @Xuanwo mentioned, we may support different notification receivers. The event "destination" (a customer-managed server that subscribes to notifications as an HTTP endpoint) may further send event messages downstream, so we could provide a unified interface for publishing messages (like a unified MQ interface, maybe more) (feature 3). We could even let the server simply forward messages as a dedicated halfway station (feature 4, using features 2 & 3).
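Now we have 4 possible features:

1. Support notification configuration in go-storage.
2. A unified storage event message format, with a library to convert vendor event message formats into it.
3. A unified interface for publishing messages downstream.
4. A server that simply forwards messages (using features 2 & 3).

I think features 1 & 2 are very reasonable, but I doubt the use cases of feature 4. Will users use a server just to forward messages without processing data? If so, it may also involve tricky things to consider, e.g., message delivery guarantees (retries, ordering, and deduplication). Finally, it seems that feature 3 (as a general one, not only serving feature 4) is beyond the scope of our organization.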
Nice thoughts! Let's resolve questions here.
> In the first case, notification is probably not needed(?).
Take data migration and backup as examples: notification is needed to implement incremental processing. For example, with notification support, we can implement incremental migration so that we don't need to list all objects (which is very slow on huge buckets).
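
As a rough illustration of the incremental idea, the sketch below collapses a stream of already-normalized events into the set of paths to copy and the set to delete, so the migration touches only what changed. `Change` and `Plan` are hypothetical names, and the sketch assumes events arrive in order, which storage services may not guarantee.

```go
package main

import "fmt"

// Change is a hypothetical, already-normalized storage event.
type Change struct {
	Path    string
	Removed bool // true for a delete event, false for a create or overwrite
}

// Plan collapses an ordered event stream into the paths to copy and the
// paths to delete. Only the last event seen for each path matters.
func Plan(changes []Change) (toCopy, toDelete []string) {
	last := make(map[string]bool, len(changes)) // path -> was the last event a removal?
	order := make([]string, 0, len(changes))    // first-seen order, for stable output
	for _, c := range changes {
		if _, seen := last[c.Path]; !seen {
			order = append(order, c.Path)
		}
		last[c.Path] = c.Removed
	}
	for _, p := range order {
		if last[p] {
			toDelete = append(toDelete, p)
		} else {
			toCopy = append(toCopy, p)
		}
	}
	return toCopy, toDelete
}

func main() {
	toCopy, toDelete := Plan([]Change{
		{Path: "a.txt"},
		{Path: "b.txt"},
		{Path: "a.txt", Removed: true},
	})
	fmt.Println(toCopy, toDelete) // prints "[b.txt] [a.txt]"
}
```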
> If the user uses Lambda, I guess he may be willing to stick to the vendor and don't need us(?).
We are focused on the storage layer itself, so the notification here is the native notification provided by storage services. That means: Lambda, Queue Service, and so on are out of our scope. We only need to handle notifications sent from storage services themselves.
> Will users use a server just to forward messages without processing data?
Nice question.
Features 3 & 4 are indeed out of our community's scope. The reason I include them here is that between features 1 and 2 we need a service to receive the events, and features 3 & 4 are extensions of this service.
The workflow looks like this:
I'm OK with removing this service and features 3 & 4 from this proposal; we can discuss them later (maybe when dm plans to implement incremental data migration).
ping dm's maintainer @Prnyself to take a look.
> Between features 1 and 2, we need a service to receive the events.
I think this is just an HTTP server, so it should be decided by users themselves?
> I think this is just an HTTP server, so it should be decided by users themselves?
You are right. Let's focus on our job and leave the service out of consideration.
Nice thoughts!
As a service-user, especially for an application based on Golang, being able to get a channel for notification is necessary and fundamental.
What's more, webhooks or 3rd-party message queues should also be supported in the future.
So it is really similar to the relationship between go-storage and go-service-xxx, if we want to support different message services.
But for now, I think we can first define the notification struct and find out what information we need to send in a notification. Maybe take badger's db.Subscribe as a reference?
@xxchan Hi, what's the progress?
> find out what information we need to send in notification

@Prnyself, to make it clear, I think we are not going to support "sending notifications", since this is an internal feature of storage services. We just enable users to turn it on with go-storage, and we cannot decide "what information to send in notification".
We can decide "what information is commonly needed in received notification" and define a unified format.
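@Xuanwo My current plan is:

1. Support notification configuration in go-storage (set the receiver to the cloud notification service or an HTTP endpoint).
2. Define a unified storage event message format (or simply a Go struct) along with a library to convert vendor event message formats into it.

If this is okay, I will draft an RFC for 1 soon.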
> @Xuanwo My current plan is:
>
> 1. Support notification configuration in go-storage (Set receiver to the cloud notification service or an HTTP endpoint).
> 2. Define a unified storage event message format (or simply a go struct) along with a library to convert vendor event message formats into it.
>
> If this is okay, I will draft an RFC for 1 soon.
The plan looks good to me!
Here's a (not verified) table of storage event types. We can see that they vary a lot:
Some services treat InitiateMultipartUpload & UploadPart as create events, while s3 and cos don't.
So I think this means that storage events are highly service-specific, and thus it is hard to provide a comprehensive unified event format.
oss | s3 | cos | gcs | qingstor | azblob | ||
---|---|---|---|---|---|---|---|
ObjectCreated | * | √ | √ | √ | √ | √ | |
ObjectCreated:PutObject | √ | √ | √ | ||||
ObjectCreated:PostObject | √ | √ | √ | ||||
ObjectCreated:CopyObject | √ | √ | √ | ||||
ObjectCreated:InitiateMultipartUpload | √ | ||||||
ObjectCreated:UploadPart | √ | ||||||
ObjectCreated:UploadPartCopy | √ | ||||||
ObjectCreated:CompleteMultipartUpload | √ | √ | √ | √ | |||
ObjectCreated:AppendObject | √ | √ | |||||
ObjectDownloaded | ObjectDownloaded:GetObject | √ | |||||
ObjectRemoved | * | √ | √ | √ | √ | √ | |
ObjectRemoved:DeleteObject | √ | ||||||
ObjectRemoved:DeleteObjects | √ | ||||||
version delete | √ | √ | √ | ||||
ObjectReplication | * | ||||||
ObjectReplication:ObjectCreated | √ | ||||||
ObjectReplication:ObjectRemoved | √ | ||||||
ObjectReplication:ObjectModified | √ | ||||||
OperationFailedReplication | √ | ||||||
metadata update | √ | ||||||
abort_multipart | √ |
The APIs for configuring notifications are similar (but oss does not have this API!). The params are: bucket name and event (type, filter, id, arn). The trickiest part is the event type: it seems hard to define a global event type (like global pairs).
Another thing I found out is that some services (like s3) only support sending events to internal services like Amazon SNS, Amazon SQS, or AWS Lambda.
> Another thing I found out is that some services (like s3) only support sending events to internal services like Amazon SNS, Amazon SQS, or AWS Lambda.
Actually only qingstor supports HTTP endpoint directly.
And it seems to be encouraged to configure notification in the console instead of using API 🤔
So we now have two difficulties.
For the second problem, my previous idea was to use SNS as a middle station and add an HTTP endpoint subscription to the SNS topic. If so, the user will also have to provide the SNS ARN besides the HTTP endpoint.
> If so, the user will have to also provide the SNS arn besides HTTP endpoint.
But the SNS ARN also differs a lot between services? Can we create the SNS topic for the user?
> Can we create SNS for user?
Some quick results (whether each service has a CreateTopic API):
> But SNS arn is also very different between services?
Not sure. Example:
My previous concern was that if the user goes to the console to create a topic, why wouldn't they just continue to configure the notification there? So "Can we create SNS for user?" is a problem.
Let's discuss event type later, it's a bit simpler.
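> My previous concern was that if the user will go to the console to create a topic, why doesn't he just continue to configure the notification there? So "Can we create SNS for user?" is a problem.

So there are two methods:

* API that accepts the dst endpoint: that means we need to create an SNS service for the user if the service doesn't have native support.
* API that accepts a service-internal ARN (as a plain string): that means the user needs to create the SNS service themselves.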
Maybe related to #634
Is creating a service implicitly acceptable to users? One thing is that it involves billing.
> So there are two methods:
>
> * API that accepts the dst endpoint: that means we need to create an SNS service for the user if the service doesn't have native support.
> * API that accepts service internal ARN (in a plain string): that means the user needs to create SNS service by themself.
For method 1: I agree with your concern, it's not acceptable.
For method 2: It looks meaningless for users (why not configure it in the console directly?)
Maybe it's out of our scope to implement the notification config API (and we don't have the ability to do it), so let's drop it.
Without notification API support, do you think it is still useful to implement a global event struct type?
I think users may write this themselves with a few lines of code and won't look for a simple library to do so.
Let's mark this idea as backlog and drop it for now. Thanks for your research!
How about implementing CDC via scanning, like Rockset does? https://rockset.com/blog/change-data-capture-what-it-is-and-how-to-use-it/
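
A minimal sketch of scan-based CDC, assuming two listings of the same bucket are available as maps from path to a change marker such as an ETag. `Snapshot`, `Change`, and `Diff` are hypothetical names. Note the trade-off raised earlier in this thread: each scan still has to list every object, which is slow on huge buckets.

```go
package main

import (
	"fmt"
	"sort"
)

// Snapshot maps an object path to a change marker (for example an ETag)
// taken from one full listing of the bucket.
type Snapshot map[string]string

// Change describes one difference between two snapshots.
type Change struct {
	Path string
	Kind string // "created", "modified", or "removed"
}

// Diff compares the previous and current snapshots and returns the changes,
// sorted by path so the output is deterministic.
func Diff(prev, curr Snapshot) []Change {
	var out []Change
	for p, tag := range curr {
		old, ok := prev[p]
		switch {
		case !ok:
			out = append(out, Change{Path: p, Kind: "created"})
		case old != tag:
			out = append(out, Change{Path: p, Kind: "modified"})
		}
	}
	for p := range prev {
		if _, ok := curr[p]; !ok {
			out = append(out, Change{Path: p, Kind: "removed"})
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Path < out[j].Path })
	return out
}

func main() {
	prev := Snapshot{"a.txt": "etag1", "b.txt": "etag1"}
	curr := Snapshot{"b.txt": "etag2", "c.txt": "etag1"}
	for _, c := range Diff(prev, curr) {
		fmt.Println(c.Kind, c.Path)
	}
}
```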
Our storage services may support sending notifications to let users get the changes of storage.
This feature is like CDC (Change Data Capture) for DBMS.
We may need to: