dcsena opened this issue 4 years ago
Are messages restored idempotently or duplicated?
@thesyncim can provide more details. It's only been a few days since Pusher announced their shutdown so we're still learning here.
@thesyncim any limitations on the import that we should be aware of?
Thanks Thierry and Marcelo and appreciate you guys moving quickly here to support incoming customers.
I'm still trying to figure out how to migrate our chat service with no downtime or data loss. Our chat product is still operational so a snapshot/restore (which to me is what the export/import sounds like) is not going to be sufficient on its own.
Pusher has webhook support for us to start syncing to stream but it's unclear to me how your import will handle this. Will there be duplicate messages? Will the restore re-create rooms/channels that were deleted in between the snapshot and restore?
I suspect so, since many customers did that when Layer shut down.
Marcelo will know for certain though as he created this import flow.
FYI from the Pusher team: The data export we will provide is a point-in-time snapshot at the point the export is completed. Currently we are providing these exports on a request-by-request basis, but this functionality should be available as a self-serve option in the dashboard soon. This means you will be able to easily keep requesting "later" snapshots as time goes on.
You are correct, though, that because it is a snapshot there will be a "cut over" point where messages could be lost. It looks like you have identified that webhooks could allow you to track events that happen after getting the export, so you can migrate users from one platform to the other without losing messages.
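To make the "cut over" idea concrete, here is a minimal sketch of a webhook handler that buffers ChatKit events arriving after the export completed, so they can be replayed into Stream once the import finishes. The `exportCompletedAt` value and the event shape are assumptions for illustration, not ChatKit's exact payload format:

```javascript
// Hypothetical sketch: buffer ChatKit webhook events that arrive after the
// export snapshot, so they can be replayed into Stream after the import.
const exportCompletedAt = Date.parse('2020-04-10T00:00:00Z'); // assumed cut-over time
const pendingEvents = [];

function handleChatkitWebhook(event) {
  // Events at or before the snapshot are already covered by the export.
  const createdAt = Date.parse(event.created_at);
  if (createdAt <= exportCompletedAt) return false;
  pendingEvents.push(event);
  return true;
}
```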
Marcelo, would love to know the details of your import. Thanks.
hi @dcsena,
These are the high-level logical steps to migrate to our infrastructure without losing data:
we have a tool that converts the ChatKit export format into Stream format. (more details about our format).
Steps 2 and 3 are "tightly coupled", since the format of messages created by the webhook needs to match our conversion tool.
I will try to describe how the mapping is done:
**User mapping:**

Stream Field | Chatkit Field | Transformation |
---|---|---|
id | id | None |
name | name | None |
image | profile_image | None |
** | custom_data | None |
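The user mapping above can be sketched as a small conversion function; the ChatKit field names follow the table, and spreading `custom_data` into top-level extra fields is my reading of the `**` row:

```javascript
// Hypothetical sketch of the user mapping table above.
function mapUser(chatkitUser) {
  return {
    id: chatkitUser.id,             // reused as-is
    name: chatkitUser.name,
    image: chatkitUser.profile_image,
    ...chatkitUser.custom_data,     // extra fields copied through unchanged
  };
}
```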
**Channel (room) mapping:**

Stream Field | Chatkit Field | Transformation |
---|---|---|
id | id | None |
type | private | public rooms map to livestream and private rooms map to messaging |
created_by_id | created_by_id | None |
name | name | None |
members | member_user_ids | None |
** | custom_data | None |
**Message mapping:**

Stream Field | Chatkit Field | Transformation |
---|---|---|
id | id | int → string |
channel_type | N/A | Inherit from parent Channel |
channel_id | N/A | Inherit from parent Channel |
user | sender_id | None |
text | parts | Take the content of the first element of the parts array with a part-type of inline and a mime-type of text/plain. If no such part exists (a small minority of messages) set to a sensible default. |
type | N/A | regular |
attachments | parts | All Chatkit message parts excluding the first part with a type of text/plain |
** | custom_data | None |
The trickiest part here is the mapping of message attachments. I will push some example code this weekend to show how the ChatKit message parts map to Stream attachments.
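In the meantime, here is a rough sketch of the message mapping described in the table: the text comes from the first inline `text/plain` part, every other part becomes an attachment. The part field names (`part_type`, `mime_type`, `content`) and the attachment shape are assumptions for illustration, not ChatKit's or Stream's exact formats:

```javascript
// Hypothetical sketch of the message mapping table above.
function mapMessage(chatkitMessage, channelType, channelId) {
  const parts = chatkitMessage.parts || [];
  // First inline part with a text/plain mime type becomes the message text.
  const textPart = parts.find(
    (p) => p.part_type === 'inline' && p.mime_type === 'text/plain'
  );
  return {
    id: String(chatkitMessage.id),   // int → string
    channel_type: channelType,       // inherited from the parent channel
    channel_id: channelId,           // inherited from the parent channel
    user_id: chatkitMessage.sender_id,
    type: 'regular',
    text: textPart ? textPart.content : '', // sensible default when no text part
    attachments: parts.filter((p) => p !== textPart),
    ...chatkitMessage.custom_data,
  };
}
```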
Since we reuse ChatKit ids, duplicates are impossible; we also do an update if the data (users/messages/channels) already exists.
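The idempotency argument can be shown with a tiny sketch: because writes are keyed by the reused ChatKit id, re-running the import updates existing records instead of duplicating them. The `store` map here is a stand-in for whatever persistence the import actually uses:

```javascript
// Hypothetical sketch: id-keyed upserts make repeated imports idempotent.
const store = new Map();

function upsert(record) {
  // Same id → merge/overwrite in place; new id → insert.
  // Running the import twice leaves exactly one copy of each record.
  store.set(record.id, { ...store.get(record.id), ...record });
}
```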
Best, Marcelo Pires
Awesome, this is super helpful. The only thing I'm worried about now is users leaving rooms/channels and the import undoing that.
Can you comment on how to handle this?
@dcsena, good point, thanks for bringing this up.
Our import process can be a bit smarter. For example, if a ChatKit room is updated via webhook (e.g. a member is removed) after the dump, I can rely on the `channel.updated_at` stored in our DB (comparing the currently stored `channel.updated_at` value with the imported one) to determine whether to overwrite the members/channel info. The same logic will apply to messages and users.
(I will update our import process to handle this use case)
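The `updated_at` check described above boils down to a single comparison; a minimal sketch, assuming both timestamps are ISO 8601 strings:

```javascript
// Hypothetical sketch: only let the imported snapshot overwrite the stored
// channel when it is strictly newer than what webhooks already wrote.
function shouldOverwrite(storedUpdatedAt, importedUpdatedAt) {
  return Date.parse(importedUpdatedAt) > Date.parse(storedUpdatedAt);
}
```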
Best, Marcelo Pires
Hey Marcelo,
We're using Cognito IDs as our userId, which take the form "aws-region:uuid". Looks like you don't allow colons in your userId. We're working on mapping ':' to either '@' or '_'. Is that something your import can handle, or should we transform the Pusher snapshot first?
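The transformation itself is a one-liner; a sketch of the colon-to-underscore variant (the '@' variant is identical with a different replacement character):

```javascript
// Hypothetical sketch: Cognito identity ids look like "aws-region:uuid",
// and the colon is not allowed in Stream user ids, so swap it out.
function toStreamUserId(cognitoId) {
  return cognitoId.replace(/:/g, '_');
}
```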
"the trickiest part here is the mapping of messages attachments. I will push some example code this weekend to show how the chatKit message parts map to stream attachments." Currently writing this for our webhook sync and would love this code sample. Thanks Marcelo!
@dcsena I'm actually building the whole sync process.
I should have a working version by the end of the day/tomorrow.
Stay tuned :)
@dcsena I have implemented the sync flow https://github.com/GetStream/pusher-chatkit-migration/pull/6/files.
Please let me know if you find any issues.
(I will clean up the code tomorrow)
Is it possible you can open-source your import script? There are a few modifications we have to make to this webhook code, and I'm assuming the import job is done similarly.
I'm afraid that's not possible at this point (@tbarbugli can confirm). What are the changes that you want to make? (Feel free to open a PR, and if those changes are suitable for other customers I will integrate them into the import process.) Keep in mind that we need the import process to be as generic as possible.
Assuming your logic is pretty similar to the webhooks code, it's in determining the channelType:

```javascript
let type = 'livestream'
if (room.private) {
  type = 'messaging'
}
```
For us, we have a field in the room's custom_data that will be used to determine the type. Room object would look like this:
```json
{
  "name": "hi",
  "room_id": "123",
  "custom_data": {
    "roomType": "..."
  }
}
```
And we would want to map the roomType to a channelType. If this can be done as a one-off for us on import, then that works as well. I imagine customers are going to want to define their own channel types, or have other ways to determine the channelType, so basing it solely on public/private is going to be very limiting.
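One way to sketch this customization: derive the channel type from `custom_data.roomType` when a mapping exists, and fall back to the generic public/private rule otherwise. The `typeMap` here is an assumed, per-customer configuration, not anything Stream's converter currently exposes:

```javascript
// Hypothetical sketch: per-customer roomType → channelType mapping with a
// fallback to the generic public/private rule.
const typeMap = { support: 'messaging', broadcast: 'livestream' }; // assumed config

function channelTypeFor(room) {
  const custom = room.custom_data || {};
  if (custom.roomType && typeMap[custom.roomType]) {
    return typeMap[custom.roomType];
  }
  return room.private ? 'messaging' : 'livestream';
}
```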
(We are discussing internally the possibility of open-sourcing the converter; it's a Go program.) We can definitely do small modifications (on demand) like the one you are describing.
If you can make modifications on demand, then we're okay.
Can you detail what the export/import is? Like the full details of the data being exported/imported, how it handles conflicts, and everything else? I'm assuming it's a snapshot/restore operation. Will messages be preserved with their proper timestamps?
It's also unclear how to handle this operation without data loss, and more information would be much appreciated.