GetStream / pusher-chatkit-migration

Migrate Pusher Chatkit to Stream

What is Export/Import? #5

Open dcsena opened 4 years ago

dcsena commented 4 years ago

Can you detail what export/import is? Like full detail of the data that is being exported/imported, how it handles conflicts, and everything else? I'm assuming it's a snapshot/restore operation. Will messages be preserved with the proper timestamp?

It's also unclear how to handle this operation without data loss and more information would be much appreciated.

dcsena commented 4 years ago

Are messages restored idempotently or duplicated?

tschellenbach commented 4 years ago

@thesyncim can provide more details. It's only been a few days since Pusher announced their shutdown so we're still learning here.

@thesyncim any limitations on the import that we should be aware of?

dcsena commented 4 years ago

Thanks Thierry and Marcelo and appreciate you guys moving quickly here to support incoming customers.

I'm still trying to figure out how to migrate our chat service with no downtime or data loss. Our chat product is still operational so a snapshot/restore (which to me is what the export/import sounds like) is not going to be sufficient on its own.

Pusher has webhook support for us to start syncing to stream but it's unclear to me how your import will handle this. Will there be duplicate messages? Will the restore re-create rooms/channels that were deleted in between the snapshot and restore?

tschellenbach commented 4 years ago

I suspect so, since many customers did that when Layer shut down.

Marcelo will know for certain though as he created this import flow.


dcsena commented 4 years ago

FYI from the Pusher team: The data export we will provide is a point-in-time snapshot, taken at the point the export is completed. Currently we are providing these exports on a request-by-request basis, but this functionality should be available as a self-serve option in the dashboard soon. This means you will be able to keep requesting "later" snapshots as time goes on.

You are correct, though, that because it is a snapshot there will be a "cut over" point where messages could be lost. It looks like you have identified that webhooks could allow you to track events that happen after getting the export, so you can migrate users from one platform to the other without losing messages.

Marcelo, I would love to know the details of your import. Thanks.

thesyncim commented 4 years ago

hi @dcsena,

These are the high-level logical steps to migrate to our infrastructure without losing data:

  1. Set up Stream Chat
  2. Set up the ChatKit webhook
  3. Provider data export / Stream import
  4. Migrate the UI

We have a tool that converts the ChatKit export format into the Stream format. (more details about our format).

Steps 2 and 3 are "tightly coupled", since the format of the messages created by the webhook needs to match the output of our conversion tool.

I will try to describe how the mapping is done:

users

| Stream Field | Chatkit Field | Transformation |
| --- | --- | --- |
| id | id | None |
| name | name | None |
| image | profile_image | None |
| ** | custom_data | None |

channels

| Stream Field | Chatkit Field | Transformation |
| --- | --- | --- |
| id | id | None |
| type | private | Public rooms map to livestream and private rooms map to messaging |
| created_by_id | created_by_id | None |
| name | name | None |
| members | member_user_ids | None |
| ** | custom_data | None |

messages

| Stream Field | Chatkit Field | Transformation |
| --- | --- | --- |
| id | id | int → string |
| channel_type | N/A | Inherit from parent Channel |
| channel_id | N/A | Inherit from parent Channel |
| user | sender_id | None |
| text | parts | Take the content of the first element of the parts array with a part-type of inline and a mime-type of text/plain. If no such part exists (a small minority of messages), set to a sensible default. |
| type | N/A | regular |
| attachments | parts | All Chatkit message parts excluding the first part with a type of text/plain |
| ** | custom_data | None |

The trickiest part here is the mapping of message attachments. I will push some example code this weekend to show how the ChatKit message parts map to Stream attachments.
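Until that example code lands, the messages table above can be read roughly as the sketch below. This is not the actual converter. The part shape (`part_type`, `type`, `content`, `url`), the empty-string default text, the attachment fields chosen in `partToAttachment`, and the reading of `**` as "spread custom_data into the top-level fields" are all assumptions made for illustration.

```javascript
// Assumed ChatKit part shape (hypothetical, not the confirmed export format):
//   { part_type: 'inline' | 'url' | 'attachment', type: '<mime type>', content?, url? }

// Placeholder for the "sensible default"; the importer's real default is not stated.
const DEFAULT_TEXT = '';

// Illustrative attachment mapping; the field choices here are assumptions.
function partToAttachment(part) {
  const mimeType = part.type || '';
  if (part.part_type === 'inline') {
    return { type: 'text', mime_type: mimeType, text: part.content };
  }
  if (mimeType.startsWith('image/')) {
    return { type: 'image', mime_type: mimeType, image_url: part.url };
  }
  return { type: 'file', mime_type: mimeType, asset_url: part.url };
}

// Convert one ChatKit message, following the mapping table row by row.
function convertMessage(chatkitMessage, channel) {
  const parts = chatkitMessage.parts || [];

  // The first inline text/plain part becomes the message text.
  const textIndex = parts.findIndex(
    (p) => p.part_type === 'inline' && p.type === 'text/plain'
  );
  const text = textIndex === -1 ? DEFAULT_TEXT : parts[textIndex].content;

  // Every other part becomes an attachment.
  const attachments = parts
    .filter((_, i) => i !== textIndex)
    .map(partToAttachment);

  return {
    // Spread first, so the mapped fields below always win on a name clash.
    ...(chatkitMessage.custom_data || {}),
    id: String(chatkitMessage.id), // int -> string
    channel_type: channel.type, // inherited from the parent channel
    channel_id: channel.id,
    user: chatkitMessage.sender_id,
    text,
    type: 'regular',
    attachments,
  };
}
```

Whatever shape the webhook handler produces has to match this conversion exactly, which is the coupling between steps 2 and 3 mentioned earlier.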

Since we reuse ChatKit IDs, duplicates are impossible. If the data (users/messages/channels) already exists, we do an update instead.

Best, Marcelo Pires

dcsena commented 4 years ago

Awesome this is super helpful. The only thing I'm now worried about is users leaving channel rooms and the import undoing that.

Can you comment on how to handle this?

thesyncim commented 4 years ago

@dcsena, good point, thanks for bringing this up.

Our import process can be a bit smarter. For example, if a ChatKit room is updated through the webhook after the dump (e.g. a member is removed), I can compare the channel.updated_at currently stored in our DB with the imported channel.updated_at to determine whether I should overwrite the members/channel info. The same logic will apply to messages and users.

(I will update our import process to handle this use case)
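The updated_at comparison described above could look something like the following. It is only a sketch under assumptions: the in-memory `Map` stands in for the real database, `importRecord` and `shouldOverwrite` are hypothetical names, and both records are assumed to carry an `updated_at` value that `Date` can parse.

```javascript
// Decide whether an imported record should replace what is already stored.
// A record that is missing from the store is always written.
function shouldOverwrite(stored, imported) {
  if (!stored) return true;
  const storedAt = new Date(stored.updated_at).getTime();
  const importedAt = new Date(imported.updated_at).getTime();
  // Strictly newer wins; an equal timestamp means there is nothing to change.
  return importedAt > storedAt;
}

// Upsert keyed by the reused ChatKit id, guarded by updated_at.
// `store` is a Map here, standing in for the real database (an assumption).
function importRecord(store, imported) {
  const stored = store.get(imported.id);
  if (!shouldOverwrite(stored, imported)) return false;
  store.set(imported.id, imported);
  return true;
}
```

With this guard, a member removal that arrived through the webhook after the snapshot was taken carries a newer updated_at than the snapshot row, so importing the snapshot later does not add the member back.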

Best, Marcelo Pires

dcsena commented 4 years ago

Hey Marcelo,

We're using Cognito IDs as our userId, which take the form "aws-region:uuid". It looks like you don't allow colons in your userId, so we're working on mapping ':' to either '@' or '_'. Is that something your import can handle, or should we transform the Pusher snapshot first?
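If the transformation ends up being applied to the snapshot first, it might look like the sketch below. The choice of '_' as the replacement and the helper names are assumptions. The field names (`created_by_id`, `member_user_ids`, `sender_id`) are the ChatKit fields from the mapping tables earlier in the thread.

```javascript
// Replace every colon in a user id; non-string values pass through untouched.
const sanitizeUserId = (id) =>
  typeof id === 'string' ? id.replace(/:/g, '_') : id;

// The same mapping has to be applied everywhere a user id is referenced,
// otherwise rooms and messages would point at users that no longer exist.
function sanitizeUser(user) {
  return { ...user, id: sanitizeUserId(user.id) };
}

function sanitizeRoom(room) {
  return {
    ...room,
    created_by_id: sanitizeUserId(room.created_by_id),
    member_user_ids: (room.member_user_ids || []).map(sanitizeUserId),
  };
}

function sanitizeMessage(message) {
  return { ...message, sender_id: sanitizeUserId(message.sender_id) };
}
```

The webhook sync would need to apply the identical function, so that ids written after the snapshot match the ids written by the import.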

dcsena commented 4 years ago

"the trickiest part here is the mapping of messages attachments. I will push some example code this weekend to show how the chatKit message parts map to stream attachments." Currently writing this for our webhook sync and would love this code sample. Thanks Marcelo!

thesyncim commented 4 years ago

@dcsena I'm actually building the whole sync process.

I should have a working version by the end of the day/tomorrow.

Stay tuned :)

thesyncim commented 4 years ago

@dcsena I have implemented the sync flow https://github.com/GetStream/pusher-chatkit-migration/pull/6/files.

please let me know if you find any issue.

(I will clean up the code tomorrow)

dcsena commented 4 years ago

Is it possible for you to open source your import script? There are a few modifications we have to make to this webhook code, and I'm assuming the import job is done similarly.

thesyncim commented 4 years ago

I'm afraid that's not possible at this point (@tbarbugli can confirm). What changes do you want to make? Feel free to open a PR, and if those changes are suitable for other customers I will integrate them into the import process. Keep in mind that we need the import process to be as generic as possible.

dcsena commented 4 years ago

Assuming your logic is pretty similar to the webhooks code, the change is in how the channelType is determined.

let type = 'livestream'
if (room.private) {
    type = 'messaging'
}

For us, we have a field in the room's custom_data that will be used to determine the type. Room object would look like this:

{
    "name": "hi",
    "room_id": "123",
    "custom_data": {
        "roomType": "..."
    }
}

And we would want to map the roomType to a channelType. If this can be done as a one-off for us on import, then that works as well. I imagine customers are going to want to define their own channel types or have other ways to determine the channelType, so basing it only on public/private is going to be very limiting.
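For illustration, the kind of override being asked for could be sketched as follows. The roomType values ('support', 'broadcast') and the lookup table are made up, since the real values are elided above. Rooms without a recognised roomType fall back to the existing public/private rule.

```javascript
// Hypothetical roomType -> channelType table; substitute your own values.
const ROOM_TYPE_TO_CHANNEL_TYPE = new Map([
  ['support', 'messaging'],
  ['broadcast', 'livestream'],
]);

function channelTypeForRoom(room) {
  const roomType = room.custom_data && room.custom_data.roomType;
  if (ROOM_TYPE_TO_CHANNEL_TYPE.has(roomType)) {
    return ROOM_TYPE_TO_CHANNEL_TYPE.get(roomType);
  }
  // Fallback: the default rule from the webhook code quoted above.
  return room.private ? 'messaging' : 'livestream';
}
```

Keeping the fallback means rooms created before the custom field existed still get a valid type.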

thesyncim commented 4 years ago

We are discussing internally the possibility of open-sourcing the converter (it's a Go program). We can definitely do small modifications on demand, like the one you are describing.

dcsena commented 4 years ago

If you can make modifications on demand, then we're okay.