Satellite-im / Warp

Interface Driven Distributed Data Service
MIT License
16 stars 4 forks source link

raygun: Convert `Conversation` into a trait #457

Closed dariusc93 closed 8 months ago

dariusc93 commented 8 months ago

Currently, Conversation is a struct that contains data about the conversation. This may pertain about the name, list of members, etc. This was originally done since having Conversation as a trait previously raised some concern about the object safety as well as how to structure things safely across FFI, wasm (when wasm had some focused on at the time), however after some thought based on #453, I do believe we could try to implement Conversation as a trait instead as a struct.

Design concept:

pub trait Conversation {
        /// ID of the converation in Uuid format
        fn id(&self) -> Uuid;

        /// Name of the conversation, if any
        fn name(&self) -> Option<String>;

        /// Time of when the conversation was created
        fn created(&self) -> DateTime<Utc>;

        /// Last time the conversation was modified or updated
        fn modified(&self) -> DateTime<Utc>;

        /// The conversation type
        /// Currently: Direct, Group
        fn conversation_type(&self) -> ConversationType;

        /// Conversation settings
        fn settings(&self) -> ConversationSettings;

        /// List of members to the conversation
        fn members(&self) -> Vec<DID>;
}

(Full concept)

pub trait ConversationUpdate {
        /// Update conversation name
        /// Note: This will only update the group conversation name
        async fn update_name(&mut self, name: &str) -> Result<(), Error>;

        /// Add a member to the conversation
        async fn add_member(&mut self, did_key: &DID) -> Result<(), Error>;

        /// Remove a member from the conversation
        async fn remove_member(&mut self, did_key: &DID) -> Result<(), Error>;

        /// Update conversation settings
        async fn update_settings(&mut self, settings: ConversationSettings) -> Result<(), Error>;
}

pub trait ConversationMessage {
        /// Retrieve a message from the conversation
        async fn get_message(&self, message_id: Uuid) -> Result<Message, Error>;

        /// Retrieve a message reference from the conversation
        async fn get_message_reference(&self, message_id: Uuid) -> Result<MessageReference, Error>;

        /// Retrieve messages from the conversation
        async fn get_messages(&self, options: MessageOptions) -> Result<Messages, Error>;

        /// Retrieve message references from the conversation
        async fn get_message_references(&self, options: MessageOptions) -> Result<BoxStream<'static, MessageReference>, Error>;

        /// Get a number of messages in the conversation
        async fn get_message_count(&self, message_id: Uuid) -> Result<usize, Error>;

        /// Get a status of a message in the conversation
        async fn message_status(&self, message_id: Uuid) -> Result<MessageStatus, Error>;

        /// Sends a message to a conversation.
        async fn send(&mut self, message: Vec<String>) -> Result<(), Error>;

        /// Edit an existing message in a conversation.
        async fn edit(
            &mut self,
            message_id: Uuid,
            message: Vec<String>,
        ) -> Result<(), Error>;

        /// Delete message from a conversation
        async fn delete(
            &mut self,
            message_id: Option<Uuid>,
        ) -> Result<(), Error>;

        /// React to a message
        async fn react(
            &mut self,
            message_id: Uuid,
            state: ReactionState,
            emoji: String,
        ) -> Result<(), Error>;

        /// Pin a message within a conversation
        async fn pin(
            &mut self,
            message_id: Uuid,
            state: PinState,
        ) -> Result<(), Error>;

        /// Reply to a message within a conversation
        async fn reply(
            &mut self,
            message_id: Uuid,
            message: Vec<String>,
        ) -> Result<(), Error>;

        /// Send an event to a conversation
        async fn send_event(&mut self, event: MessageEvent) -> Result<(), Error>;

        /// Cancel event that was sent, if any.
        async fn cancel_event(&mut self, event: MessageEvent) -> Result<(), Error>;
}

pub trait ConversationAttachment {
    /// Send files to a conversation.
    /// If no files is provided in the array, it will throw an error
    async fn attach(
        &mut self,
        message_id: Option<Uuid>,
        path: Vec<Location>,
        messages: Vec<String>,
    ) -> Result<AttachmentEventStream, Error>;

    /// Downloads a file that been attached to a message
    /// Note: Must use the filename associated when downloading
    async fn download(
        &self,
        message_id: Uuid,
        file: String,
        dest: PathBuf,
    ) -> Result<ConstellationProgressStream, Error>;

    /// Stream a file that been attached to a message
    /// Note: Must use the filename associated when downloading
    async fn download_stream(
        &self,
        message_id: Uuid,
        file: &str,
    ) -> Result<BoxStream<'static, Result<Vec<u8>, Error>>, Error>;
}

pub trait ConversationStream {
    /// Subscribe to an stream of events from the conversation
    async fn get_conversation_stream(&mut self) -> Result<MessageEventStream, Error>;
}

Having the trait designed this way allows us to migrate messaging logic to be specifically tied to the Conversation trait itself, removing the need to supply a Uuid thats representing the id to the conversation to the function itself.

Internals

Internally, we can have a struct that contains a clone of the task with metadata pretaining to the set of methods in Conversation, which would allow us to avoid making async calls. We will map each function to a specific internal function while also utilizing the Uuid specific to that conversation.

When the conversation is deleted (either directly or indirectly), we will still allow the metadata stored for existing objects to be supplied to those methods, however they will no longer be updated. For the remaining methods, they will yield Error::InvalidConversation error (or some variant) showing that the conversation no longer exist.

We can deprecate the existing functions while defaulting raygun methods to the respected Conversation methods.

Pros

Cons

Note:

zeenix commented 8 months ago

This was originally done since having Conversation as a trait previously raised some concern about the object safety

As you probably already found out during implementation that the only thing that would make these traits non-object-safe, is the async methods but since we're using async-trait crate, we'll be fine in that regard.

which would allow us to avoid making async calls

Why do we want to avoid making async calls? I know you've this lock-free channels-based approach, which has its benefits but if we put all work behind a single task, it creates a bottleneck. E.g a call for changing the name of a conversation A, shouldn't have to wait for another call to get all the messages of conversation B (or even conversation A). So we should be careful here, not to introduce bottlenecks where they are not really needed.

  • We could return Result<T, warp::error::Error> to the data getter methods, however this would break compatibility with the existing Conversation getter api to such methods.

Is this really a problem since we don't have (public) releases of Warp and only use it internally? Btw the name is already taken on crates.io.

zeenix commented 8 months ago

BTW, I don't think we need to use trait objects. We can use generics with trait bounds and associated types instead, unless I'm missing something here.

zeenix commented 8 months ago

BTW, I don't think we need to use trait objects. We can use generics with trait bounds and associated types instead, unless I'm missing something here.

Actually yes but only because the approach taken in the PRs is expecting one impl per crate. While I've no issue with that, it made me realize that neither of the PRs fix the main design flow that #453 address: conversation type being a disconnected field from the other conversation-type-specific fields instead of being represented by the type itself and hence completely eliminating the possibility of any mistakes in this regard.

If I understood correctly, your main objection to #453 was keeping Conversation inside the ConversationDocument. While I still don't quite understand your objection there, I can live without that change and modify my PR to only turn Conversation to enum. Would that be acceptable?

dariusc93 commented 8 months ago

Why do we want to avoid making async calls?

Conversation itself is mostly a object where one get information provided at the time Conversation is provided. If we were to just think about what we do "internally", parts of uplink dont use async directly which means that the async function would need to be called else where, construct a object with that data (including ones from the async function) and pass it forward, while in the current implementation, they can just take Conversation and pass it on and use that accordingly.

I know you've this lock-free channels-based approach, which has its benefits but if we put all work behind a single task, it creates a bottleneck. E.g a call for changing the name of a conversation A, shouldn't have to wait for another call to get all the messages of conversation B (or even conversation A). So we should be careful here, not to introduce bottlenecks where they are not really needed.

The "bottleneck" itself acceptable since it makes sure things the state of the conversations (and its message) are consistent when it is accessed (although getting all the messages will be moved back down to MessageStore and not be done in the task itself - not sure how that got overlooked there)

Is this really a problem since we don't have (public) releases of Warp and only use it internally? No

Btw the name is already taken on crates.io.

We are aware that the name is taken on crates.io and to my knowledge dont have plan to publish anything there yet so this is not of any concern right now (though when we do plan to, we will adjust things accordingly).

Actually yes but only because the approach taken in the PRs is expecting one impl per crate. While I've no issue with that, it made me realize that neither of the PRs fix the main design flow that https://github.com/Satellite-im/Warp/pull/453 address: conversation type being a disconnected field from the other conversation-type-specific fields instead of being represented by the type itself and hence completely eliminating the possibility of any mistakes in this regard.

The PR (mainly https://github.com/Satellite-im/Warp/pull/459) is the middle ground to remove the duplicated fields since Conversation was used internally, which was the main discussion of https://github.com/Satellite-im/Warp/pull/453 (while https://github.com/Satellite-im/Warp/pull/458 has duplication of the fields are still acceptable since one is more raw in nature while the other is resolved data and would eventually store additional items).

While I still don't quite understand your objection there, I can live without that change and modify my PR to only turn Conversation to enum. Would that be acceptable?

That would be acceptable since I wasnt against converting it to an enum.

zeenix commented 8 months ago

I know you've this lock-free channels-based approach, which has its benefits but if we put all work behind a single task, it creates a bottleneck. E.g a call for changing the name of a conversation A, shouldn't have to wait for another call to get all the messages of conversation B (or even conversation A). So we should be careful here, not to introduce bottlenecks where they are not really needed.

The "bottleneck" itself acceptable since it makes sure things the state of the conversations (and its message) are consistent when it is accessed (although getting all the messages will be moved back down to MessageStore and not be done in the task itself - not sure how that got overlooked there)

As I wrote, I understand the rationale for doing it this way but I don't think we can just say that bottlenecks are acceptable. There is no reason unrelated operations (especially read-only) that doesn't need any synchronisation between themselves, should have to wait for each other in a queue.

I think it would be worth considering to split into further tasks, e.g it could be 1 task per conversation (just an example).

Btw the name is already taken on crates.io.

We are aware that the name is taken on crates.io and to my knowledge dont have plan to publish anything there yet so this is not of any concern right now (though when we do plan to, we will adjust things accordingly).

Yeah, it was just a side-note. Nothing to worry about. :)

While I still don't quite understand your objection there, I can live without that change and modify my PR to only turn Conversation to enum. Would that be acceptable?

That would be acceptable since I wasnt against converting it to an enum.

Cool. I'm glad we could find a middleground. :) Would we still need traits then? They'll have to be named differently, if so.

dariusc93 commented 8 months ago

As I wrote, I understand the rationale for doing it this way but I don't think we can just say that bottlenecks are acceptable.

I had "bottlenecks" in quotes because some would see having the channel buffer set extremely low be considered a bottle neck when that isnt the case. As for the actual list of messages itself, After reviewing it, one part returns a stream, which would not cause any issues as that stream would need to be polled, while another the other part (which generates message pages or just list the messages) would, but for those parts, we could just easily resolve that by passing a future back to MessageStore and awaiting there (similar to what I did for files), which will be taken care of soon.

I think it would be worth considering to split into further tasks, e.g it could be 1 task per conversation (just an example).

Technically speaking, each conversation is in a way in its own task, with the pubsub stream polled in each task and pushing to a single receiver that is polled (before that change, each conversation was processed in its own task but found it to be redundant in some cases). In a more expanded way, where each conversation gets its own set of functions, etc.,, it would still have to wait so the document is updated on the list and stored in the blockstore.

Cool. I'm glad we could find a middleground. :) Would we still need traits then? They'll have to be named differently, if so.

No. It wouldnt make sense to have both since the purpose of Conversation being as a trait would be to replace the struct on the frontend (as well as remove the fields there and use whatever may be the case in the backend side).

zeenix commented 8 months ago

I had "bottlenecks" in quotes because some would see having the channel buffer set extremely low be considered a bottle neck when that isnt the case.

The size of buffer isn't the issue. If it's an async call, the caller has to wait for it's request to be completed, regardless of the queue size. The task would be the bottleneck since it can only process 1 request at a time.

Technically speaking, each conversation is in a way in its own task, with the pubsub stream polled in each task

That doesn't help with the bottleneck I'm talking about. As long as the processing task is handling one request at a time, there is a bottleneck.

it would still have to wait so the document is updated on the list and stored in the blockstore.

That makes sense for all the write operations and also simultaneous read and write operations. I think an async RwLock would be the best option here as they'd allow all read operations to work simultaneously and if we do it right and limit the scope of the locking as much as possible, most of the processing of the write operation can also go simultaneously and only keeping a lock when needed and immediately freeing it.

I know that goes about that current design but I think it's worth considering for certain cases, such as for conversation API.

Cool. I'm glad we could find a middleground. :) Would we still need traits then? They'll have to be named differently, if so.

No. It wouldnt make sense to have both since the purpose of Conversation being as a trait would be to replace the struct on the frontend (as well as remove the fields there and use whatever may be the case in the backend side).

Exactly my thoughts but just wanted to make sure we're on the same page.

BTW, I'll need to also switch ConversationDocument into an enum if not embedding Conversation into it. Otherwise, we'll have the same disconnect there that I'll resolve in Conversation. I hope that's OK with you.

dariusc93 commented 8 months ago

The size of buffer isn't the issue. If it's an async call, the caller has to wait for it's request to be completed, regardless of the queue size. The task would be the bottleneck since it can only process 1 request at a time.

The design of it processing one request at a was done intentionally to make sure things remain consistent and when there are no external request sent via RayGun, the task, etc., can continue to be polled until there is something that comes in. Even when something does come in, it doesnt take long to process and return the result.

That makes sense for all the write operations and also simultaneous read and write operations. I think an async RwLock would be the best option here as they'd allow all read operations to work simultaneously and if we do it right and limit the scope of the locking as much as possible, most of the processing of the write operation can also go simultaneously and only keeping a lock when needed and immediately freeing it.

I know that goes about that current design but I think it's worth considering for certain cases, such as for conversation API.

Im not so sure on using RwLock long term in this regard would offer any real benefits or gains given that a lot of the operations would still be write, with some read operations only return either streams or internally could return a future from the task to prevent stalling that task for a read operation (see previous comment). Previously, we did use RwLock, but then it made sense to have a task handle it for every conversation to where it makes sense everything to be in a single task (minus what was mentioned previously).

BTW, I'll need to also switch ConversationDocument into an enum if not embedding Conversation into it. Otherwise, we'll have the same disconnect there that I'll resolve in Conversation. I hope that's OK with you.

Hmm I think would prefer that it remains as a struct, moving the ConversationType internally, however I may not be against it being as an enum too, but may depend on how comes out in the end, but this would be a separate discussion.

Nonetheless, If the enum will be used, this issue and the corresponding PRs can be closed but if it arise in the future that this may be better to a trait for Conversation, this PR could be reopened.

zeenix commented 8 months ago

The design of it processing one request at a was done intentionally to make sure things remain consistent

Yeah, as I wrote, I do understand the reasoning for the design and the advantages. Just that we have to acknowledge the drawbacks too and hence consider alternatives.

Im not so sure on using RwLock long term in this regard would offer any real benefits or gains given that a lot of the operations would still be write

Are we talking about the number of possible operations or the actual number of operations that will happen simultaneously? The former isn't relevant but the latter is. I can think of some read only operations happening at the same time (e.g at launch when loading conversations and/or metadata about them) but if we've done some analysis that shows the reality to be on the contrary, then I would take back my suggestion.

But still to keep in mind that I'm only suggesting that we consider the approach and keep the bottleneck in mind.

BTW, I'll need to also switch ConversationDocument into an enum if not embedding Conversation into it. Otherwise, we'll have the same disconnect there that I'll resolve in Conversation. I hope that's OK with you.

Hmm I think would prefer that it remains as a struct, moving the ConversationType internally,

Well then we won't have solved the problem completely.

however I may not be against it being as an enum too

Great, cause I don't see any way around it if we want to fully solve the main issue.

but may depend on how comes out in the end, but this would be a separate discussion.

Sure thing.

Nonetheless, If the enum will be used, this issue and the corresponding PRs can be closed but if it arise in the future that this may be better to a trait for Conversation, this PR could be reopened.

👍

dariusc93 commented 8 months ago

Well then we won't have solved the problem completely.

however I may not be against it being as an enum too

Great, cause I don't see any way around it if we want to fully solve the main issue.

The conversion to an enum is mostly the concern of whats in warp, not in warp-ipfs as there isnt really a problem that has an impact short or long term besides the issue between ConversationType and ConversationSettings.

zeenix commented 8 months ago

The conversion to an enum is mostly the concern of whats in warp, not in warp-ipfs as there isnt really a problem that has an impact short or long term besides the issue between ConversationType and ConversationSettings.

The

issue between ConversationType and ConversationSettings

is the main issue being resolved here and it exists in both places so converting Conversation to an enum and not embedding it in ConverstionDocument, implies that we change both types if we're going to resolve the problem everywhere effectively.

I actually also prefer not to change ConversationDocument to an enum and embedding Conversation in it would have allowed us to do that (apart from deduplication). i-e you've to make a choice between these 2 options, unless you've a better idea.

zeenix commented 8 months ago

but may depend on how comes out in the end, but this would be a separate discussion.

@dariusc93 So one issue with ConversationDocument becoming an enum is that we'll have to have more duplication since it'll also need a Common struct (like ConversationCommon struct you saw in #453). I'm sure you don't like that and I don't really like it either (it's also a lot more work than needed, well that's what we get for duplication).

However, as I previous explained, it's either this or we embedding Conversation in ConversationDocument. Let me know which one you prefer.

dariusc93 commented 8 months ago

@dariusc93 So one issue with ConversationDocument becoming an enum is that we'll have to have more duplication since it'll also need a Common struct (like ConversationCommon struct you saw in https://github.com/Satellite-im/Warp/pull/453). I'm sure you don't like that and I don't really like it either (it's also a lot more work than needed, well that's what we get for duplication).

However, as I previous explained, it's either this or we embedding Conversation in ConversationDocument. Let me know which one you prefer.

Well for now we could leave it as is internally, move ConversationType internally and probably also ditch the direct conversation settings internally as per my previous comment, which would solve the issue in itself.

zeenix commented 8 months ago

Well for now we could leave it as is internally, move ConversationType internally

I'm not sure what this means exactly but moving things around doesn't solve the issue at hand either. We've a mistake waiting to happen because of the disconnection between conversation-type-specific fields. enums completely eliminate the issue because it's a natural representation here (i-e a conversation is either a direct conversation or group).

probably also ditch the direct conversation settings internally

That's quite unrelated IMO. If you remove that struct, and use Option<GroupSettings> (for example), we still have the disconnect.


Another solution would be to only split conversation-type-specific data into an enum. If you recall, this is exactly what I tried to do when I first added group settings: I added the settings to ConversationType enum. You didn't like that at the time but I wasn't sure about it either but I think for me it was because of the name. We can think of a better name of course. Would that be something more acceptable to you?

zeenix commented 8 months ago

Another solution would be to only split conversation-type-specific data into an enum. If you recall, this is exactly what I tried to do when I first added group settings: I added the settings to ConversationType enum. You didn't like that at the time but I wasn't sure about it either but I think for me it was because of the name. We can think of a better name of course. Would that be something more acceptable to you?

I took a variation of this in https://github.com/Satellite-im/Warp/pull/466 . I really hope we can agree on this compromise. I still think #453 is the best way forward but I can live with #466 since it does mostly solve the issue, as long as future edits of conversation structs don't add conversation-type-specific fields directly in the structs.