2. Design: data model, API

YuezhenQin commented 6 months ago

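Divide the requirements into small, independent functional modules.

If a class depends on too many other classes, it violates the design principle of high cohesion and low coupling, and should be split.
Define the core classes, together with their attributes and methods
Define the interactions between the classes
Assemble all the classes and provide an execution entry point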

YuezhenQin commented 6 months ago

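1. Divide responsibilities and identify the core classes

A class is a model of something in the real world. However, not every requirement maps onto the real world, and not every class corresponds one-to-one to a real-world thing. For some abstract concepts, we cannot define classes by mapping them to real-world things.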

YuezhenQin commented 6 months ago

2. Define the attributes and methods of each class

YuezhenQin commented 6 months ago

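3. Define the interactions between classes

What kinds of relationships exist between classes? UML defines six: generalization (a subclass inherits from a parent class), realization (an interface and its implementing class), aggregation (A contains B, the lifecycles of the A and B objects are independent, and B is passed in from outside through the constructor), composition (an A object contains a B object, B depends on A and cannot exist on its own, and B is created inside the constructor), association (A holds B as a member variable), and dependency (B appears in A as a member variable, a method return value, a parameter, or a local variable).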
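A minimal Python sketch may make the six relationships easier to tell apart. All class names here (`Notifier`, `Car`, `Engine`, `Driver`) are hypothetical and chosen only for illustration; they are not part of the design discussed in this thread.

```python
from abc import ABC, abstractmethod


class Notifier(ABC):
    """An interface; concrete classes realize (implement) it."""

    @abstractmethod
    def notify(self, text: str) -> str:
        ...


class EmailNotifier(Notifier):
    # Realization: EmailNotifier implements the Notifier interface.
    def notify(self, text: str) -> str:
        return f"email: {text}"


class UrgentEmailNotifier(EmailNotifier):
    # Generalization: the subclass inherits from and specializes its parent.
    def notify(self, text: str) -> str:
        return super().notify(text.upper())


class Engine:
    def start(self) -> str:
        return "engine started"


class Driver:
    def __init__(self, name: str) -> None:
        self.name = name


class Car:
    def __init__(self, driver: Driver) -> None:
        # Aggregation: the Driver is created outside and passed in,
        # so it can outlive this Car.
        self.driver = driver
        # Composition: the Engine is created inside the constructor
        # and has no owner other than this Car.
        self.engine = Engine()
        # Both fields are also associations: members held by Car.

    def drive(self, notifier: Notifier) -> str:
        # Dependency: Notifier is used only as a method parameter.
        return notifier.notify(f"{self.driver.name}: {self.engine.start()}")
```

For example, `Car(Driver("Ann")).drive(EmailNotifier())` returns `"email: Ann: engine started"`, and the `Driver` object remains usable after the `Car` is discarded.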

YuezhenQin commented 6 months ago

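4. Assemble the classes and provide an execution entry point

This entry point can be a main(), or a set of APIs exposed to external callers. It is through this entry point that the code gets executed.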
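As a rough illustration of this step, the sketch below wires two small classes together and exposes both kinds of entry point: a `build_service()` function that external callers could use as an API, and a `main()` for direct execution. The names `MessageStore`, `MessageService`, and `build_service` are assumptions made for the example.

```python
class MessageStore:
    """Holds messages in memory; a stand-in for real persistence."""

    def __init__(self) -> None:
        self._messages: list = []

    def add(self, text: str) -> int:
        self._messages.append(text)
        return len(self._messages)  # number of messages stored so far


class MessageService:
    """Core class; its collaborator is injected rather than created here."""

    def __init__(self, store: MessageStore) -> None:
        self._store = store

    def send(self, text: str) -> int:
        if not text:
            raise ValueError("text must not be empty")
        return self._store.add(text)


def build_service() -> MessageService:
    # Assembly point: the only place that knows how the classes fit together.
    return MessageService(MessageStore())


def main() -> None:
    service = build_service()
    print(service.send("hello"))


if __name__ == "__main__":
    main()
```

Keeping the assembly in one function means the individual classes never construct each other directly, which keeps coupling low.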

YuezhenQin commented 4 months ago

High-Level API Design

We’ll likely use a RESTful API style for broader compatibility. Here’s a breakdown of possible endpoints:

1. Send a Message (POST /messages): The request body includes the recipient’s ID and message content. A successful response (200) returns a unique message identifier. Error codes (400, 500) handle missing parameters or server issues.
2. Check for New Messages (GET /messages): The response is either a 200 with an array of unread messages or a 204 if there are none.
3. Get a Specific Message (GET /messages/:messageId): Returns a specific message (200) or a 404 if not found.
4. Mark Message as Read (PUT or PATCH /messages/:messageId): A successful response (200) confirms the change, while a 404 indicates the message wasn’t found.
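The endpoint contract above can be sketched without any web framework as plain functions that return a status code and a body. This is only an in-memory illustration: the field names `recipientId`, `content`, and `messageId`, the `MessageApi` class, and the `user_id` argument (which would normally come from authentication) are assumptions, not part of the original design.

```python
import uuid
from typing import Any, Dict, Tuple


class MessageApi:
    """In-memory stand-in for the four endpoints; methods return (status, body)."""

    def __init__(self) -> None:
        self._messages: Dict[str, Dict[str, Any]] = {}

    def post_message(self, body: Dict[str, Any]) -> Tuple[int, Any]:
        # POST /messages
        recipient = body.get("recipientId")
        content = body.get("content")
        if not recipient or not content:
            return 400, {"error": "recipientId and content are required"}
        message_id = str(uuid.uuid4())
        self._messages[message_id] = {
            "id": message_id,
            "recipientId": recipient,
            "content": content,
            "read": False,
        }
        return 200, {"messageId": message_id}

    def get_new_messages(self, user_id: str) -> Tuple[int, Any]:
        # GET /messages: 200 with unread messages, or 204 when there are none
        unread = [
            m for m in self._messages.values()
            if m["recipientId"] == user_id and not m["read"]
        ]
        return (200, unread) if unread else (204, None)

    def get_message(self, message_id: str) -> Tuple[int, Any]:
        # GET /messages/:messageId
        message = self._messages.get(message_id)
        if message is None:
            return 404, {"error": "message not found"}
        return 200, message

    def mark_read(self, message_id: str) -> Tuple[int, Any]:
        # PATCH /messages/:messageId
        message = self._messages.get(message_id)
        if message is None:
            return 404, {"error": "message not found"}
        message["read"] = True
        return 200, message
```

A real server would map these methods onto HTTP routes and would return 500 for unexpected failures, which this sketch does not model.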

YuezhenQin commented 4 months ago

High-Level System Design

Mobile App: The primary interface for users will be the mobile app (iOS, Android). This app handles sending and receiving messages, contact management, and conversations.

Load Balancer: To handle incoming requests efficiently, we’ll use a load balancer to distribute traffic across multiple servers. This improves our application’s reliability.

API Servers: All requests will go to the API servers, which handle the RESTful APIs we outlined earlier, managing messaging logic. API servers themselves could be stateless; this way, we can scale out horizontally (adding more servers) as traffic grows.

WebSocket Connections: WhatsApp-like apps heavily rely on WebSockets for real-time communication. The chat servers will maintain persistent WebSocket connections with the mobile apps. When a message arrives, it can be instantly pushed to the recipient’s device.

Message Distributor: A Message Distributor service decouples the API servers from direct database writes, which is especially important for handling the high write volume.

A message queue, such as Kafka or RabbitMQ, is a great fit here. Here’s how it will work:

1. The API server receives a “Send Message” POST request.
2. It places the message on the queue and promptly returns a success/acknowledgment to the client.
3. Separate worker processes asynchronously read from the queue and write messages into the database.

Database (NoSQL): We agreed that eventual consistency is acceptable, and this makes NoSQL a scalable choice for the high message volume.

Let’s consider two strong options:

Cassandra: Wide-column store known for scalability, high availability, and write performance. Especially good if we anticipate high write volume with simpler read patterns (message fetch by ID mainly).

DynamoDB: Fully managed key-value and document database offered by AWS. It is advantageous if we want a minimal-maintenance database solution that scales easily.
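The queue-based write path described earlier (enqueue, acknowledge, persist asynchronously) can be sketched with Python's standard library. Here `queue.Queue` stands in for Kafka or RabbitMQ and a plain dict stands in for the NoSQL database; both substitutions, along with the function names and message fields, are assumptions made to keep the example self-contained.

```python
import queue
import threading

message_queue = queue.Queue()  # stand-in for Kafka/RabbitMQ
database = {}                  # stand-in for the NoSQL store


def handle_send_message(message: dict) -> dict:
    """API-server side: enqueue and acknowledge without touching the database."""
    message_queue.put(message)
    return {"status": "accepted", "messageId": message["id"]}


def worker() -> None:
    """Worker side: drain the queue and persist each message."""
    while True:
        message = message_queue.get()
        try:
            if message is None:  # sentinel telling the worker to stop
                return
            database[message["id"]] = message
        finally:
            message_queue.task_done()


worker_thread = threading.Thread(target=worker, daemon=True)
worker_thread.start()

ack = handle_send_message({"id": "m1", "recipientId": "u2", "content": "hi"})
message_queue.join()     # block until the worker has persisted everything
message_queue.put(None)  # ask the worker to exit
worker_thread.join()
```

The client receives `ack` before the write happens, which is why this design only fits when eventual consistency is acceptable.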

Sharding and Partitioning: It’s crucial to shard (horizontally partition) the data, since no single database server can handle our 1.5-petabyte storage needs.

But how would we shard and partition these data, and how do these API servers know where to request that data from?

We could partition based on userId. All messages involving a user will reside in the same shard/partition. Our API servers have two potential ways to locate data:

Consistent Hashing Ring: Data location can be determined based on the partition key, allowing API servers to route requests to the correct database shard directly.

Metadata Service: A separate service keeps a mapping of partition keys to shard locations. API servers query this service first, then make the database call.
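To make the first option concrete, here is a small consistent hashing ring keyed by userId. It is a sketch under assumptions: the shard names, the use of SHA-256, and the count of 100 virtual nodes per shard are illustrative choices, not requirements from the design.

```python
import bisect
import hashlib
from typing import List


def _hash(key: str) -> int:
    # Any stable, well-distributed hash works; SHA-256 is used for simplicity.
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, shards: List[str], vnodes: int = 100) -> None:
        if not shards:
            raise ValueError("at least one shard is required")
        # Each shard is placed on the ring at `vnodes` pseudo-random points
        # so that keys spread evenly across shards.
        self._ring = sorted(
            (_hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def shard_for(self, user_id: str) -> str:
        # Walk clockwise to the first point after the key's hash,
        # wrapping around to the start of the ring if necessary.
        index = bisect.bisect_right(self._points, _hash(user_id))
        return self._ring[index % len(self._ring)][1]


ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
target = ring.shard_for("user-42")  # the same userId always maps to the same shard
```

When a shard is added, only the keys that now fall just before one of the new shard's points move, and they all move to the new shard. The metadata-service option trades this computation for an extra lookup but allows arbitrary placement.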

Conclusion and Current System Bottlenecks

This outlines the primary architecture for a WhatsApp-like application. Now, let’s examine the potential bottlenecks in our current system and areas for improvement:

Database Writes: High write volume is a potential bottleneck. Sharding, message queues, and optimized database choices are crucial.

End-to-End Encryption: The WhatsApp model heavily emphasizes security. Implementing end-to-end encryption would be a crucial discussion.

Group Chats: This feature brings additional complexity to message routing and storage.

Media Handling: We can implement a system for handling image and video uploads, using compression here and multiple storage sizes for thumbnails.
