Team Members:
Road Warrior is a startup poised to revolutionise the travel industry by developing a cutting-edge online trip management platform dedicated to providing travelers with dynamic and manual itinerary management capabilities. This innovative dashboard will empower travelers to effortlessly access and organise all their existing reservations, ensuring a seamless and hassle-free travel experience. Whether users prefer to access it through a web interface or on their mobile devices, this platform will serve as the go-to solution for travelers seeking comprehensive trip management solutions. With this pioneering tool, travelers can look forward to a more organised, convenient, and enjoyable journey, making it the next generation's must-have travel companion. In addition to its user-centric features, this platform will also harness the wealth of data it collects for invaluable reporting purposes. By leveraging this data, travelers will gain insights into their travel patterns, preferences, and spending habits, allowing them to make more informed decisions for future trips. As the platform continues to accumulate user data, it will lay the foundation for a future suggestion engine.
The provided requirements can be found here
To comprehensively address the requirement outlined in the brief, it is crucial to break it down into specific entry points and clearly define the payloads we will receive from each of these entry points. This meticulous approach ensures that we understand and manage the data flow effectively.
The event storming process was employed to identify essential "domain events" within a system, where each event represents an action related to a business entity. It's a crucial initial step as these events configure the central artifact for the system. Event storming meetings start with participants noting domain events, foundational for defining business rules. The team wrote down domain events, each represented on an orange sticky note on a virtual whiteboard. This collaborative approach facilitates a comprehensive understanding and mapping of system events for stakeholders.
Following the identification of domain events, the next step involves pinpointing the commands and users responsible for triggering these events. Commands are actions initiating these events. External actors' commands are explicitly recognised, while some commands originate internally. Post-it notes are arranged to visualise a sequence: actor, command, and event, ensuring a cohesive representation of the system's flow. This step streamlines the understanding of event triggers and user interactions. These commands and domain events are grouped into related aggregates.
In the final step, post-gathering domain events and defining triggering commands, the focus shifts to automation policies. These policies apply to commands lacking external actors, activated upon the completion of specific domain events, signifying communication ties between bounded contexts. By grouping semantically related aggregates, we define bounded contexts. Visualised in a diagram, these boundaries and event-driven connections take shape.
The solution adheres to a boundary analysis that encompasses several key components to ensure its functionality and effectiveness:
Trips and Reservations: At the core of the system lies the ability to manage trips and reservations seamlessly. Users can create, update, and view their travel itineraries, which include flights, hotels, and activity reservations.
Polling Mechanism: To keep information up-to-date, the solution employs a polling mechanism. It regularly checks external sources such as booking platforms for any changes or updates to reservations and synchronises them with the user's itinerary.
Email Webhooks: For real-time communication and updates, the system integrates with email webhooks. Users receive notifications and updates about their reservations directly in their email, ensuring they stay informed and can make timely adjustments to their plans.
Data Analytics: The solution incorporates data analytics to derive insights from user interactions, helping to improve user experiences and provide personalised recommendations based on historical travel data.
Recommendation Engine: Utilising a recommendation engine, the system offers tailored suggestions to users based on their travel history, preferences, and current bookings, enhancing their travel planning and decision-making process.
User Authentication: Security is paramount, and the system includes robust user authentication mechanisms to protect user data and ensure only authorised access to accounts and itineraries.
Third-party Integrations: We integrate with various third-party services and APIs for booking and reservation data, enabling users to seamlessly import and manage their travel information.
By adhering to this boundary analysis, the solution provides a comprehensive and user-centric travel management experience, ensuring efficiency, accuracy, and user satisfaction throughout the journey planning process.
The identified actors and their actions are as follows:
Actor | Actions |
---|---|
Customer (Authenticated) | - Registers on the platform - Logs in to the platform - Consents to email forwarding - Views upcoming trips - Manages upcoming trips - Views trip reservations - Manages trip reservations - Receives notifications regarding upcoming trips - Views personalised analytics - Requests help from agency - Shares trip details on preferred social media platform - Shares trip details with platform - Shares trip details with anonymous user |
Customer (Not Authenticated) | - Views shared Trip Summary |
System Admin | - Registers on the platform - Logs in to the platform - Adds multi-lingual translations - Views regional analytics |
Delving deeper into the process outlined in Breaking down the Requirements:
User Registration:
Payload: User-provided information such as name, email, username, and password.
User Login:
Payload: User credentials, typically comprising a username/email and password.
Profile Updates:
Trip/Reservation Creation:
Manual Creation
Original Requirement: 'Customers should be able to add, update, or delete existing reservations manually as well'
Payload: User-generated trip data, which includes trip names, descriptions, and associated reservations.
Automated Creation Email
Original Requirement: 'Poll email looking for travel-related emails. Filter and whitelist certain emails'
Entry Point: Automated creation of trips or reservations by listening to incoming emails.
Payload: System-generated trip data, which includes trip names, descriptions, and associated reservations based on email content.
Third-Party Creation
Original Requirement: 'The system must interface with the agency’s existing airline, hotel, and car rental interface system to update travel details'
Entry Point: Online reservation systems or APIs for flights, hotels, and activities.
Payload: Reservation details including dates, times, locations, and confirmation numbers.
Trip/Reservation Deletion:
Manual Delete
Original Requirement: 'Customers should be able to add, update, or delete existing reservations manually as well'
Payload: User-generated trip data, and manually outlined associated reservations based on email content.
Automated Email
Original Requirement: 'Poll email looking for travel-related emails. Filter and whitelist certain emails'
Payload: System-generated data and automatically outlined associated reservations based on email content.
Third-Party Integration
Original Requirement: 'The system must interface with the agency’s existing airline, hotel, and car rental interface system to update travel details'
Payload: System-generated data and automatically outlined associated reservations based on polled content.
Trip/Reservation Updates:
Manual Updates
Original Requirement: 'Customers should be able to add, update, or delete existing reservations manually as well'
Payload: User-generated trip data, and manually outlined associated reservations based on email content.
Automated Email
Original Requirement: 'Poll email looking for travel-related emails. Filter and whitelist certain emails'
Payload: System-generated data and automatically outlined associated reservations based on email content.
Third-Party Integration
Original Requirement: 'The system must interface with the agency’s existing airline, hotel, and car rental interface system to update travel details'
Payload: System-generated data and automatically outlined associated reservations based on polled content.
Itinerary Viewing:
Original Requirement: 'Items in the dashboard should be able to be grouped by trip, and once the trip is complete, the items should automatically be removed from the dashboard'
Trip Sharing:
Original Requirement: 'Users should also be able to share their trip information by interfacing with standard social media sites or allowing targeted people to view your trip'
Payload: Itinerary information, aggregating reservations for a specific trip.
Data Analytics:
User Analytics
Original Requirement: 'Provide end-of-year summary reports for users with a wide range of metrics about their travel usage'
System Analytics
Original Requirement: 'Road Warrior gathers analytical data from users trips for various purposes - travel trends, locations, airline and hotel vendor preferences, cancellation and update frequency, and so on'
Recommendation Engine:
Payload: User data used for analysis, which includes historical travel data, preferences, and behaviour.
Help Engine:
Original Requirement: 'Must integrate with preferred travel agency for quick problem resolution'
Payload: User's help message detailing the information needed.
By breaking down the requirement into these distinct flows with entry points and their associated payloads, we can ensure that we have a clear understanding of where data enters the system and what information is being processed. This structured approach not only aids in the design and development of the system but also lays the foundation for effective data management, security, and the eventual implementation of analytics and recommendation features.
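As a minimal sketch of how these payloads might be typed, the snippet below models a reservation and a trip together with the entry point each record arrived through. All class and field names are illustrative assumptions drawn from the payload descriptions above (dates, times, locations, confirmation numbers), not a defined contract.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum


class Source(Enum):
    """Entry point a trip or reservation change arrived through."""
    MANUAL = "manual"
    EMAIL = "email"
    THIRD_PARTY = "third_party"


@dataclass
class ReservationPayload:
    # Hypothetical field names based on the third-party payload description.
    confirmation_number: str
    kind: str  # e.g. "flight", "hotel", "car_rental"
    location: str
    starts_at: datetime
    ends_at: datetime
    source: Source


@dataclass
class TripPayload:
    name: str
    description: str
    reservations: list[ReservationPayload] = field(default_factory=list)


def validate(reservation: ReservationPayload) -> list[str]:
    """Return a list of problems; an empty list means the payload is acceptable."""
    problems = []
    if not reservation.confirmation_number.strip():
        problems.append("confirmation_number is required")
    if reservation.ends_at < reservation.starts_at:
        problems.append("ends_at must not precede starts_at")
    return problems
```

Carrying the `source` on every record is one way to keep manual, email, and third-party changes distinguishable further downstream, for example when resolving conflicting updates.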
To help us visualise the system we use the actors and components that were outlined in previous sections and drafted the following context diagrams.
The context diagram below provides a high-level introduction to the actions that the different User types can perform on the application. It also shows abstractions of the different components (or services) responsible for handling all possible actions triggered by users or external interfaces.
The below actor-to-system boundary diagram expands on the detail provided by the High-level Platform Context Diagram, by describing communication methods between different components (now broken down into services, providers, and external tools) as well as their utilisation of infrastructure components such as databases and messaging/event channels to realise the features offered by the system.
The following section describes each module (scoped at a Microservice level) and its interactions with neighbouring systems.
The following are some characteristics that are present for every module and thus are described prior to the service-specific descriptions.
The Authentication Service is responsible for facilitating authentication mechanisms through username and password or social media Single Sign-On (SSO).
The authentication service leverages abstraction to provide a default implementation of the standard authentication operations, and uses the provider pattern to differentiate between the concrete implementations: the internal username/password mechanism and the external social media SSO APIs.
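The provider pattern described here could look roughly like the sketch below. It assumes a common `AuthProvider` contract with one concrete provider for username/password and one stand-in for an external SSO API. All names are hypothetical, and the SSO token check is faked with a lookup table rather than a real provider call.

```python
import hashlib
import hmac
import os
from abc import ABC, abstractmethod
from typing import Optional


class AuthProvider(ABC):
    """Common contract every concrete authentication provider fulfils."""

    @abstractmethod
    def authenticate(self, credentials: dict) -> Optional[str]:
        """Return the user id on success, or None on failure."""


class PasswordProvider(AuthProvider):
    """Internal username/password implementation with salted hashes."""

    def __init__(self):
        self._users = {}  # username -> (salt, password hash)

    def register(self, username: str, password: str) -> None:
        salt = os.urandom(16)
        self._users[username] = (salt, self._hash(password, salt))

    @staticmethod
    def _hash(password: str, salt: bytes) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

    def authenticate(self, credentials: dict) -> Optional[str]:
        record = self._users.get(credentials.get("username"))
        if record is None:
            return None
        salt, expected = record
        actual = self._hash(credentials.get("password", ""), salt)
        # Constant-time comparison avoids leaking how many bytes matched.
        return credentials["username"] if hmac.compare_digest(actual, expected) else None


class SocialSsoProvider(AuthProvider):
    """Stand-in for an external SSO API; token verification is faked here."""

    def __init__(self, valid_tokens: dict):
        self._valid_tokens = valid_tokens  # token -> user id

    def authenticate(self, credentials: dict) -> Optional[str]:
        return self._valid_tokens.get(credentials.get("token"))


class AuthenticationService:
    """Routes a login attempt to whichever provider handles the chosen method."""

    def __init__(self, providers: dict):
        self._providers = providers

    def login(self, method: str, credentials: dict) -> Optional[str]:
        provider = self._providers.get(method)
        if provider is None:
            raise ValueError(f"unknown authentication method: {method}")
        return provider.authenticate(credentials)
```

With this shape, supporting another SSO vendor would only require registering one more `AuthProvider` implementation, leaving `AuthenticationService` untouched.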
The Trip Management Service is responsible for allowing users to view and manage trips and reservations.
The Trip Management Service is subscribed to the Event Streaming Infrastructure to listen for incoming events from the Travel Integration Service and the Email Data Parsing Service about new records and/or changes to existing Trip or Reservation records.
If changes to locally persisted Trip or Reservation records are made, messages are published through the Event Streaming infrastructure to notify interested parties of said changes.
When a user invokes the 'Share trip to social media' feature, a message is published to the Queue Infrastructure, which the Social Media Sharing Service listens to.
The Email Data Parsing Service is responsible for collecting trip and reservation data from Email sources.
As indicated in other areas of the solution documentation, Users will be given instructions on how to create rules for forwarding travel-related emails to the service mailbox. Emails received by the service mailbox are observed by an External Automation Tool (such as Power Automate) and are then published to the Event Streaming Infrastructure as JSON objects detailing the email data. The Email Data Parsing Service is subscribed to the Event Streaming Infrastructure so that it can consume and break down email data objects, and then publish the results through the same Event Streaming Infrastructure so that the Trip Management Service can ultimately persist them.
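The consume-and-break-down step might be sketched as follows. The JSON field names (`from`, `forwarded_by`, `body`), the whitelisted domains, the event name, and the confirmation-number pattern are all assumptions made for illustration; real vendor emails would need per-vendor parsing rules.

```python
import json
import re

# Hypothetical whitelist of sender domains treated as travel-related.
ALLOWED_SENDER_DOMAINS = {"airline.example", "hotel.example"}

# Assumed wording of a confirmation line; real vendors will differ.
CONFIRMATION_PATTERN = re.compile(
    r"confirmation\s+(?:number|code)\s*[:#]\s*([A-Z0-9]{6,})",
    re.IGNORECASE,
)


def parse_email_event(raw: str):
    """Turn one email JSON object into a reservation event, or None if it is skipped."""
    email = json.loads(raw)
    sender_domain = email.get("from", "").rsplit("@", 1)[-1].lower()
    if sender_domain not in ALLOWED_SENDER_DOMAINS:
        return None  # sender is not whitelisted
    match = CONFIRMATION_PATTERN.search(email.get("body", ""))
    if match is None:
        return None  # nothing recognisable to extract
    return {
        "type": "ReservationDetected",
        "user": email.get("forwarded_by"),
        "vendor": sender_domain,
        "confirmation_number": match.group(1).upper(),
    }
```

A returned event would then be published back onto the Event Streaming Infrastructure, while a `None` result means the email is dropped without producing an event.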
The Social Media Sharing Service is responsible for sharing content on social media platforms.
It receives prompts from the Trip Management Service via a Queue infrastructure, and utilises external Social Media providers for Authentication and Sharing to successfully share content to said platforms.
The Travel Integration Service is responsible for collecting trip and reservation data from Travel Agency Integrations.
It subscribes via AMQP (Advanced Message Queuing Protocol) to the configured external travel agencies, processes the incoming data, and publishes messages to a queue. An abstraction covers the baseline processing operations, and the provider pattern is then used to integrate with the different external travel agency integration services.
Successfully parsed incoming records are then subsequently published to the Event Streaming Infrastructure for further processing and local persistence by the Trip Management Service.
The Notifications Service is responsible for pushing notifications to the Public-facing applications.
It subscribes via AMQP to the Event Streaming Infrastructure, listening for messages about new or adjusted Trips/Reservations from the Trip Management Service, and then raises notifications via a Publish/Subscribe mechanism to active users on the Web application or on Mobile devices with the PWA installed.
The Reporting & Analytics Service is used to generate reports and store data in a format suitable for reporting within the data warehouse.
The service is subscribed to the Event Streaming Infrastructure for updates stemming from the Trip Management Service so that changes can be propagated to the data warehouse (and stored in an unstructured way). It uses RESTful APIs to communicate with an external reporting & analytics service (such as PowerBI) to generate and embed reports and statistics. The external reporting & analytics service is configured to read from the data warehouse, and can also be consumed via an External Tool (such as PowerBI Desktop) so that system admins can access reporting for the entire platform.
Mock-ups are essential in the development process of the solution since they allow the team to visualise and conceptualise the idea. They also allow us to take a user-centered approach that aligns with the requirements.
The first approach to prototyping was traditional pen and paper, with the results showcased below.
After the manual prototyping, the next step was to produce a Figma design of the solution, with the results shown below.
This section takes into consideration how the architecture is to be split using the Developer to Architect Architecture Resource. This is intended to outline key architectural attributes we deem essential for a successful system implementation.
Preferred | Characteristics | Reason |
---|---|---|
[X] | Scalability | The system needs to be highly scalable since it needs to grow to accommodate increased demand and workload. This scalability is essential in the context of the solution, as travel-related services often experience fluctuations in user traffic, especially during peak seasons or special events. Whether it's a sudden surge in users making reservations or an uptick in concurrent users accessing their itineraries, the system can efficiently allocate additional resources to handle the increased load. This scalability ensures that users experience uninterrupted service and swift response times, regardless of the system's level of demand. |
[X] | Elasticity | Elasticity takes the concept of scalability a step further by not only allowing the system to grow but also contract when demand decreases. The solution needs to be designed with elasticity in mind, enabling it to automatically adjust its resource allocation based on real-time demand. For instance, during periods of lower user activity, the system can scale down to conserve resources, reducing operational costs. Conversely, when demand surges, it can quickly scale up to meet the increased load. This elasticity ensures cost-efficiency and optimal resource utilisation, making the solution adaptable and financially sustainable over time. |
[] | Data Integrity & Consistency | Ensuring the integrity and consistency of data is paramount in this system. There is a need to implement robust data validation mechanisms, error-handling processes, and transaction management to prevent data corruption or discrepancies. By maintaining data integrity and consistency, we guarantee that users can rely on accurate information throughout their travel planning and management processes. |
[] | Abstraction | Abstraction is a foundational element of the system's architecture. It allows us to shield users and developers from unnecessary complexities by presenting simplified and user-friendly interfaces. By abstracting the underlying technical intricacies, we enhance usability and reduce the complexities of integrating future applications of similar types of existing implementations. |
[] | Availability | The solution has to be built with high availability in mind due to the requirement of a maximum of 5 minutes of downtime per month. There is a need to employ redundancy, failover mechanisms, and disaster recovery strategies to minimise downtime and ensure that users can access their travel information 24/7. Availability is critical in the travel industry, where users may require access to their itineraries and bookings at any time. |
[X] | Performance | Performance optimisation is a key focus in the architectural design. Therefore, the system needs to employ efficient algorithms, caching mechanisms, and load balancing to deliver fast response times and smooth user interactions. Whether users are viewing their itineraries or receiving real-time recommendations, the system will need to consistently deliver high-performance results. |
[] | Interoperability | Interoperability is needed to facilitate seamless communication with external systems and services. The solution needs to adhere to industry standards and implement standardised data exchange protocols to ensure that our platform can integrate with various third-party providers, booking systems, and travel-related services. This interoperability enhances the user experience by offering comprehensive access to travel-related resources. |
Characteristics | Reason |
---|---|
Feasibility / Cost | This implicit characteristic comes as a result of the start-up nature of the client and revolves around the financial aspects of a software project. Feasibility analysis assesses whether the project is financially viable and if the expected benefits outweigh the costs. It also considers factors like budget constraints, resource availability, and potential return on investment. Addressing this may require some early-on concessions when designing MVPs which will eventually be made less cost effective and more efficient once the solution becomes self-sustaining. |
Maintainability | Maintainability refers to the software's ease of modification, enhancement, and long-term sustainability. Implicitly, it underscores the importance of writing clean, modular, and well-documented code. It involves practices such as code refactoring, version control, and adherence to coding standards such as abstraction. A maintainable software system is more cost-effective to update and extend over time, reducing the risk of technical debt and ensuring that the software remains adaptable to changing requirements. |
Observability | Observability is focused on a software system's ability to provide insights into its behavior, performance, and issues. It involves implementing logging, monitoring, and error-tracking mechanisms. Observability allows developers and operators to gain visibility into the system's internal workings, making it easier to diagnose and resolve problems, optimise performance, and ensure that the software meets its operational objectives. Implicitly, observability emphasises proactive system health management and continuous improvement through data-driven insights. |
Ensuring availability in different global regions is a complex yet critical aspect of modern digital services. It involves deploying redundant infrastructure, global distribution of data, and leveraging Content Delivery Networks (CDNs) to minimise latency and downtime. Factors such as geographical diversity, local regulations, and varying network conditions must be considered. Achieving high availability means that users, regardless of their location, can access services reliably and consistently. This global approach to availability not only enhances user experiences but also strengthens disaster recovery capabilities, ensuring that services remain resilient even in the face of regional disruptions.
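To make the availability requirement stated earlier (a maximum of 5 minutes of downtime per month) concrete, the arithmetic below converts it into an availability target and shows why regional redundancy helps. It assumes a 30-day month and that regions fail independently, which is a simplification. The 99.9% per-region figure is an illustrative assumption, not a quoted SLA.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # assumes a 30-day month


def required_availability(max_downtime_minutes: float) -> float:
    """Fraction of the month the system must be up to stay within the downtime budget."""
    return 1 - max_downtime_minutes / MINUTES_PER_MONTH


def monthly_downtime_minutes(availability: float) -> float:
    """Expected downtime per month for a given availability fraction."""
    return (1 - availability) * MINUTES_PER_MONTH


def combined_availability(per_region: float, regions: int) -> float:
    """Availability when any one of several independent regions can serve traffic."""
    return 1 - (1 - per_region) ** regions


target = required_availability(5)                    # roughly 0.99988
single_region = combined_availability(0.999, 1)      # illustrative per-region figure
two_regions = combined_availability(0.999, 2)

print(f"target availability: {target:.5%}")
print(f"one region meets target: {single_region >= target}")
print(f"two regions meet target: {two_regions >= target}")
```

Under these assumptions, a single region at 99.9% allows about 43 minutes of downtime per month and misses the target, while two independent regions clear it comfortably. This is the reasoning behind deploying redundant infrastructure across regions.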
Based on these characteristics, the chosen architecture combines the microservices, event-driven, and space-based architecture styles.
The system will adopt a Microservices Architecture to promote modularity and scalability. Different components of the system, such as user management, reservation handling, and recommendation generation, will be developed as independent microservices. Each microservice will have its own database and will communicate with others through the event bus. This approach allows for agile development, easy maintenance, and the ability to scale specific services independently to meet varying demands. For example, during peak travel booking seasons, we can allocate more resources to the reservation microservice while keeping other services unaffected.
While there might be a performance trade-off associated with microservices, it's feasible to mitigate this drawback by incorporating strategies such as caching, scaling, and database sharding.
ADR 4 - Microservice Architecture
Event-driven architecture will be integral to the system's real-time capabilities. Events, such as user actions (booking a flight, changing an itinerary) or external updates (flight delays, hotel availability), will trigger asynchronous messages that various components can subscribe to and act upon. For instance, when a user adds a new reservation, it generates an event that updates the user's itinerary and triggers the recommendation engine to suggest relevant activities or accommodations. This decoupled and event-driven approach ensures that the system remains responsive, scalable, and capable of handling real-time data updates seamlessly.
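The reservation example above can be sketched with a tiny in-memory publish/subscribe bus. This is a synchronous stand-in for the real event streaming infrastructure, which would deliver messages asynchronously. The event name `ReservationAdded` and the handler names are hypothetical.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """In-memory stand-in for the event streaming infrastructure."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # Each subscriber reacts independently; the publisher knows none of them.
        for handler in self._subscribers[event_type]:
            handler(payload)


itineraries = defaultdict(list)  # trip id -> reservation ids
suggestion_requests = []         # trip ids queued for the recommendation engine


def update_itinerary(event: dict) -> None:
    itineraries[event["trip_id"]].append(event["reservation_id"])


def request_suggestions(event: dict) -> None:
    suggestion_requests.append(event["trip_id"])


bus = EventBus()
bus.subscribe("ReservationAdded", update_itinerary)
bus.subscribe("ReservationAdded", request_suggestions)
bus.publish("ReservationAdded", {"trip_id": "trip-1", "reservation_id": "res-42"})
```

The publisher never references the itinerary or recommendation handlers directly, which is the decoupling the event-driven approach relies on.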
ADR 5 - Event Driven Architecture
Space-based architecture will be employed for managing distributed, in-memory data caches and ensuring high availability and low-latency access to frequently accessed data. This architecture allows us to store and retrieve data in a distributed and fault-tolerant manner, which is crucial for a system handling real-time travel information. For example, we can use a space-based architecture for caching frequently accessed itinerary data, ensuring that users can quickly access their travel plans regardless of the data's physical location. This architecture also supports data consistency and synchronization across multiple regions for enhanced availability and performance.
ADR 14 - Space-Based Architecture
This leads to the following high-level solution approach
The business plan revolves around strategic partnerships, software development, and infrastructure resources to provide a user-friendly platform with personalised recommendations for travelers. This involves ongoing investments in personnel, software development tools, marketing, and customer support. The revenue streams are diverse, encompassing subscription models, future transaction fees, advertising partnerships, and premium features, which help offset operational costs and drive profitability. Road Warrior is committed to enhancing user experience and fostering strong customer relationships as part of its ongoing strategy, which supports a sustainable and successful business.
The platform roadmap that has been drafted takes into consideration the infancy of the enterprise and has therefore been designed in such a way that focuses on introducing streams of revenue as soon as possible to cover necessary funding for the undertaking of this project.
Four named MVPs are being proposed:
MVP 1: Road Warrior Soft-Launch - As the name implies, this MVP will involve launching the product with just the essential, barebones features, sufficient to introduce the potential of the product to the market. The majority of requirements specified in the initial spec are covered completely, with other less critical requirements being delivered in part or planned for launch in a future MVP. This MVP will help establish 'Road Warrior' in the travel organisation app market and potentially even introduce investment opportunities. The inclusion of lightweight advertisements in the barebones version of the application will also introduce a new and immediate stream of revenue, scaling with the number of users (as will running costs).
MVP 2: Shared Dashboards and Integrations - Introduce features that support collaboration/sharing among authenticated users, expanding the social elements of the application. At this point, all baseline requirements from the original specification barring reporting & analytics are implemented to some degree. Expanding integrations with users' mailboxes and additional booking agencies will also increase traffic on the application and introduce new opportunities for further investments.
MVP 3: Subscription Model, Analytics & Reporting - By the time that the development and planned delivery of MVP3 is underway, the project should have established an audience (this will be assisted through relevant marketing efforts). A larger audience in addition to increased features (and complexity of said features), means that computing costs will increase just as well. Advertisements will cover a portion of these running costs, however, to offer a more seamless experience as well as more advanced (resource-intensive) features, a subscription model will be released.
MVP 4: Expand Covered Services - This is the last "planned" MVP for the product. Here, the platform will undergo horizontal diversification in the services and data it offers by covering attractions and taxis.
MVP N+: At this point in time, the project will be in maintenance mode. Bugfixes and performance adjustments will be issued as needed, while new features, covered services, and booking agency integrations will be added incrementally based on community feedback.
The following section outlines the different components which make up the architecture. While this section describes concrete implementations on a specific cloud provider, the solution will still be abstracted so that it remains vendor-agnostic and avoids the risk of vendor lock-in.
Kubernetes plays a pivotal role in load-balancing the core services of the system, ensuring that they remain highly available, scalable, and responsive to user requests. This is done by:
Simplifying service deployment of core services as containers within a cluster. Each service is encapsulated in a container, making it easy to manage and scale independently.
Ability to use Replica Sets to maintain a specified number of replicas (containers) for each core service. This ensures that even if one container fails, a new one is automatically spawned, maintaining the desired level of service availability.
The usage of the built-in service discovery mechanisms which enables load balancing to ensure that incoming requests are distributed evenly across the available service replicas.
Management of entry points through the usage of ingress controllers for external traffic into the cluster. These can route incoming requests to the appropriate core service based on defined rules, such as domain names or URL paths.
Kubernetes enables automatic scaling of core services based on predefined metrics such as CPU utilisation. When traffic increases, Kubernetes can dynamically spin up additional service replicas to handle the load, ensuring optimal performance.
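As a rough illustration of metric-based scaling, the function below applies the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler: the desired count is the current count scaled by the ratio of the observed metric to its target, rounded up. The min/max bounds and the sample numbers are assumptions. The real autoscaler also applies a tolerance band and stabilisation windows that are omitted here.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Scale the replica count by the ratio of observed to target metric, then clamp."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# Four replicas averaging 90% CPU against a 60% target scale out.
print(desired_replicas(4, current_metric=90, target_metric=60))
# The same four replicas averaging 30% CPU scale in.
print(desired_replicas(4, current_metric=30, target_metric=60))
```

The clamp matters in practice: the upper bound caps cost during traffic spikes, and the lower bound keeps enough replicas running to preserve availability when load is low.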
The container registry is an essential infrastructure component for Kubernetes. It centralises image management, version control, and distribution, promoting efficient and secure software delivery.
The event bus allows different parts of the solution to exchange information in a loosely coupled manner. It enables components or services to publish events and subscribe to events of interest. This approach was chosen since it is widely used in event-driven architectures, microservices, and distributed systems to facilitate seamless communication and data exchange among various system elements.
ADR 4 - Microservice Architecture
ADR 5 - Event Driven Architecture
ADR 14 - Space-Based Architecture
Given that the solution will be listening to a mailbox owned by Road Warrior, it can implement RPA (Robotic Process Automation) with a 'when email received' trigger on that mailbox. This trigger then allows the core services to work on the parsed email data.
ADR 8 - Polling vs Webhooks with Email Forwarding Rule
Given that the system needs to be performant, Next.js was chosen due to its support for Server-Side Rendering (SSR) and Progressive Web App (PWA) capabilities.
SSR offers several advantages namely improved SEO and faster initial page load which are crucial for the app to obtain adoption with the user base.
PWAs make the application more accessible through offline support, which allows browsing in areas with limited connectivity; an app-like experience and packaging, which facilitates publishing to mobile stores; and caching strategies, which store assets and data on the client's device to ensure fast load times on subsequent visits.
CosmosDB is the backbone of the app's data management strategy. With its globally distributed, multi-model database service, CosmosDB enables us to seamlessly handle vast amounts of data, provide low-latency access to users worldwide, and ensure high availability and scalability. Its support for various data models, including document, key-value, graph, and column family, offers the flexibility needed to store and query diverse types of data efficiently. CosmosDB's built-in global distribution, automatic scaling, and robust consistency options align perfectly with the app's requirements for data resilience, real-time updates, and responsive performance. It's the foundational layer that empowers the app to deliver a seamless and data-rich user experience.
ADR 3 - Distributed Databases and Redis for Global Data Distribution
ADR 9 - Choice of "Eventual Consistency" for Distributed Databases
ADR 12 - Distribution of Data Globally
Redis plays a pivotal role in enhancing the speed and efficiency of the app. As an in-memory data store, Redis excels at caching frequently accessed data, reducing database load, and significantly improving response times for users. Its support for data structures like strings, sets, and hashes makes it versatile for various application needs, such as session management, real-time analytics, and queuing. With Redis, the app can deliver fast data retrieval and processing, ensuring a snappy and highly responsive user experience. It's a key component that enhances the overall performance and scalability of the application.
ADR 3 - Distributed Databases and Redis for Global Data Distribution
ADR 13 - Usage of Serverless Functions with Redis Over APIs
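The caching role described above is commonly implemented as a cache-aside read: check the cache first, fall back to the database on a miss, then store the result with an expiry. The sketch below uses a dict-backed `InMemoryCache` as a stand-in for Redis so that it runs without a server; the class, the `get_itinerary` function, the key format, and the 60-second TTL are all assumptions for illustration.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class InMemoryCache:
    """Dict-backed stand-in for a Redis-style cache with per-key expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired entries behave like misses
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._entries[key] = (self._clock() + ttl_seconds, value)


def get_itinerary(
    trip_id: str,
    cache: InMemoryCache,
    load_from_database: Callable[[str], str],
    ttl_seconds: float = 60.0,
) -> str:
    """Cache-aside read: serve from the cache, else load and populate it."""
    key = f"itinerary:{trip_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_database(trip_id)
    cache.set(key, value, ttl_seconds)
    return value


# Usage with a fake clock and a loader that records each database read.
now = [0.0]
database_reads: List[str] = []


def load_from_database(trip_id: str) -> str:
    database_reads.append(trip_id)
    return f"itinerary for {trip_id}"


cache = InMemoryCache(clock=lambda: now[0])
first = get_itinerary("trip-1", cache, load_from_database)
second = get_itinerary("trip-1", cache, load_from_database)  # served from cache
```

The second call never reaches the loader, which is how the cache reduces database load; once the TTL elapses the next read goes back to the database.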
Serverless functions enable the application to execute code in a highly efficient and cost-effective manner. By leveraging serverless computing platforms like Azure Functions, the solution will be able to run code in response to events or API requests without the need to manage servers or infrastructure. This approach enables rapid development, automatic scaling, and optimal resource utilization. These functions provide the solution with the agility and scalability needed to deliver a seamless and responsive user experience while minimising operational overhead and costs.
ADR 13 - Usage of Serverless Functions with Redis Over APIs
Load balancing is a critical component of the app's infrastructure. This is achieved by leveraging Azure's suite of services to ensure optimal performance and availability.
Azure Traffic Manager intelligently distributes user traffic across multiple data centers based on pre-configured geographical rules.
Azure CDN accelerates content delivery by caching and serving static assets from edge locations worldwide, reducing latency for users.
Azure Front Door acts as a global entry point, combining security and load balancing to direct traffic to the nearest available backend service.
ADR 10 - Load Balancing of Core Services
ADR 12 - Distribution of Data Globally
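To make the idea of pre-configured geographical rules concrete, the sketch below picks a region for a user and falls back to the next candidate when the preferred region is unhealthy. It only illustrates the routing decision and is not how Azure Traffic Manager is configured; the `GEO_RULES` mapping, the geography codes, and the `route` function are hypothetical, while the region names are the four used in the MVP 4 deployment.

```python
from typing import Dict, List, Set

# Hypothetical rules: user geography -> regions in order of preference.
GEO_RULES: Dict[str, List[str]] = {
    "EU": ["West Europe", "East US"],
    "NA": ["East US", "West Europe"],
    "AS": ["East Asia", "Southeast Asia"],
}

# Fallback order used when no rule matches or all preferred regions are down.
DEFAULT_ORDER: List[str] = ["West Europe", "East US", "East Asia", "Southeast Asia"]


def route(user_geo: str, healthy_regions: Set[str]) -> str:
    """Return the first healthy region for the user's geography."""
    for region in GEO_RULES.get(user_geo, []) + DEFAULT_ORDER:
        if region in healthy_regions:
            return region
    raise RuntimeError("no healthy region available")


all_regions = set(DEFAULT_ORDER)
normal = route("EU", all_regions)
failover = route("EU", all_regions - {"West Europe"})
```

Under these assumed rules a European user is served from West Europe and fails over to East US if West Europe is unavailable.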
To improve security, reliability, and performance for the main cluster and the geographically dispersed API endpoints, the solution will use private links. A private link enables secure and private communication between the application and services such as databases, storage, and other resources without traversing the public internet.
This approach will be utilised to improve:
Security since this approach ensures that data transfer between the application and cloud services remains within the private network, isolated from the public internet. This significantly reduces the attack surface and minimises the risk of unauthorised access or data breaches.
Data Privacy through the establishment of a private, dedicated connection, ensuring that sensitive data does not leave the private network during transit. This is critical for maintaining the privacy and confidentiality of user information, travel itineraries, and other travel-related data.
Performance, by eliminating the need for data to traverse the public internet.
Overall this approach is expected to create an isolated environment for the application's backbone thereby reducing exposure to external threats and ensuring that our application's dependencies are accessible only through a private, secure channel.
Azure Synapse serves as the backbone of the app's data analytics and warehousing capabilities. With its powerful data integration, transformation, and analytics tools, Azure Synapse enables the solution to harness the full potential of the collected data. It seamlessly integrates with various data sources and provides a unified platform for data storage, processing, and visualization. Whether it's running complex analytical queries, creating data pipelines, or generating actionable insights, Azure Synapse empowers the solution to make data-driven decisions and deliver a richer, more informed user experience.
Having gone over the MVP Timeline Proposal and identified the core components of the system in Identifying Architectural Quanta, we now outline how the solution will physically be built across the MVP rollout, and the expected cost at each phase of the architecture. Azure is used as an example platform so that we can reference specific managed services and calculate a baseline cost. As previously mentioned, the system is to be built in an abstract way that allows every managed service to be swapped for an equivalent from another cloud provider.
Throughout the technical build-up, we constantly kept in mind the following requirements:
Given that Road Warrior is a start-up it is critical to ensure a cost-effective MVP rollout that does not cripple the start-up. Therefore, we will concentrate on delivering a lean and focused version of our product. Utilising cloud services, and taking a scale-as-you-go approach, we will optimise development costs. Our design will be minimalistic yet functional, and we will follow an agile development approach for rapid iteration based on user feedback. We'll continuously monitor costs and performance to make data-driven decisions. This approach will enable us to validate our concept while effectively managing our startup's financial resources.
To this end the first MVP is a bare-bones deployment consisting of:
While this is not the most performant setup for the forecasted user base, we do not expect a huge amount of traffic in the initial rollout either, so we foresee it being viable in the beginning. The diagram below depicts the infrastructure setup at this point.
Service | Specifications | Cost |
---|---|---|
Azure Kubernetes Service (AKS) | 1 Linux D4a v4 Node (no reserved instances) with S4 - 32GB of OS Disk | $238.06 |
Azure Container Registry | Standard | $20.00 |
Azure Cosmos DB | Serverless with 200GB of storage | $50.25 |
Event Grid | Standard - Event Grid Namespace (Assuming up to 5 million monthly operations) | $1.80 |
Storage Account | General Purpose v2 | $23.88 |
App Service | Premium V2 (P1V2) to be used by API and PWA | $146.00 |
Azure DNS | Zone 1 Public DNS | $0.90 |
IP Addresses | Global ARM 1 Static IP | $16.06 |
Total | | $496.95 |
Our proposed initial commitment for Road Warrior is $496.95 per month. This infrastructure is expected to handle a good workload, but not the expected 2 million monthly active users. We do not expect that workload in the initial phases; nevertheless, if the system metrics show strain, we will be able to leverage the cloud's potential and scale accordingly.
This iteration builds on MVP 1 and starts to add core functionality through integrations with third-party vendors and users' mailboxes. This means that, apart from making further use of our existing Event Grid, we also need to start utilising RPA for the 'when email received' trigger. We also expect the infrastructure from MVP 1 to come under strain at this point, so it will be scaled up. At this moment we do not believe that committing to reserved instances will be beneficial, since the system would still be undergoing rapid growth.
The MVP 2 iteration will see the following changes:
Service | Specifications | Cost |
---|---|---|
Azure Kubernetes Service (AKS) | 1 Linux D8a v4 Node (no reserved instances) with S4 - 32GB of OS Disk | $401.58 |
Azure Container Registry | Standard | $20.00 |
Azure Cosmos DB | Autoscale Provisioned Throughput with 200GB of storage | $137.60 |
Event Grid | Standard - Event Grid Namespace (Assuming up to 10 million monthly operations) | $5.40 |
Storage Account | General Purpose v2 | $23.88 |
App Service | Premium V2 (P2V2) to be used by API and PWA | $292.00 |
Azure DNS | Zone 1 Public DNS | $0.90 |
IP Addresses | Global ARM 1 Static IP | $16.06 |
Power Automate | 1 Standard User | $15.00 |
Total | | $912.42 |
The cost at this stage is expected to rise to $912.42 per month. While this is almost double the cost of MVP 1, the core services' cluster, database, and front-facing App Service have all been significantly upgraded. These upgrades account for the additional load that the third-party integrations will introduce, and for the expectation that the system will have started to gain traction and onboard more users.
This iteration focuses mainly on the Analytics and Reporting aspect of the system, which is expected to feature heavily in the application's forecasted growth. At this point, we are also assuming that the number of active users per week is starting to approach the 2 million mark. Therefore, this MVP iteration will see the following changes:
Service | Specifications | Cost |
---|---|---|
Azure Kubernetes Service (AKS) | 2 Linux D8a v4 Nodes (no reserved instances) with S4 - 32GB of OS Disk | $728.62 |
Azure Container Registry | Standard | $20.00 |
Azure Cosmos DB | Autoscale Provisioned Throughput with 200GB of storage | $137.60 |
Event Grid | Standard - Event Grid Namespace (Assuming up to 20 million monthly operations) | $11.40 |
Storage Account | General Purpose v2 | $23.88 |
App Service | Premium V2 (P2V2) to be used by API and PWA | $292.00 |
Azure DNS | Zone 1 Public DNS | $0.90 |
IP Addresses | Global ARM 1 Static IP | $16.06 |
Power Automate | 1 Standard User | $15.00 |
Azure Synapse | Compute Optimised Gen2 with 100 DWU Blocks, 10 hour daily commitment and a 3 year reserve instance | $397.30 |
Apache Spark Pool | Small Memory Optimised (4 vCores with 32 GB) | $166.92 |
Power BI | 1 Premium User | $20.00 |
Total | | $1,829.68 |
The cost has once more doubled from MVP 2 to MVP 3, with the new forecasted cost at $1,829.68 per month. However, apart from further upgrades to the cluster, this iteration starts laying the foundation of the analytics engine. While this is costly, it is an essential part of the application and is therefore introduced at this stage.
The final main iteration will consist of geographical expansion through the replication of Cosmos DB across regions and the usage of better load-balancing techniques. This iteration will also be used to gather usage metrics in order to commit to reserved instances for 3 years and bring down the cost of infrastructure. While this means that Road Warrior is committed to the same minimum cluster size for 3 years, we are assuming that the start-up has now stabilised and has prospects of more growth going forward. To this end, MVP 4 will focus on geographical distribution and load-balancing by adding:
This leads to the final overall architecture below.
Service | Specifications | Cost |
---|---|---|
Azure Kubernetes Service (AKS) | 2 Linux E16-8as v5 Nodes (3 year reserved instances) with S4 - 32GB of OS Disk | $774.02 |
Azure Container Registry | Standard | $20.00 |
Azure Cosmos DB | Autoscale Provisioned Throughput with 200GB of storage with availability in West Europe, East US, East Asia, and Southeast Asia and a maximum of 2000 Requests per second | $500.40 |
Event Grid | Standard - Event Grid Namespace (Assuming up to 50 million monthly events) | $29.40 |
Storage Account | General Purpose v2 | $23.88 |
App Service | Premium V2 (P1V2) to be used by PWA in 4 regions | $584.00 |
Serverless Functions | Consumption assuming up to 100,000,000 requests per month in 4 regions | $157.60 |
Traffic Manager | 10,000,000 DNS queries per month | $5.40 |
Azure CDN | Static Data in 4 zones | $3.66 |
Azure Front Door | Entry point for PWA | $35.51 |
Azure Redis Cache | Standard C2 Cache in 4 regions | $654.08 |
Azure DNS | Zone 1 Public DNS | $0.90 |
IP Addresses | Global ARM 1 Static IP | $16.06 |
Power Automate | 1 Standard User | $15.00 |
Azure Synapse | Compute Optimised Gen2 with 100 DWU Blocks and a 3 year reserve instance | $397.30 |
Apache Spark Pool | Small Memory Optimised (4 vCores with 32 GB) | $166.92 |
Power BI | 1 Premium User | $20.00 |
Total | | $3,404.13 |
Once more the cost rises steeply compared to MVP 3, with the new cost reaching $3,404.13 per month, but in return the application becomes more accessible and responsive in different parts of the globe. This is critical because the nature of the application requires it to be performant globally: even if the user base is concentrated in a specific country, those same users will largely be consuming the application's content while actively on a trip.
The final cost of $3,404.13 per month should not be taken as a fixed number, since we will continue to monitor the application to see whether we need to scale up or down. Any such scaling will raise or lower the cost accordingly.
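The phase totals and the growth between phases can be re-derived from the line items in the four tables. The short check below is only arithmetic over the figures quoted above; the variable names are illustrative.

```python
from decimal import Decimal

# Monthly line items (USD) copied from the four cost tables, in table order.
MVP_LINE_ITEMS = {
    "MVP 1": ["238.06", "20.00", "50.25", "1.80", "23.88", "146.00", "0.90", "16.06"],
    "MVP 2": ["401.58", "20.00", "137.60", "5.40", "23.88", "292.00", "0.90",
              "16.06", "15.00"],
    "MVP 3": ["728.62", "20.00", "137.60", "11.40", "23.88", "292.00", "0.90",
              "16.06", "15.00", "397.30", "166.92", "20.00"],
    "MVP 4": ["774.02", "20.00", "500.40", "29.40", "23.88", "584.00", "157.60",
              "5.40", "3.66", "35.51", "654.08", "0.90", "16.06", "15.00",
              "397.30", "166.92", "20.00"],
}

# Decimal avoids floating-point drift when adding currency amounts.
totals = {
    name: sum(Decimal(item) for item in items)
    for name, items in MVP_LINE_ITEMS.items()
}

# Ratio of each phase's total to the previous phase's total, to 2 decimals.
phases = list(totals)
growth = {
    f"{previous} -> {current}": (totals[current] / totals[previous]).quantize(
        Decimal("0.01")
    )
    for previous, current in zip(phases, phases[1:])
}

for name, total in totals.items():
    print(f"{name}: ${total:,.2f} per month")
for step, ratio in growth.items():
    print(f"{step}: x{ratio}")
```

The sums match the stated totals, and the phase-to-phase ratios come out at roughly 1.84, 2.01, and 1.86, consistent with the description of the cost approximately doubling at each step.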
The following are some software engineering practices that will be adhered to during the undertaking of the project:
A design pattern that is used to abstract the creation of objects or services. This pattern decouples client code from the specific implementation and is commonly used in dependency injection and inversion of control.
This pattern will be used thoroughly within the solution in areas where common code can be used to cover features that are fed inputs from different sources that need to undergo the same business logic, as is the case with supporting different SSO authentication providers, different travel agency integrations, and so on.
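A minimal sketch of this idea for SSO providers follows, assuming a registry-based factory. The `AuthProvider` interface, the two provider classes, their URLs, and the `AuthProviderFactory` are all hypothetical names chosen for illustration; real providers would implement a full authentication flow.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict


class AuthProvider(ABC):
    """Common interface that client code depends on."""

    @abstractmethod
    def login_url(self, redirect_uri: str) -> str:
        ...


class AcmeAuthProvider(AuthProvider):
    """Hypothetical SSO provider."""

    def login_url(self, redirect_uri: str) -> str:
        return f"https://acme.example/authorize?redirect_uri={redirect_uri}"


class GlobexAuthProvider(AuthProvider):
    """Another hypothetical SSO provider with a different endpoint."""

    def login_url(self, redirect_uri: str) -> str:
        return f"https://globex.example/sso/start?return_to={redirect_uri}"


class AuthProviderFactory:
    """Creates providers by name so callers never reference concrete classes."""

    def __init__(self) -> None:
        self._creators: Dict[str, Callable[[], AuthProvider]] = {}

    def register(self, name: str, creator: Callable[[], AuthProvider]) -> None:
        self._creators[name] = creator

    def create(self, name: str) -> AuthProvider:
        try:
            creator = self._creators[name]
        except KeyError:
            raise ValueError(f"unknown auth provider: {name}") from None
        return creator()


factory = AuthProviderFactory()
factory.register("acme", AcmeAuthProvider)
factory.register("globex", GlobexAuthProvider)

# Client code is identical regardless of which provider is selected.
url = factory.create("acme").login_url("https://app.example/callback")
```

Supporting another provider then means adding one class and one `register` call, while the shared business logic stays unchanged. The same shape would apply to travel agency integrations.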
Given the usage of domain boundary analysis in the event storming phase, it follows naturally that the solution will adopt DDD with CQRS as an engineering pattern. This combination allows the building of complex, scalable, and maintainable software systems, all of which were identified as key architectural characteristics for the solution.
This methodology empowers the creation of a shared understanding of the complex travel management domain and the crafting of a software solution that truly aligns with the real-world intricacies of travel, reservations, and user interactions. In this context, DDD is not just an engineering strategy but allows the creation of a user-centric travel management platform.
By combining CQRS with DDD, we achieve a robust and flexible architecture. The write side of CQRS aligns well with DDD's focus on modeling the domain, encapsulating business logic, and enforcing consistency.
The read side of CQRS complements DDD by providing optimised query paths, making it easier to retrieve data in a format that matches the user's needs.
Events can be leveraged to communicate changes between bounded contexts, facilitating loose coupling and flexibility in our application's architecture.
ADR 15 - DDD with CQRS Pattern
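The split between the write side, the read side, and the events that connect them can be sketched as follows. This is a simplified in-memory illustration under stated assumptions: the `AddReservation` command, the `ReservationAdded` event, the `Trip` aggregate, and the `ItineraryReadModel` are hypothetical names, and a real deployment would carry events over the event bus with eventual consistency between the two sides.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class AddReservation:
    """Command: a request to change state (write side)."""
    trip_id: str
    reservation_id: str
    kind: str


@dataclass(frozen=True)
class ReservationAdded:
    """Event: a fact that the write side has accepted the change."""
    trip_id: str
    reservation_id: str
    kind: str


class Trip:
    """Aggregate that encapsulates business rules for one trip."""

    def __init__(self, trip_id: str) -> None:
        self.trip_id = trip_id
        self._reservation_ids: Set[str] = set()

    def add_reservation(self, command: AddReservation) -> ReservationAdded:
        if command.reservation_id in self._reservation_ids:
            raise ValueError("reservation already on this trip")
        self._reservation_ids.add(command.reservation_id)
        return ReservationAdded(self.trip_id, command.reservation_id, command.kind)


class ItineraryReadModel:
    """Read side: a denormalised view shaped for display, built from events."""

    def __init__(self) -> None:
        self._views: Dict[str, List[str]] = {}

    def apply(self, event: ReservationAdded) -> None:
        line = f"{event.kind}: {event.reservation_id}"
        self._views.setdefault(event.trip_id, []).append(line)

    def itinerary(self, trip_id: str) -> List[str]:
        return list(self._views.get(trip_id, []))


trip = Trip("trip-1")
read_model = ItineraryReadModel()

# The write side validates the command and emits an event; the read side
# consumes that event to update its query-optimised view.
event = trip.add_reservation(AddReservation("trip-1", "res-42", "flight"))
read_model.apply(event)
```

Queries only ever touch `ItineraryReadModel`, so the view can be reshaped or replicated without changing the rules enforced in `Trip`.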
Deployment pipelines refer to an automated series of steps for deploying changes to the product. This is in line with the chosen approach of producing MVPs that build incrementally on each other with new features. It helps ensure consistent and reliable software delivery, free of human errors caused by mistakes in the deployment process.
This practice is supplemented by CI/CD (Continuous Integration/Continuous Deployment):
The combination of Deployment Pipelines and CI/CD practices promotes rapid development, testing, and deployment of software.
The SOLID principles are a series of guidelines for writing maintainable and extensible code. When followed, they help improve code design, readability, and maintainability.
Small, isolated tests that validate the behavior of individual code units (e.g., modifications of trips/reservations). Unit tests help ensure that each piece of code works correctly in isolation and contribute towards consistent code quality assurance.
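As an illustration of such a test, the sketch below covers a single reservation modification using Python's built-in `unittest`. The `Reservation` type and the `reschedule` function are hypothetical and exist only to give the tests something to exercise.

```python
import unittest
from dataclasses import dataclass, replace
from datetime import date


@dataclass(frozen=True)
class Reservation:
    """Hypothetical reservation with a stay period."""
    reservation_id: str
    check_in: date
    check_out: date


def reschedule(reservation: Reservation, check_in: date, check_out: date) -> Reservation:
    """Return a copy of the reservation with new dates, rejecting invalid ranges."""
    if check_out <= check_in:
        raise ValueError("check-out must be after check-in")
    return replace(reservation, check_in=check_in, check_out=check_out)


class RescheduleTests(unittest.TestCase):
    def setUp(self) -> None:
        self.reservation = Reservation("res-42", date(2024, 5, 1), date(2024, 5, 3))

    def test_updates_dates(self) -> None:
        updated = reschedule(self.reservation, date(2024, 6, 1), date(2024, 6, 4))
        self.assertEqual(updated.check_in, date(2024, 6, 1))
        self.assertEqual(updated.check_out, date(2024, 6, 4))

    def test_rejects_inverted_dates(self) -> None:
        with self.assertRaises(ValueError):
            reschedule(self.reservation, date(2024, 6, 4), date(2024, 6, 1))

    def test_original_is_unchanged(self) -> None:
        reschedule(self.reservation, date(2024, 6, 1), date(2024, 6, 4))
        self.assertEqual(self.reservation.check_in, date(2024, 5, 1))
```

Saved as a module, the tests can be run with `python -m unittest`. Each test exercises one behaviour with no database or network dependency, which keeps them fast enough to run on every pipeline execution.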
ADR 2 - Choosing REST and CQRS over GraphQL
ADR 3 - Distributed Databases and Redis for Global Data Distribution
ADR 4 - Microservice Architecture
ADR 5 - Event Driven Architecture
ADR 8 - Polling vs Webhooks with Email Forwarding Rule
ADR 9 - Choice of "Eventual Consistency" for Distributed Databases
ADR 10 - Load Balancing of Core Services
ADR 11 - Segregation of Core Services and Reader APIs
ADR 12 - Distribution of Data Globally
ADR 13 - Usage of Serverless Functions with Redis Over APIs
ADR 14 - Space-Based Architecture
ADR 15 - DDD with CQRS Pattern
Fundamentals of Software Architecture
Software Architecture Patterns
Software Architecture: The Hard Parts
Developer to Architect Architecture Resources
Strategyzer Business Model Canvas