Background

As PIP-98 explained, the Pulsar documentation site today is built like an encyclopedia, and both new and existing users are overwhelmed by it. Without a clear path per role (developer / DevOps / …), they resort to skimming or reading everything to fit the pieces of the puzzle together into a complete picture of the knowledge they need.

New users usually start with the Getting Started section, which today focuses mainly on starting Pulsar on your development computer in several ways and then test-driving it by publishing and consuming messages using the CLI. It lacks a brief intro to the subjects and terminology used throughout that section.

New users approaching a subject for the first time mainly fall into two types of learners:

Reading - some people learn by reading all the material on the subject before trying.
Doing - some people learn by “playing” with it - learn by example.

Today, the people who learn by reading are forced to read the entire Pulsar documentation site and fit the pieces together, which is an immensely high bar for newcomers. Those who learn by example don’t have any examples in today’s Getting Started section and are forced to google their way around many sites until they get their answers.

PIP-98, among other things, explained we should have several guides:

Getting Started guide - helping you to get started, customized to your role (developer/operator)
Developer Guide - a customized guide tailored to teach Pulsar to developers.
Operator guide - a customized guide tailored to teach Pulsar to operators.

The people that learn by reading, in the future, will use the Developer or Operator guide, as it will be their “book” for it. The people who learn by doing will use the new getting started section we aim to present here, catering to both developers and operators (also referred to as SREs, Infrastructure, DevOps roles).

This PIP is focused on providing a new structure (table of contents) for the Getting Started Guide.

Goal

Provide a table of contents, with a description of each section, for a new Getting Started guide.
The guide will allow:
- New users to “feel” Pulsar using the CLI in their development environment
- Developers to learn the basics of Pulsar by providing 2 full working examples of applications (micro-services), which are both short and focused. The examples are two very popular use cases for Pulsar. Each example will have a step-by-step tutorial for building the app, and while doing so, explain key concepts and terminologies about Pulsar and show in real code how to achieve it. Essentially it will showcase Pulsar's key features, which are the most used ones.
- Operators to learn the basics of Pulsar from a DevOps perspective by deploying a working demo application and Pulsar / BK / ZK to k8s (the popular choice these days). The learning will continue by going through the Pulsar / BK dashboards to see and explain key Pulsar concepts, and will later be accompanied by several scenarios demonstrating Pulsar’s abilities: replication, rapid broker scale-up, stateless brokers, resiliency, rapid BK scale-up, the broker’s automatic load balancer, and how, after scale-up, all BK nodes join in handling incoming writes.

Table of Contents

1. Quickstart

In this section, we will let users, whether in a developer or a DevOps (operator) role, “feel” Pulsar using the command line. First, we’ll present two ways to start Pulsar in standalone mode (which includes BK and ZK all within a single process): by downloading a binary and running it, or by issuing a single docker run command. We’ll also present a way to start Pulsar in cluster mode, which includes a process for each component, using Docker Compose. Then we’ll continue by starting a producer, which will produce a message every 5 seconds, and, in another terminal window, a consumer displaying those messages. For that we’ll use the Pulsar shell scripts, either directly if the user downloaded the binary, or through docker exec.
- 1.1 Step 1: Start Pulsar locally
  - 1.1.1. Standalone mode
    
    Here we’ll explain the standalone mode and show two ways to start Pulsar on your development machine. In each section, we’ll show how to view the logs to check that Pulsar started correctly.
    - 1.1.1.1 Using release binary
      - 1.1.1.1.1. Downloading
        
        Include a very short description of the various folders you unpacked (one paragraph tops).
      - 1.1.1.1.2. Running
    - 1.1.1.2. Using Docker
  - 1.1.2. Cluster mode (Docker Compose)
    
    Here we’ll reuse the content we have on the site showing how to start a Pulsar cluster locally using Docker Compose.
- 1.2. Step 2: Publish and Consume messages using the CLI
  - 1.2.1. Publish messages
    
    Here we will explain how to use the CLI bundled with Pulsar to produce a message every 5 seconds. We’ll also take the opportunity to briefly explain what a topic is.
    
    We’ll use tabs to display code running the CLI since, if you downloaded a binary, it’s one way and if you have used Docker then we’ll issue a docker exec command.
  - 1.2.2 Consume messages
    
    Here we will explain how to use the CLI bundled with Pulsar to consume those messages and display them to the standard output.
    
    Here we will take the opportunity to explain what a subscription is briefly.
  - 1.4. Stopping Pulsar
    
    This section will contain short steps on how to stop Pulsar, whether it was started from a release binary, Docker, or Docker Compose, using tabs for the different ways.
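
To make the two terms introduced in Step 2 concrete, here is a minimal Python sketch of how a topic and its subscriptions relate. It is a toy in-memory model, not Pulsar's implementation or client API: the `Topic` class and its `publish`, `subscribe`, and `consume` methods are hypothetical names chosen for illustration. What it aims to show is that each subscription tracks its own read position, so two subscriptions on the same topic each receive every message published after they were created.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Toy stand-in for a topic: a named, append-only list of messages."""
    name: str
    messages: list = field(default_factory=list)
    cursors: dict = field(default_factory=dict)  # subscription name -> next index to read

    def publish(self, payload: str) -> None:
        self.messages.append(payload)

    def subscribe(self, subscription: str) -> None:
        # In this model a new subscription starts at the end of the log,
        # so it only sees messages published after it was created.
        self.cursors.setdefault(subscription, len(self.messages))

    def consume(self, subscription: str):
        """Return the next unread message for the subscription, or None."""
        position = self.cursors[subscription]
        if position >= len(self.messages):
            return None
        self.cursors[subscription] = position + 1
        return self.messages[position]


topic = Topic("my-topic")
topic.subscribe("sub-a")
topic.subscribe("sub-b")
topic.publish("hello-1")
topic.publish("hello-2")

# Each subscription reads independently of the other.
received_a = [topic.consume("sub-a"), topic.consume("sub-a")]
received_b = [topic.consume("sub-b")]
```

Here `sub-a` has read both messages while `sub-b` still has `hello-2` waiting, because the two read positions advance separately.
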
2. Developer Guide

This will be a full-blown guide for developers. For now, we’re adding only its first section: Getting Started.
- 2.1. Getting Started
  
  This section is focused on developers wanting to have an introduction to Pulsar - basic level - by doing rather than by reading. Some people prefer to learn by doing and “feeling” it in their hands. Developers who prefer to learn by reading will skip and go straight to an Overview section.
  
  We will have 2 tutorials, each featuring a ready-made application (micro-service) showcasing the most basic Pulsar features and concepts. Each tutorial will link to a repository containing the full example, for readers who just want to see the complete code or run it. The tutorial itself will be a step-by-step explanation of the example app, essentially building it in steps.
  
  The tutorials were chosen because, in my opinion, they cover the most popular use cases for Pulsar or any other messaging system. For other cases, you will resort to the Tutorials section (explained briefly at the beginning of the PIP), which contains more use cases that are less popular.
  
  Since Pulsar SDK is available in several languages, we’ll write the same application first in Java and eventually in all languages Pulsar supports. Each directory in the repository will be dedicated to a single language. Each code snippet will have tabs allowing you to choose which language to see this code snippet for.
  - 2.1.1 Basic Job Queue
    
    In this section, we’ll present a ready-made app that showcases Pulsar's ability to be used as a Job Queue. In our example, it will be a micro-service in charge of video encoding. Each message in the topic represents an encoding task to be done (download the file from S3, encode it, then upload it back to S3).
    
    We’ll explain:
    - Message producing - we’ll implement a simple REST API that receives video encoding tasks and writes a message to the topic for each one.
      - What is a topic
      - What is a message producer
      - What is a message
      - What is a Pulsar Client
    - Message consumption - we’ll use a shared subscription to balance the workload across multiple machines.
      - What is a subscription
      - What is a Shared Subscription and how it works
      - What is message acknowledgment
    - Demonstrate scaling by running two instances of this micro-service
  - 2.1.1.1 Prerequisite: running Pulsar in Standalone mode
    
    Link to (1), where we show how to start Pulsar locally.
    
    We prefer that option to Testcontainers since this library doesn’t exist in all languages yet.
  - 2.1.1.2…2.1.1.x :
    - Example high-level overview
    - Link to source code of the full example
    - Step by step building the app
      - with concepts and explanations along the way
    - Summary
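
The Shared subscription behavior that the Job Queue tutorial relies on can be sketched as follows. This is a simplified model in plain Python, assuming round-robin delivery and using hypothetical helper names (`dispatch_shared`, `process`); it does not use the Pulsar client, and the real tutorial code would go through the SDK. It shows how encoding tasks are spread across two instances of the micro-service, and how a task that was not acknowledged stays pending so it can be delivered again.

```python
from itertools import cycle


def dispatch_shared(tasks, workers):
    """Hand each task to the next worker in round-robin order."""
    assignments = {worker: [] for worker in workers}
    for task, worker in zip(tasks, cycle(workers)):
        assignments[worker].append(task)
    return assignments


def process(assignments, failing=frozenset()):
    """Acknowledge tasks that succeed; collect the rest for redelivery."""
    acked, pending = [], []
    for tasks in assignments.values():
        for task in tasks:
            (pending if task in failing else acked).append(task)
    return acked, pending


tasks = [f"video-{n}" for n in range(1, 6)]
workers = ["encoder-a", "encoder-b"]  # two instances of the micro-service

first_round = dispatch_shared(tasks, workers)
# Assume, for illustration, that encoding video-3 fails and is never acknowledged.
acked, pending = process(first_round, failing={"video-3"})
# Unacknowledged tasks are dispatched again.
second_round = dispatch_shared(pending, workers)
```

Adding a third name to `workers` is the model's equivalent of starting another instance: the same tasks are simply spread over more consumers.
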
  - 2.1.2. Event Sourcing example app
    
    This section will showcase partitioned topics, Failover subscriptions, Key-shared subscriptions, and scaling producers.
    
    The app environment is a beer factory. It has a warehouse micro-service for managing the warehouse. It writes the current stock level as a message into a partitioned topic each time the stock increases or decreases inside the physical warehouse. The key is the beer catalog number, and the message is the stock level in a number.
    
    Another micro-service, Inventory, exposes a REST interface to retrieve current stock levels per beer catalog number. It consumes the stock level messages and persists them to Cassandra (key = beer catalog number).
    
    At first, the rate of changes and the number of beers in the catalog were small. The beer factory owners started with the partitioned topic with one single partition and a Failover subscription since they had to update the inventory levels in Cassandra in order with respect to the same beer catalog number.
    
    Once the beer factory got bigger, more changes were introduced, and more beers were added to the catalog. They were bottlenecked by the update to Cassandra, so they scaled Cassandra, but the bottleneck was now at the consumer, so they wanted to scale out the Inventory micro-service. Hence they switched to a Key-shared subscription to maintain order updates per beer catalog number.
    
    As they got even bigger, the bottleneck was now the broker. They increased the number of partitions and made sure they used a partitioner that writes the same key to the same partition.
    
    This example will include a brief explanation about:
    - Partitioned topic
    - Failover subscription type
    - Key in message
    - Key-shared subscription
    - Scaling consumers
    - Correctly acknowledging key-shared subscription
    - Correctly acknowledging failover subscription
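
The ordering requirement in the beer factory story (updates for the same catalog number must be applied in order) comes down to routing by key. The sketch below is a minimal illustration in plain Python, not Pulsar's partitioner: `partition_for` and `route` are hypothetical helpers, and CRC32 is used only as a convenient stable hash, which is an assumption and not the hash Pulsar uses. A Key-shared subscription relies on a similar idea when choosing which consumer receives a given key.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (CRC32 is an illustrative choice)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events, num_partitions):
    """Append each (key, stock_level) event to the partition chosen by its key."""
    partitions = {p: [] for p in range(num_partitions)}
    for key, stock_level in events:
        partitions[partition_for(key, num_partitions)].append((key, stock_level))
    return partitions


# Stock level updates keyed by beer catalog number (made-up sample data).
events = [
    ("beer-101", 40),
    ("beer-202", 12),
    ("beer-101", 39),
    ("beer-303", 7),
    ("beer-101", 38),
    ("beer-202", 13),
]

partitions = route(events, num_partitions=4)
```

Because the hash of a key never changes, all updates for one catalog number land in a single partition in the order they were written, while different catalog numbers may be spread across partitions and processed in parallel.
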
3. Operator Guide
- 3.1. Getting Started
  
  This section is aimed at a person with an operator role (sometimes referred to as Infrastructure / SRE / DevOps) who wants to get started with Pulsar. This role implies different needs compared to the developer getting started. Operators want to try out Pulsar on their k8s cluster (whether minikube or a test k8s cluster) as opposed to Docker Compose or running a binary. The learning mostly focuses on how to operate it: monitoring, security, and handling failure scenarios.
  
  We’ll start by deploying Pulsar, BK, and ZK using helm charts to k8s and test driving by publishing and consuming messages using the CLI.
  
  We’ll then proceed to deploy a demo application, with one service generating data constantly and writing to Pulsar and the other consuming it and increasing a metric to showcase it. It will be deployed alongside a Prometheus instance for collecting metrics and Grafana with bundled dashboards for Pulsar and the demo app.
  
  Next, we’ll check that the demo app is working and learn a bit about Pulsar using the ready-made Pulsar and BK dashboards.
  
  Finally, we’ll walk through several scenarios to showcase Pulsar features:
  - Increase the number of partitions to the topic and then increase the number of Pulsar pods from 3 to 6 to demonstrate the automatic load balancer.
  - Downscale Pulsar to 1 pod to show it’s stateless.
  - Downscale BK from 3 to 2 pods to show it’s working as long as it has 2 replicas.
  - Upscale BK from 2 to 6 pods to show that all BK nodes participate equally in writes, how quickly new nodes become ready in case of a large influx, and how the architecture supports a quick ramp-up.

Sidebar

The sidebar will look like this:

Quickstart
- Step 1: Start Pulsar locally
- Step 2: Publish and Consume messages using the CLI
Developer Guide
- Getting Started
  - Basic Job Queue
  - Event Sourcing
Operator Guide
- Getting Started
… The rest of existing sidebar we have today

Links

Discussion: https://lists.apache.org/thread/p8d8ks2ygqnq53oxqczxg2mtpf932wpg
Vote: https://lists.apache.org/thread/95p5mn873d6d3lsk5kgfks4n6x07x5pq

Hi Asaf, thanks for proposing this improvement. Generally it looks good. I support the motivation. Structure-wise, I have a couple of questions.

Is it the TOC inside an all-in-one topic or the TOC of the left navigation? In other words, did you plan to provide an all-in-one Getting Started topic covering three subheadings, or three topics for three specific types of readers? It seems to be one topic in your proposal, which may blur the learning path of different roles. For example, as an operator, will they still go through Consume and Produce messages using the CLI or jumpstart from Deploy helm charts to k8s?

What is the content mapping of existing topics? IIUC, the new structure covers the following three topics. What's your plan to do with Docker Compose?

1. Consume and Produce messages using the CLI // containing both `getting-started-standalone` and `getting-started-docker`
2. Get started for Devs
3. Get started for Ops // containing `getting-started-helm`

Thanks for the feedback @momo-jun. From the looks of it, the doc website allows 2 depth levels, which means the left pane will have:

Getting Started Guide

Consume and Produce messages using the CLI
Developer Getting Started
Operator Getting Started

Then each section will have the heading it contains per the TOC (depth level 3 will be H1, ...).

Regarding your second question on what to do with the existing Getting Started section: running Pulsar locally and with Docker is included in the CLI section, and running Pulsar in K8s is included in Operator Getting Started.

"Run a Pulsar cluster locally with Docker Compose" is actually missing. To tackle this, how about we change the TOC to:

1.1 Starting Pulsar locally
- 1.1.1. Standalone mode
  
  Here we’ll explain the standalone mode and explain two ways to start pulsar on your development machine. In each section, we’ll show how to view the logs to check if Pulsar started ok.
  - 1.1.1.1 Using release binary
    - 1.1.1.1.1. Downloading
      - include a very short description of the various folders you unpacked (one paragraph tops)
    - 1.1.1.1.2. Running
  - 1.1.1.2. Using Docker
- 1.1.2. Cluster mode (Docker Compose)
  
  Here we’ll take the content we have on the site showing how to start a Pulsar Cluster locally using Docker compose

Thanks for the further explanation. Adding a branch mode for Docker Compose looks good to me.

Now I only have one concern - the structure of the TOC is not MECE and might be difficult to understand.

Consume and Produce messages using the CLI is named from the user task perspective;
Dev Getting Started and Ops Getting Started are named from the user role perspective.

And logically, Consume and Produce messages using the CLI is part of Dev Getting Started, while operators don't have to go through it - I'm afraid the naming cannot help them get to this point.

Your feedback is much appreciated and straight to the point.

How about we name the main headings as below?

Introduction to Pulsar using CLI
Introduction to Pulsar using sample applications
Introduction to Pulsar using operational scenarios

Hi @asafm

Thanks for your awesome proposal. The real-world examples are great additions to the docs!

Issues

However, there are some issues in the current proposal:

The learning paths of 3 roles are blurred.

If we put all the topics (as below) into a single Get started, all roles will read them all by sequence, which means a clear learning path is not designed in real.
```
Introduction to Pulsar using CLI
Introduction to Pulsar using sample applications
Introduction to Pulsar using operational scenarios
```
Headings are lengthy.

Main headings are too long.

For example, for Introduction to Pulsar using CLI, users actually care little about the method (whether it's CLI or API) used to produce messages. What they want is to try it and get a successful result (with whatever method) in minimal time. So the "method" can be left out of headings to save space, since headings should show the key point and be as concise as possible.

Solutions

To resolve the issues above, I would suggest that:

Create 3 guides for 3 roles respectively.
Show 3 guides on the doc landing page to provide specific paths for different roles. Users do not need to wander on the Pulsar site or Google around to find suitable docs.
Make 3 guides as subpages of https://pulsar.apache.org/docs since:
- Each role has a "dedicated" container to show all relevant docs.
- Reusing docs from https://pulsar.apache.org/docs is possible.

Benefits

This solution:

(1) Highlights the roles and gives them what they need clearly. No missing or duplicates (MVP).

(2) Makes short headings possible.

Besides, for the common docs (e.g., concepts, references) which should be reused, we can link them richly in the 3 guides.

Examples

Starburst docs

Each role has its independent guide (sub-page).

Thanks for the detailed reply!

Regarding the suggestion of moving the getting started for each role into its own sub-section of a bigger guide (developer, operator):

I was thinking about it. My big plan was indeed to have 2 additional guides, a Developer Guide and an Operator Guide. If we zoom in, for example, on the developer role, the two (getting started, developer) serve completely different purposes:

The getting started guide is aimed at developers who hate learning by reading. They like to learn by doing. This means starting with a ready-made example and tweaking it to their needs. Basically, tutorial style. They want "just" the minimum amount of knowledge to get it done. Hence this guide is designed exactly like that: examples, which include the minimum amount of knowledge you need to understand them.
The developer guide, on the other hand, is aimed at people who are the exact opposite: they like to read the book, sometimes start to finish, before they even write a single line of code. Those people like to understand first and do later. Hence the guide itself will really be designed like a book, explaining all the details.

So, when I think about it, in my opinion it's confusing to have two contradictory sections in the same guide. We'll have a Getting Started section which is basically a tutorial. So the people who like to read first and do later will be confused - "so we're supposed to get started here, but what's going on? I see code here, no no no. I want to understand first. What's going on here?". On the other hand, the people who like to do first and read later will not search for the getting started inside a Developer Guide. For them, a Developer's Guide is a big scary book, filled with way too many details. If you ask them, all they want is tutorials, from the getting started ones to more complicated ones. So I imagined having a section in the docs named Tutorials that contains exactly that, grouped by role (developer, operator).

So from that perspective I prefer to have: Getting Started, Developer Guide, Operator Guide.

Regarding the second suggestion of having sub-pages of docs: do you mean each guide will have its own "doc site"? I think it depends on whether Docusaurus allows more than 2 depth levels in the left sidebar. I personally like all docs to be in a single location - I don't like to jump around between sites. That's my personal preference.

Regarding

If we put all the topics (as below) into a single Get started, all roles will read them all by sequence, which means a clear learning path is not designed in real.

Why do you think that if the side bar has: Getting Started

Introduction to Pulsar using CLI
Introduction to Pulsar using sample applications
Introduction to Pulsar using operational scenarios

then people will read them one after another? If I'm a developer, I would naturally ignore operational scenarios, right? Why do I care?

I do agree the titles are too lengthy. Maybe we can try:

Getting Started
- Pulsar using CLI "Launch Pulsar locally, and "feel" it using the command line tools"
- Pulsar using sample applications "Learn the 2 most popular use cases by running ready-made sample applications, and learning just what you need to get it working"
- Pulsar using operational scenarios "Run a demo environment including apps, on k8s, and run through common operational scenarios"

Hi @asafm

Thanks for your detailed explanations!

I understand your points, and I'm trying to make the learning path more clear, simple, and direct for each role.

Reasons for designing 3 guides

1. Give prominent directions for users

Suppose that you're at a fork in the road, it's most clear for you to choose one way if the sign indicates the direction.

This is the same for doc users. Whatever the user archetypes ("doing" or "learning") are, the most important thing is they're seeking solutions to resolve issues based on their roles. The roles are signs.

So if we design the doc IA as below, users just need to choose one way based on their roles and finish the rest of the journey. There is nothing else they need to take into consideration. It's simple, clear, and direct.

2. Provide required minimal info for users

If we put all the topics (as below) into a single Get started, all roles will read them all by sequence, which means a clear learning path is not designed in real.

Clarification: "read them all by sequence" means "all roles need to glance over all 3 headings (even though they are just interested in and click one later)" rather than "read them (docs) one after another".

But if we design 3 guides respectively:

Beginners just need to read their Get Started (CLI)
Developers just need to read their Get Started (Job queue + Event sourcing)
Operators just need to read their Get Started (K8S)

In this way, we provide the required info for each role at a minimal amount. Users will like it because it:

Saves some effort and time in reading unneeded info
Improves the efficiency of finding and using the info
Makes users more confident in what they should and need to do

I wholeheartedly agree that the role-based learning paths are a great idea for future iterations of the documentation.

In the short-term, it's a quick win to incrementally update the GSG. I suggest we title the first one "quickstart" and also link to it from GS menu on home page. And then call the others what they are: tutorials. WDYT?

Getting Started

QuickStart "Launch Pulsar locally, and "feel" it using the command line tools"
Tutorials: Sample applications "Learn the 2 most popular use cases by running ready-made sample applications, and learning just what you need to get it working"
Tutorials: Operational scenarios "Run a demo environment including apps, on k8s, and run through common operational scenarios"

Hi @asafm! Thanks for starting this thread.

I reviewed this proposal in two aspects.

Content

For three journeys in the proposal, we have contents for two of them:

Getting started with CLI:
- https://pulsar.apache.org/docs/2.11.x/getting-started-home/
Getting started for sample applications:
- https://pulsar.apache.org/docs/2.11.x/how-to-landing/

The closest pages for getting started with operations are under the "Administration" chapter https://pulsar.apache.org/docs/2.11.x/administration-zk-bk/, while we don't have a portal page or getting started page.

Structure

The "Get Started" chapter is located on the top of the sidebar, and it should be fine.

The "Tutorial" chapter is somewhat hard to find, so we may set up some links or refactor the content and merge it into Get Started chapter.

For reference, grpc-java has a Quickstart page to run the very simple demo and then a "Basic tutorial" page to talk about every basic concept.

The operations getting started content still needs to be written, and we may prepend it as the first item of the "Administration" chapter.

Ok. I'll try to combine the suggestions made above.

How about we'll have 3 headings as below:

Quick-start
Developers Guide
- Getting Started
Operators Guide
- Getting Started

The quick start will contain the content I've placed under "Consume and Produce messages using the CLI". The main idea: give any role the ability to "feel" Pulsar locally, using the CLI.

The Developer Guide will be, over time, a comprehensive guide, like a book, for learning Pulsar, targeted at developers. Its Getting Started section will contain what was placed under "Developer Getting Started", mainly aimed at people who wish to learn by "hand", as I explained in previous comments and in the PIP.

Same with The Operator Guide, but for Operators (DevOps).

@tisonkun @D-2-Ed I don't like calling the Getting Started section a tutorial, although it is built as one. People expect a Getting Started section to look like a tutorial. I do think in the future we can have a dedicated Tutorial subsection for each guide.

@tisonkun

For three journeys in the proposal, we have contents for two of them:

Getting started with CLI: https://pulsar.apache.org/docs/2.11.x/getting-started-home/
Getting started for sample applications: https://pulsar.apache.org/docs/2.11.x/how-to-landing/

The CLI journey - I plan to take the content from all three, but I simply structure it differently:

Based on the revised solution I wrote, it will be located under Quick-start. Under it, you'll have two steps: (1) Starting Pulsar Locally (2) Publish and Consume messages using the CLI

The (1) will contain subsections to start it locally using binary downloaded, or docker, both in stand-alone mode, or using docker compose as a complete cluster (including ZK, BK). Once you have a cluster up and running, you can continue to step (2) and use the CLI to publish and consume messages.

Today, it's copied and pasted across each flavor of starting Pulsar.

So in summary, I plan to re-use the existing content, and mainly restructure it.

Regarding the Developer Guide / Getting Started. You mentioned "https://pulsar.apache.org/docs/2.11.x/how-to-landing/". This gives you a broken-up tutorial (not one with steps). Most importantly, it does not use code, only the command line. The developer getting started section aims to have a working application. Actually, 2 of those, matching the most popular use cases for Pulsar, as detailed in the PIP.

I hope I answered all of your comments @tisonkun @D-2-Ed @Anonymitaet. Would love to hear your feedback on the suggestion I wrote at the beginning of the comment.

@asafm @D-2-Ed thanks for your explanations!

Record some discussions here for further learning:

The reason for not creating a "Beginner" guide is that some devs might think it refers to "skill levels" rather than "newbies to Pulsar" (even though the latter is my intention), while the "Quick Start" is suitable for anyone with any skill level to try and "feel" Pulsar.
There are some nuances between the wording "Quick Start" and "Get Started". The former is more lightweight and quicker.

I've updated the PIP according to all comments.

Thanks for your updates @asafm! I believe it's good to go for a vote now.

@asafm Thanks for your proposal.

From the engineering side, the new document structure matches a beginner's reading behavior. I like learning by doing and understanding the key concepts in practice. The getting started section works like a book with a real-world example to show which cases Pulsar can work for and how it works. Our current website divides the content into several parts, and it's a little hard for beginners to link them together on a first reading.

For the discussion of creating 3 clear guides for 3 roles, I think it may not be so important in the Getting Started section. The Getting Started section aims to provide the basic knowledge of Pulsar for all the roles. In fact, that knowledge is the foundation for all 3 roles.

For the concept part, I suggest doing some comparisons between different concepts, such as different subscription types.

This example will include a brief explanation about: Partitioned topic, Failover subscription type, Key in message, Key-shared subscription, Scaling consumers, Correctly acknowledging key-shared subscription, Correctly acknowledging failover subscription

Overall LGTM. I think we can start the vote.

The issue had no activity for 30 days, mark with Stale label.

This will be in motion soon