WardPearce / paaster

Paaster is a secure and user-friendly pastebin application that prioritizes privacy and simplicity. With end-to-end encryption and paste history, Paaster ensures that your pasted code remains confidential and accessible.
https://paaster.io
GNU Affero General Public License v3.0

Improve documentation on S3 storage and overall architecture #191

Closed shalak closed 1 year ago

shalak commented 1 year ago

Could someone elaborate on the role and purpose of S3 storage in this system?

We also have 'paaster_mongo' in use. Is it necessary to have these two types of data storage to run Paaster? What are they used for?

WardPearce commented 1 year ago

I'm open to conversations & changes around Paaster's architecture.

S3 is used for object storage, so large blocks of data (in this case, encrypted pastes) can be stored and delivered efficiently via a CDN. Benefits of using S3 include data chunking, content delivery & redundancy.

MongoDB is basically used to store metadata about an encrypted paste. This includes paste expiration, access code storage, IV storage & owner secrets.

MongoDB isn't designed for large data storage and would be impractical here: MongoDB storage isn't cheap, documents are limited to 16 MB, and storing pastes in them would require loading an entire encrypted paste into memory just to access the other metadata around the paste.

For example, if someone uploaded a 500 MB paste, it would have to be read entirely into memory, and any subsequent query to that document would load 500 MB into memory.
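To make the split concrete, here is a toy sketch (illustrative Python; the document shape and chunk size are assumptions, not Paaster's actual schema or code): metadata is a few small fields, while the encrypted blob is streamed in fixed-size chunks so it never has to sit in memory whole.

```python
import io

# Metadata lives in MongoDB: a handful of small, queryable fields.
# Field names here are illustrative, not Paaster's real schema.
paste_meta = {
    "_id": "example-paste-id",
    "iv": "base64-encoded-iv",  # needed by the client to decrypt
    "expires_in": 3600,         # seconds until the paste expires
}

def stream_chunks(blob, chunk_size=8 * 1024 * 1024):
    """Yield the encrypted blob in chunks, the way object storage allows,
    instead of loading the whole paste into memory at once."""
    while True:
        chunk = blob.read(chunk_size)
        if not chunk:
            return
        yield chunk

# A pretend 20 MB encrypted paste: streamed as 8 MB + 8 MB + 4 MB.
encrypted = io.BytesIO(b"\x00" * (20 * 1024 * 1024))
sizes = [len(c) for c in stream_chunks(encrypted)]
```

Reading the metadata document touches only a few bytes, while fetching the blob touches one chunk at a time; that is the property the 16 MB document limit and the 500 MB example are getting at.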

shalak commented 1 year ago

I see. It makes sense; it would be nice to have this described in README.md. I've taken the liberty of creating a PR.

Did you have any success with deploying Paaster with some self-hosted S3 provider? I'm trying to make the following work:

version: '3'
services:
  paaster-backend:
    container_name: paaster-backend
    image: wardpearce/paaster-backend:latest
    restart: unless-stopped
    ports:
      - "8888:80"
    environment:
      paaster_max_paste_size: 1049000
      paaster_max_iv_size: 42
      paaster_open_api: '{"title": "paaster.io", "version": "2.0.0"}'
      paaster_mongo: '{"host": "paaster-mongodb", "port": 27017, "collection": "paasterv2"}'
      paaster_s3: '{"endpoint_url": "http://paaster-minio:9000","secret_access_key": "miniosecretkey","access_key_id": "minioaccesskey","region_name": "us-east-1","bucket": "paaster","folder": "pastes","download_url": "http://paaster-minio:9000/paaster"}'
      paaster_proxy_urls: '{"frontend": "https://paaster.mydomain.net", "backend": "http://paaster-backend.mydomain.net"}'
    depends_on:
      - paaster-mongodb
      - paaster-minio
    labels:
      - "traefik.enable=true"

  paaster:
    container_name: paaster
    image: wardpearce/paaster-frontend:latest
    restart: unless-stopped
    environment:
      VITE_NAME: "paaster.mydomain.net"
      VITE_API_URL: "https://paaster-backend.mydomain.net"
    ports:
      - "8889:80"
    labels:
      - "traefik.enable=true"

  paaster-mongodb:
    image: mongo:4.4
    container_name: paaster-mongodb
    restart: unless-stopped
    environment:
      MONGODB_DATA_DIR: /data/db
      MONGODB_LOG_DIR: /dev/null
    volumes:
      - ./paaster_data:/data/db

  paaster-minio:
    image: minio/minio
    container_name: paaster-minio
    restart: unless-stopped
    volumes:
      - ./minio_data:/data
    environment:
      MINIO_ACCESS_KEY: minioaccesskey
      MINIO_SECRET_KEY: miniosecretkey
    command: server /data
    ports:
      - "9000:9000"

Unfortunately, it doesn't work:

[screenshot of the error]

The docker logs don't contain any errors other than:

paaster-backend  | 172.24.0.6:55612 - "POST /controller/paste/ZAyupYPFU1cczVuD43w73GnO0HvYVZhW HTTP/1.1" 500

AFAIK, in MinIO, the region_name is ignored, and the bucket/folder are created on demand.

Is there a way to increase log verbosity in paaster-backend?

shalak commented 1 year ago

Side note: I had to downgrade MongoDB to 4.4 because my CPU doesn't have AVX support.

WardPearce commented 1 year ago

Did you have any success with deploying Paaster with some self-hosted s3 provider? [...] Is there a way to increase log verbosity in paaster-backend?

MinIO looks great. If you want to open a PR including instructions for optionally using MinIO for self-hosted S3, that would be appreciated.

Regarding the error: if you look at the API response, it should include what errored. But interestingly, it seems that if the create API fails, the failure isn't caught. So I'll issue a PR soon to fix that.

WardPearce commented 1 year ago

https://github.com/WardPearce/paaster/pull/193

This should catch why MinIO isn't working & provide info. Most likely you are missing required aioboto3 parameters.

shalak commented 1 year ago

MinIO looks great. If you want to open a PR including instructions for optionally using MinIO for self-hosted S3, that would be appreciated.

Sure, I will, once I get it working :)

Regarding the error: if you look at the API response, it should include what errored

The response is:

NoSuchBucket('An error occurred (NoSuchBucket) when calling the CreateMultipartUpload operation: The specified bucket does not exist')

So I fine-tuned the compose file so that a fresh MinIO container starts with a bucket. Fortunately, it's just a matter of making sure that the data dir contains a directory with the same name as the bucket:

  paaster-minio:
    image: minio/minio
    container_name: paaster-minio
    restart: unless-stopped
    volumes:
      - ./minio_data:/data
    environment:
      MINIO_ACCESS_KEY: minioaccesskey
      MINIO_SECRET_KEY: miniosecretkey
    entrypoint: sh
    command: -c 'mkdir -p /data/paaster && minio server /data'   # creates a dir (i.e. bucket), if not exists
    ports:
      - "9000:9000"
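A possibly more robust alternative (untested here; the service and bucket names just mirror the compose file above) is a one-shot init service using MinIO's mc client, rather than relying on the data-directory layout:

```yaml
  paaster-minio-init:
    image: minio/mc
    depends_on:
      - paaster-minio
    entrypoint: sh
    # Retry until MinIO is reachable, then create the bucket if missing.
    command: >
      -c "until mc alias set local http://paaster-minio:9000
      minioaccesskey miniosecretkey; do sleep 1; done
      && mc mb --ignore-existing local/paaster"
```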

Now creating a paste works OK, but there's a problem when trying to read the paste:

[screenshot of the error]

It appears that the frontend itself is reaching out to S3. Is this expected behavior?

WardPearce commented 1 year ago

Yes, paaster_s3.download_url should be the public URL for your CDN. Instead of routing downloads via the backend, the frontend calls the CDN directly.

As for the HTTP error you're facing, it appears you are mixing https & http.

WardPearce commented 1 year ago

If you look at the publicly hosted Paaster API, for example:

https://api.paaster.io/controller/paste/lnDZnpG4HfWs3RmaRa4xi

you'll see a paste contains a download_url, which is used to download the paste via the CDN from the frontend.
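For illustration, the relevant part of that response looks something like this (only download_url is confirmed by this thread; the other fields are placeholders based on the metadata described above):

```json
{
  "iv": "<base64 IV used client-side for decryption>",
  "expires": "<expiry info>",
  "download_url": "https://cdn.example.com/paaster/pastes/<object-name>"
}
```

The frontend fetches download_url directly and decrypts in the browser, so the backend never has to proxy the encrypted blob.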

shalak commented 1 year ago

Yes paaster_s3.download_url should be the public URL for your CDN

OK, so basically, to self-host a complete, self-contained solution, we need to reverse proxy all three: the frontend, the backend, and the S3 service. And since everything is e2e-encrypted, there's no security issue, and we don't have to waste resources by routing the S3 traffic through the backend. Makes sense.

I can prepare an example compose that does this.

WardPearce commented 1 year ago

Awesome, and if you could optionally provide steps in https://github.com/WardPearce/paaster#production-with-docker, that would be good too.

there's no security issue and we don't have to waste resources by routing the S3 through backend. Makes sense.

Yeah, plus the pasteId is different from the file name the encrypted paste is stored under, so if a user has set an access code, anyone would still have to have the access code in order to know the download URL.

shalak commented 1 year ago

Huh, I just realized: using the same routing leaves an unnecessary security hole. The backend service, to reach S3, will have to go outside of its internal Docker network. Not ideal 🤔

Solving this would require boilerplate configuration with two endpoint_url settings: endpoint_url_internal (for backend-MinIO communication) and endpoint_url_external (for frontend-MinIO communication). Makes no sense :/

I guess backend will have to use outside routing 🤷

WardPearce commented 1 year ago

With MinIO you could possibly reverse proxy only the route for downloading objects. But most likely the easiest solution is just securing your locally hosted S3 with a strong secret access key.

shalak commented 1 year ago

With MinIO you could possibly reverse proxy only the route for downloading objects. But most likely the easiest solution is just securing your locally hosted S3 with a strong secret access key.

Yeah, but in that case the backend wouldn't be able to put data into S3. The reverse proxy would need to be configured to route everything when the request comes from the backend service itself, and only the download route for public access. Adds even more maintenance :)
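For reference, the "downloads only for the public" split could be sketched in nginx roughly like this (hostnames and the bucket path are assumptions; Traefik can express the same with router rules):

```nginx
server {
    listen 443 ssl;
    server_name s3.mydomain.net;

    # Public side: allow read-only access to objects in the paaster bucket.
    location /paaster/ {
        limit_except GET HEAD { deny all; }  # reject PUT/POST/DELETE etc.
        proxy_pass http://paaster-minio:9000;
        proxy_set_header Host $host;
    }
}
```

This keeps writes off the public proxy; the backend would still need a reachable endpoint_url for uploads, per the tradeoff discussed above.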