Closed shalak closed 1 year ago
I'm open to conversations & changes around Paaster's architecture.
S3 is used for object storage, so large blocks of data (in this case encrypted pastes) can be stored and delivered efficiently via a CDN. The benefits of using S3 include data chunking, content delivery & redundancy.
MongoDB is used to store the metadata for an encrypted paste: paste expiration, access codes, IVs & owner secrets.

MongoDB isn't designed for large data storage and would be impractical for it: MongoDB storage isn't cheap, documents are limited to 16 MB, and storing the paste inline would require loading the entire encrypted paste into memory just to access the other metadata around it.

For example, if someone uploaded a 500 MB paste, it would have to be read entirely into memory, and any subsequent query to that document would load all 500 MB again.
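As a rough sketch of the split described above, here is what a metadata-only paste document might look like. All field names and values below are illustrative assumptions, not Paaster's actual schema; the point is that the encrypted payload lives in S3 and only a pointer to it lives in MongoDB:

```python
import json

# Hypothetical shape of a paste's MongoDB document (field names are
# made up for illustration, not taken from Paaster's source). The
# encrypted payload itself is NOT stored here, only a pointer to the
# S3 object plus metadata.
paste_doc = {
    "_id": "ZAyupYPFU1cczVuD43w73GnO0HvYVZhW",  # public paste ID
    "s3_key": "pastes/9f2c0a1b.bin",            # object key inside the bucket
    "iv": "base64-encoded-iv",                  # IV the client needs to decrypt
    "expires_at": "2024-01-01T00:00:00Z",       # paste expiration
    "access_code_hash": None,                   # optional access code
    "owner_secret_hash": "hash-placeholder",    # owner secret
}

# A metadata-only document stays tiny, far under MongoDB's 16 MB document
# cap, so reading it never drags a multi-hundred-MB blob into memory.
doc_size = len(json.dumps(paste_doc).encode())
print(doc_size)
```

Queries for expiration checks or access-code validation touch only this small document; the large encrypted object is fetched from S3 separately.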
I see, that makes sense. It would be nice to have this described in README.md, so I've taken the liberty of creating a PR.

Did you have any success with deploying Paaster with a self-hosted S3 provider? I'm trying to make the following work:
```yaml
version: '3'
services:
  paaster-backend:
    container_name: paaster-backend
    image: wardpearce/paaster-backend:latest
    restart: unless-stopped
    ports:
      - "8888:80"
    environment:
      paaster_max_paste_size: 1049000
      paaster_max_iv_size: 42
      paaster_open_api: '{"title": "paaster.io", "version": "2.0.0"}'
      paaster_mongo: '{"host": "paaster-mongodb", "port": 27017, "collection": "paasterv2"}'
      paaster_s3: '{"endpoint_url": "http://paaster-minio:9000","secret_access_key": "miniosecretkey","access_key_id": "minioaccesskey","region_name": "us-east-1","bucket": "paaster","folder": "pastes","download_url": "http://paaster-minio:9000/paaster"}'
      paaster_proxy_urls: '{"frontend": "https://paaster.mydomain.net", "backend": "http://paaster-backend.mydomain.net"}'
    depends_on:
      - paaster-mongodb
      - paaster-minio
    labels:
      - "traefik.enable=true"
  paaster:
    container_name: paaster
    image: wardpearce/paaster-frontend:latest
    restart: unless-stopped
    environment:
      VITE_NAME: "paaster.mydomain.net"
      VITE_API_URL: "https://paaster-backend.mydomain.net"
    ports:
      - "8889:80"
    labels:
      - "traefik.enable=true"
  paaster-mongodb:
    image: mongo:4.4
    container_name: paaster-mongodb
    restart: unless-stopped
    environment:
      MONGODB_DATA_DIR: /data/db
      MONGODB_LOG_DIR: /dev/null
    volumes:
      - ./paaster_data:/data/db
  paaster-minio:
    image: minio/minio
    container_name: paaster-minio
    restart: unless-stopped
    volumes:
      - ./minio_data:/data
    environment:
      MINIO_ACCESS_KEY: minioaccesskey
      MINIO_SECRET_KEY: miniosecretkey
    command: server /data
    ports:
      - "9000:9000"
```
Unfortunately, it doesn't work:
The Docker logs don't contain any errors other than:

```
paaster-backend | 172.24.0.6:55612 - "POST /controller/paste/ZAyupYPFU1cczVuD43w73GnO0HvYVZhW HTTP/1.1" 500
```
AFAIK, in MinIO, the `region_name` is ignored, and `bucket`/`folder` are created on demand. Is there a way to increase log verbosity in `paaster-backend`?
Side note: I had to downgrade MongoDB to 4.4, because my CPU doesn't have AVX support.
MinIO looks great. If you want to open a PR including instructions for optionally using MinIO for self-hosted S3, it would be appreciated.

Regarding the error: if you look at the API response, it should include what errored. Interestingly, it seems that if the create API fails, the error isn't caught, so I'll issue a PR soon to fix that.
https://github.com/WardPearce/paaster/pull/193
This should catch why MinIO isn't working & provide info. Most likely you are missing required aioboto3 parameters.
> MinIO looks great. If you want to open a PR including instructions for optionally using MinIO for self-hosted S3, it would be appreciated.
Sure, I will, once I get it working :)
> Regarding the error: if you look at the API response, it should include what errored.

The response is:

```
NoSuchBucket('An error occurred (NoSuchBucket) when calling the CreateMultipartUpload operation: The specified bucket does not exist')
```
So I fine-tuned the compose file so that a fresh MinIO container starts with a bucket. Fortunately, it's just a matter of making sure that the data dir contains a directory with the same name as the bucket:
```yaml
  paaster-minio:
    image: minio/minio
    container_name: paaster-minio
    restart: unless-stopped
    volumes:
      - ./minio_data:/data
    environment:
      MINIO_ACCESS_KEY: minioaccesskey
      MINIO_SECRET_KEY: miniosecretkey
    entrypoint: sh
    command: -c 'mkdir -p /data/paaster && minio server /data' # creates the bucket dir if it doesn't exist
    ports:
      - "9000:9000"
```
Now creating pastes works OK, but there's a problem when trying to read a paste:

It appears that the frontend itself is reaching out to S3. Is this expected behavior?
Yes, `paaster_s3.download_url` should be the public URL for your CDN. Instead of routing downloads via the backend, the frontend calls the CDN directly.

The HTTP error you're facing appears to be because you are mixing HTTPS & HTTP.

If you look at the publicly hosted Paaster API, for example https://api.paaster.io/controller/paste/lnDZnpG4HfWs3RmaRa4xi, you'll see a paste contains a `download_url`, which is what the frontend uses to download the paste via the CDN.
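A minimal sketch of what that means in practice, assuming the URL is simply the configured public base joined with the object key (the exact key layout is an assumption, not taken from Paaster's source):

```python
# Sketch: the backend stores only the object key; the API response joins it
# with the configured public download_url so the browser fetches the
# encrypted blob straight from the CDN/S3, never through the backend.
def build_download_url(download_url_base: str, object_key: str) -> str:
    # normalize the base so we don't emit a double slash
    return f"{download_url_base.rstrip('/')}/{object_key}"

url = build_download_url("http://paaster-minio:9000/paaster", "pastes/9f2c0a.bin")
print(url)  # http://paaster-minio:9000/paaster/pastes/9f2c0a.bin
```

This is why mixing schemes bites: a frontend served over HTTPS will refuse to fetch a `download_url` that starts with `http://` (mixed content).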
> Yes, `paaster_s3.download_url` should be the public URL for your CDN.
Ok, so basically, to self-host a complete self-contained solution, we need to reverse proxy all three: the frontend, the backend and the S3 service. And since everything is end-to-end encrypted, there's no security issue, and we don't have to waste resources by routing S3 traffic through the backend. Makes sense.
I can prepare an example compose that does this.
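As a starting point, exposing MinIO through the same Traefik instance could look roughly like the labels below. The router name, host, and entrypoint details are assumptions for a typical Traefik setup, not tested configuration:

```yaml
  paaster-minio:
    image: minio/minio
    # volumes/environment/command as in the compose file above
    labels:
      - "traefik.enable=true"
      # hypothetical router/host names; adjust to your own entrypoints and TLS setup
      - "traefik.http.routers.paaster-minio.rule=Host(`s3.mydomain.net`)"
      - "traefik.http.services.paaster-minio.loadbalancer.server.port=9000"
```

`paaster_s3.download_url` would then point at the public host, e.g. `https://s3.mydomain.net/paaster`.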
Awesome, and if you can optionally provide steps in https://github.com/WardPearce/paaster#production-with-docker, that would be good too.
> there's no security issue, and we don't have to waste resources by routing S3 traffic through the backend. Makes sense.
Yeah, plus the pasteId is different from the file name the encrypted paste is stored under, so if a user has set an access code, anyone would still need the access code in order to know the download URL.
Huh, I just realized that using the same routing leaves an unnecessary security hole: the backend service, to reach the S3, will have to go outside of its internal Docker network. Not ideal 🤔

Solving this would require boilerplate configuration with two `endpoint_url` settings: `endpoint_url_internal` (for backend-MinIO communication) and `endpoint_url_external` (for frontend-MinIO communication). Makes no sense :/

I guess the backend will have to use outside routing 🤷
With MinIO, you should possibly be able to reverse proxy only the route for downloading objects. But most likely the easiest solution is just securing your locally hosted S3 with a strong secret access key.
> With MinIO, you should possibly be able to reverse proxy only the route for downloading objects. But most likely the easiest solution is just securing your locally hosted S3 with a strong secret access key.
Yeah, but in that case the backend wouldn't be able to put data into S3. The reverse proxy would need to be configured to route everything coming from the backend service itself, and only the download route for public access. That adds even more maintenance :)
Could someone elaborate on the role and purpose of S3 storage in this system? We also have `paaster_mongo` in use. Is it necessary to have these two types of data storage to run Paaster, and what is each used for?