HaschekSolutions / pictshare

:camera: PictShare is an open source image, mp4, pastebin hosting service with a simple resizing and upload API that you can host yourself. :rice_scene:
https://www.pictshare.net
Apache License 2.0

Scaled instances and the deletion problem #80

Open geek-at opened 5 years ago

geek-at commented 5 years ago

Now that the codebase has been rewritten, we can start thinking about the problem with scaling pictshare: deleting content.

Imagine two Pictshare servers connected through a shared folder (ALT_FOLDER).

An image is requested frequently so both servers have a local copy and there is a copy in the shared folder.

If the user wants to delete the image, it's deleted off the server the user sent the request to and from the shared folder.

The second server never got any info about the deleted hash, so it kept its copy.
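For reference, a minimal sketch of that setup, assuming both instances simply point ALT_FOLDER at the same shared mount (the config excerpt below is illustrative, not the actual pictshare config file):

```php
<?php
// Illustrative config excerpt for both instances - file name and layout are
// assumptions; only the ALT_FOLDER constant comes from the scenario above.
define('ALT_FOLDER', '/mnt/shared/pictshare'); // shared copy of every hash

// Each instance also keeps its own local copy of frequently requested images,
// so deleting on one instance (plus ALT_FOLDER) leaves the other's local copy behind.
```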

Possible solutions:

thomasjsn commented 5 years ago

I'm really loving that this app doesn't require a database; a centralized database will introduce some complexity. A list of deleted hashes in all storage controllers plus a cron job is quite simple and would do the job. I'm guessing instant deletion is not really required.
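A rough sketch of what that cron job could do, with file names and paths that are purely assumptions (none of this is existing pictshare code):

```php
<?php
// cron_prune_deleted.php - hypothetical cron script, not part of pictshare.
// Reads a shared list of deleted hashes and removes any local copies of them.

$deletedList = '/mnt/shared/pictshare/deleted_hashes.txt'; // assumed location
$localData   = __DIR__ . '/data';                          // assumed local cache dir

$hashes = file_exists($deletedList)
    ? array_filter(array_map('trim', file($deletedList)))
    : [];

foreach ($hashes as $hash) {
    $dir = $localData . '/' . basename($hash); // basename() avoids path traversal
    if (is_dir($dir)) {
        array_map('unlink', glob($dir . '/*') ?: []); // remove the image and its variants
        rmdir($dir);
    }
}
```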

cwilby commented 4 years ago

Just some thoughts:

Each server maintains a list of peers.

The first server is created (0), then the second server is created (1) and pointed to 0. 0 and 1 both update their lists to [0,1].

For each server added after 1, the server being added is pointed to any existing server (N). N iterates through every server in its list except itself (if N is 1, this subset is [0]) and sends an HTTP message telling each to add the new server to its list, making the list [0, 1, new] on every server.

With this in place, when a server receives a delete request, it performs the delete, then sends a delete signal via HTTP to each server on its list (which should be up to date given the above works).


TL;DR - I agree with making nodes communicate.
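A rough sketch of the delete fan-out described above; the peers.json file and the /api/delete.php endpoint are made up for illustration and don't exist in pictshare:

```php
<?php
// Hypothetical helper: tell every known peer to delete a hash.

function notifyPeersOfDelete(string $hash): void
{
    // peers.json would hold the list built during registration, e.g.
    // ["https://img0.example.com", "https://img1.example.com"]
    $peers = json_decode((string) file_get_contents(__DIR__ . '/peers.json'), true) ?: [];

    foreach ($peers as $peer) {
        $ch = curl_init($peer . '/api/delete.php'); // assumed endpoint
        curl_setopt_array($ch, [
            CURLOPT_POST           => true,
            CURLOPT_POSTFIELDS     => http_build_query(['hash' => $hash]),
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_TIMEOUT        => 5,
        ]);
        curl_exec($ch);  // fire-and-forget; a real version should queue and retry failures
        curl_close($ch);
    }
}
```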

geek-at commented 4 years ago

The problem with all nodes talking to each other is that it would complicate the whole project considerably.

I think the easiest way to implement it would be to have a list of deleted hashes that won't get re-used by chance, and this list should be copied and checked by all storage providers.
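For the "won't get re-used by chance" part, a minimal sketch assuming a plain-text blacklist file and an illustrative hash scheme (neither is taken from the real code):

```php
<?php
// Hypothetical guard: never hand out a hash that is already in use
// or that appears on the deleted-hashes blacklist.

function isBlacklisted(string $hash, string $blacklistFile): bool
{
    $deleted = file_exists($blacklistFile)
        ? array_map('trim', file($blacklistFile))
        : [];
    return in_array($hash, $deleted, true);
}

function newUnusedHash(string $dataDir, string $blacklistFile): string
{
    do {
        $hash = substr(md5(uniqid('', true)), 0, 10); // assumed hash generation
    } while (is_dir($dataDir . '/' . $hash) || isBlacklisted($hash, $blacklistFile));

    return $hash;
}
```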

cwilby commented 4 years ago

Sounds good. Where would the deleted hashes be stored? If each node has a copy, it would be similarly complex.

geek-at commented 4 years ago

The easiest implementation would be a simple file where deleted hashes are stored.

This file should then be compared with the list on every storage controller, and every pictshare instance should periodically check this file for hashes to delete and check storage controllers for updated hashes to add to their local list.

It's just a simple blacklist system. I think that could work.
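Roughly, the periodic sync could look like this; the file names and the idea of exposing the controller's list as a plain file are assumptions for the sketch:

```php
<?php
// Hypothetical sync step: merge the local blacklist with the copy held by a
// storage controller so both end up with the union of deleted hashes.

function loadList(string $file): array
{
    return file_exists($file) ? array_filter(array_map('trim', file($file))) : [];
}

function syncBlacklists(string $localFile, string $remoteFile): array
{
    $merged = array_unique(array_merge(loadList($localFile), loadList($remoteFile)));
    sort($merged);

    $contents = implode("\n", $merged) . "\n";
    file_put_contents($localFile, $contents);
    file_put_contents($remoteFile, $contents);

    return $merged; // the caller then deletes any local content matching these hashes
}

// e.g. syncBlacklists(__DIR__ . '/deleted.txt', '/mnt/shared/pictshare/deleted.txt');
```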

cwilby commented 4 years ago

Yep that sounds like it could work. Each node can be configured to communicate with a service to add/read deleted hashes. Would the service be the root pictshare instance or something else?

geek-at commented 4 years ago

I'm thinking a cron job, so admins can set their own intervals for comparing the blacklist and deletions can take as much time as they need.
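For example, something like this in the crontab (script name, path, and interval are just placeholders):

```
# Hypothetical crontab entry: compare the blacklist and prune local copies every 15 minutes
*/15 * * * * php /var/www/pictshare/cron_prune_deleted.php
```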