amol- / depot

Toolkit for storing files and attachments in web applications
MIT License
161 stars 41 forks source link

Use uuid4 instead of uuid1 #45

Closed cript0nauta closed 5 years ago

cript0nauta commented 7 years ago

Hi! Looking at the source code of the package I found that the file IDs are generated using uuid.uuid1() function. According to the uuid library docs (https://docs.python.org/3/library/uuid.html):

If all you want is a unique ID, you should probably call uuid1() or uuid4(). Note that uuid1() may compromise privacy since it creates a UUID containing the computer’s network address. uuid4() creates a random UUID.

It says it use the MAC address of the host to generate it. This is an unwanted behavior since this information shouldn't be public. The best solution will be using uuid4 instead.

Another advantage of using uuid4 is that the IDs will be less predictable since they are randomly generated. Using uuid1 an attacker could easilly guess other file IDs since a few bytes change between two different file IDs.

amol- commented 7 years ago

The choice of uuid1 is actually on purpose as due to the involvement of the host unique address, timestamp and sequence it ensures uniqueness across distributed systems. UUID4 has a very low but possible chance for collisions as it's totally randomic.

Also uuid4 relies on hardware based entropy, which means that in some very rare conditions it might block until new entropy is available.

I think that allowing an option to provide a custom nodeid would avoid the leakage of the MAC address, even though it would still allow to guess the sequence of IDs, which I don't feel it's an issue given that depot is supposed to serve public files.

cript0nauta commented 7 years ago

I understand. Maybe I could implement it on my codebase making a class that inherits from LocalFileStorage and overrides the create method, that is where uuid1 is called.

Aside of this, we are thinking in using this library to serve (very) private files instead of public ones. Is this feature in the project roadmap? If not, we understand that we maybe will have to modify some internal parts of depot. Do you think this could work for us, or maybe using another library or implementing it from scratch would be better?

amol- commented 7 years ago

Aside of this, we are thinking in using this library to serve (very) private files instead of public ones.

Just avoid using the DepotMiddleware and serve the files yourself.

If your concern is preventing people from trying to guess the ids, I think you can easily apply HMAC with a secret to the IDs before sending out the URLs that include those, as the URLs will be generated by the same backend that has to read them (or at least one that can share the same secret). At that point any attempt at trying with a different ID will be invalid and you can decode back the id from the HMAC before looking up the files.

Maybe I could implement it on my codebase making a class that inherits from LocalFileStorage and overrides the create method, that is where uuid1 is called.

Going through the previous suggestion would make this unrequred as any id used by depot will be masqueraded by the hashing you apply to it, but If you want to submit a pull request to allow LocalFileStorage, MemoryFileStorage, awss3.S3Storage and boto3.S3Storage to accept a id_generator argument which by default is uuid1 (to retain compatibilty) I'll gladly merge it, so you can use uuid4 or any other kind of unique id