Closed cript0nauta closed 5 years ago
The choice of uuid1 is actually on purpose as due to the involvement of the host unique address, timestamp and sequence it ensures uniqueness across distributed systems. UUID4 has a very low but possible chance for collisions as it's totally randomic.
Also uuid4 relies on hardware based entropy, which means that in some very rare conditions it might block until new entropy is available.
I think that allowing an option to provide a custom nodeid would avoid the leakage of the MAC address, even though it would still allow to guess the sequence of IDs, which I don't feel it's an issue given that depot is supposed to serve public files.
I understand. Maybe I could implement it on my codebase making a class that inherits from LocalFileStorage and overrides the create method, that is where uuid1 is called.
Aside of this, we are thinking in using this library to serve (very) private files instead of public ones. Is this feature in the project roadmap? If not, we understand that we maybe will have to modify some internal parts of depot. Do you think this could work for us, or maybe using another library or implementing it from scratch would be better?
Aside of this, we are thinking in using this library to serve (very) private files instead of public ones.
Just avoid using the DepotMiddleware
and serve the files yourself.
If your concern is preventing people from trying to guess the ids, I think you can easily apply HMAC with a secret to the IDs before sending out the URLs that include those, as the URLs will be generated by the same backend that has to read them (or at least one that can share the same secret). At that point any attempt at trying with a different ID will be invalid and you can decode back the id from the HMAC before looking up the files.
Maybe I could implement it on my codebase making a class that inherits from LocalFileStorage and overrides the create method, that is where uuid1 is called.
Going through the previous suggestion would make this unrequred as any id used by depot will be masqueraded by the hashing you apply to it, but If you want to submit a pull request to allow LocalFileStorage
, MemoryFileStorage
, awss3.S3Storage
and boto3.S3Storage
to accept a id_generator
argument which by default is uuid1
(to retain compatibilty) I'll gladly merge it, so you can use uuid4 or any other kind of unique id
Hi! Looking at the source code of the package I found that the file IDs are generated using
uuid.uuid1()
function. According to the uuid library docs (https://docs.python.org/3/library/uuid.html):It says it use the MAC address of the host to generate it. This is an unwanted behavior since this information shouldn't be public. The best solution will be using
uuid4
instead.Another advantage of using
uuid4
is that the IDs will be less predictable since they are randomly generated. Usinguuid1
an attacker could easilly guess other file IDs since a few bytes change between two different file IDs.