DistrictDataLabs / cultivar

Multidimensional data explorer and visualization tool.
http://trinket.districtdatalabs.com
Apache License 2.0

Dataset Overwrite/Versioning System #7

Open bbengfort opened 8 years ago

bbengfort commented 8 years ago

Right now, if you upload a duplicate file, the file is modified on S3; that is, its "last modified" timestamp changes. We need to ask some important questions for data management:

  1. Are we simply "touching" the file or are we rewriting it?
  2. What counts as a duplicate on S3? Presumably just the filename, or are we protected by the hash?
  3. Can we use some temporary data store in S3 that gets cleaned regularly for protection?
  4. Should we save datasets according to their hash, then rename on download?

We should make sure that a dataset cannot be overwritten if someone uploads a different dataset with the same name.
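
To make question 4 concrete, here is a minimal sketch of storing datasets under their content hash and restoring the original filename on download. It uses an in-memory dictionary as a stand-in for the S3 bucket; the class `HashedDatasetStore` and its methods are hypothetical names for illustration, not anything in this project or in an S3 client library.

```python
import hashlib


class HashedDatasetStore:
    """Hypothetical in-memory stand-in for a bucket keyed by content hash."""

    def __init__(self):
        self._blobs = {}  # SHA-256 hex digest -> file contents
        self._names = {}  # SHA-256 hex digest -> filename given at first upload

    def upload(self, filename, data):
        """Store data under its digest and return that digest."""
        digest = hashlib.sha256(data).hexdigest()
        # setdefault never replaces an existing entry, so re-uploading the
        # same bytes is a no-op and different bytes land under another key.
        self._blobs.setdefault(digest, data)
        self._names.setdefault(digest, filename)
        return digest

    def download(self, digest):
        """Return (original filename, contents) for a stored digest."""
        return self._names[digest], self._blobs[digest]
```

Under this assumption, two different files that are both named `data.csv` get different keys, so neither upload can clobber the other, and uploading identical bytes twice returns the same key without rewriting anything.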

bbengfort commented 8 years ago

We now have a new "versioning" scheme, wherein every dataset has its own unique version that you can download, so you can go back in time to see earlier versions. This is definitely a more advanced usage and is related to this issue, but it will require further thought. As such, I'm moving this issue back into the backlog.
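
As a rough illustration of the idea (not the project's actual implementation, which is not shown in this thread), a versioning scheme can be sketched as an append-only history per dataset name. The class `VersionedDatasets` and its method names are assumptions made up for this example.

```python
class VersionedDatasets:
    """Hypothetical append-only version history, keyed by dataset name."""

    def __init__(self):
        self._versions = {}  # dataset name -> list of contents, oldest first

    def add(self, name, data):
        """Append a new version and return its 1-based version number."""
        history = self._versions.setdefault(name, [])
        history.append(data)
        return len(history)

    def get(self, name, version=None):
        """Return the requested version, or the latest if version is None."""
        history = self._versions[name]  # raises KeyError for unknown names
        if version is None:
            return history[-1]
        if not 1 <= version <= len(history):
            raise KeyError(f"{name} has no version {version}")
        return history[version - 1]
```

In this sketch an upload with an existing name never replaces earlier contents; it only adds a version, so older versions stay downloadable.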

rebeccabilbro commented 7 years ago

This will be resolved by #59