GooBox / goobox-sync-storj

Sync app for Storj
GNU General Public License v3.0

Use hmac value to compare cloud and local copy of files #29

Open kaloyan-raev opened 6 years ago

kaloyan-raev commented 6 years ago

The bridge stores an hmac value for each file. It is returned in the JSON response for each file when listing the bucket's files, e.g.

"hmac": {
  "value": "3c3cd36d8046484141c41c49d9395b7a6bc4be6ddf6fb2936b0bc3524f026182d6f18a5233203cca2cbe07a4422852b3558afcbd240939a98dde6b9750516e99",
  "type": "sha512"
}

We need to check if we can take advantage of this value. It might be useful when a file is present both in the cloud and in the local storage, but not in the sync DB. In such a case, the sync app will detect a conflict. If it can compare the hmac of the local and cloud copies, it can detect that the copies are equal and mark the file as synced without reporting a conflict.

A valid user scenario would be if the user wants to recreate the sync DB due to some bug in the app. The user would just delete the sync DB and run the sync app again. The app would then recreate the sync DB without transferring any files or reporting conflicts, just by comparing the hmac values.

kaloyan-raev commented 6 years ago

I asked about the hmac value in the Storj community chat: https://community.storj.io/channel/dev?msg=56nQn8TXDRtLQk6qn

kaloyan-raev commented 6 years ago

This would work if libstorj provides a function that can calculate the HMAC for the local file using the same logic it uses when uploading files.

I opened an issue in the libstorj project: https://github.com/Storj/libstorj/issues/401

kaloyan-raev commented 6 years ago

I pushed a related PR to libstorj: https://github.com/Storj/libstorj/pull/402

This is not the PR that provides the API function requested in https://github.com/Storj/libstorj/issues/401, but one that properly populates the file metadata returned to the client with the HMAC available in the bridge.

jkawamoto commented 6 years ago

If libstorj decides not to provide a function that computes only the HMAC, we can compute HMACs ourselves with a library such as The Ripple Java Library. However, we should keep in mind that files have to be encrypted before the HMAC can be computed in either case. Maybe we should compare file sizes first, and only then compare HMACs, so that we can reduce the computational cost.

kaloyan-raev commented 6 years ago

The built-in crypto functions in Java are enough for computing an HMAC. The problem is that the HMAC value stored in the bridge is not just a simple HMAC-SHA512 checksum of the local file. It's a much more complex formula that looks like this: HMACSHA512([RIPEMD160(SHA256(shard_1_data))|RIPEMD160(SHA256(shard_2_data))|...])
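To make the shape of that formula concrete, here is a minimal Java sketch of it. It is only an illustration under several assumptions, not what libstorj actually does: the class and method names are hypothetical, the HMAC key is simply passed in (how libstorj derives it is not covered here), the per-shard hashes are assumed to be fed in as raw bytes, and the shard data is assumed to be already encrypted and already split into shards. The JDK does not ship a RIPEMD-160 digest, so that step is passed in as a function that a third-party library would have to supply.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.List;
import java.util.function.UnaryOperator;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper, not part of libstorj or goobox-sync-storj.
final class ShardHmacSketch {

    /**
     * Computes HMAC-SHA512 over the concatenation of the per-shard hashes,
     * where each shard hash is ripemd160(SHA-256(shardData)).
     *
     * Assumptions: 'key' is whatever key the uploader used, 'shards' holds
     * the already encrypted shard contents in order, and 'ripemd160' is a
     * RIPEMD-160 implementation supplied by the caller.
     */
    static byte[] fileHmac(byte[] key, List<byte[]> shards,
                           UnaryOperator<byte[]> ripemd160) {
        try {
            Mac mac = Mac.getInstance("HmacSHA512");
            mac.init(new SecretKeySpec(key, "HmacSHA512"));
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            for (byte[] shard : shards) {
                // digest() also resets the MessageDigest for the next shard
                byte[] shardHash = ripemd160.apply(sha256.digest(shard));
                // feeding the hashes one by one equals hashing their concatenation
                mac.update(shardHash);
            }
            return mac.doFinal();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Required JCA algorithm is unavailable", e);
        }
    }

    /** Lower-case hex encoding, the form in which the bridge returns the value. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Even with this in place, the result only matches the bridge value if the encryption, the shard boundaries and the key are identical to what libstorj used during upload.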

So, if we want to do it in pure Java, we need to replicate the whole sharding process done by libstorj. This would be complex to implement and maintain. Hence, it's better to have it provided by libstorj, which has already implemented that logic.

Regarding the computational cost, I totally agree. Calculating the HMAC, especially for big files, is a very expensive operation. So, we should first check the file size and compare HMAC only if the file size of the two copies is equal.
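A rough sketch of that size-first check is below. All names are hypothetical, and it assumes the size reported by the bridge is directly comparable to the size of the local file. The local HMAC is passed in as a Supplier so that the expensive calculation (wherever it ends up coming from, libstorj or our own code) only runs when the sizes already match.

```java
import java.util.function.Supplier;

// Hypothetical helper, not existing goobox-sync-storj code.
final class CopyComparisonSketch {

    enum Result { SYNCED, CONFLICT }

    /**
     * Decides whether a file that exists both in the cloud and locally,
     * but not in the sync DB, can be marked as synced.
     *
     * 'localHmac' is evaluated lazily: it is only called when the sizes match.
     */
    static Result compare(long cloudSize, String cloudHmac,
                          long localSize, Supplier<String> localHmac) {
        if (cloudSize != localSize) {
            // Cheap check failed, no need to compute the HMAC at all
            return Result.CONFLICT;
        }
        String local = localHmac.get();
        // Hex strings may differ in letter case, so compare case-insensitively
        boolean equal = cloudHmac != null && cloudHmac.equalsIgnoreCase(local);
        return equal ? Result.SYNCED : Result.CONFLICT;
    }
}
```

If the HMAC is missing on either side, this falls back to reporting a conflict, which is what the app does today.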

jkawamoto commented 6 years ago

First of all, I agree we should keep asking libstorj to export a function that computes the HMAC. What I meant is that, in the worst-case scenario, we would need to replicate the whole sharding process, and the above library might help with that. The process is complicated, but exporting such a function from libstorj does not seem easy either. So, we should have a plan B.