amahi / amahi-anywhere-fs

Amahi Anywhere file server
GNU General Public License v3.0

Fix the file name already exists on the server when uploading the file #18

Closed phoon closed 5 years ago

cpg commented 5 years ago

This issue reminded me of this

In any case, I am not sure I like the idea of a random string, especially at the beginning of the file name. The user may then not see the files listed together (the default is typically to list files alphabetically, so they could end up far apart).

I feel that a little forward thinking can be useful here.

What other options do we have?

0) random string at the beginning (what you proposed)
1) random string at the end (preserving the extension), e.g. XYZ.foo becomes XYZ-hj8d3y.foo
2) like the above image: have a (1), (2), (3), etc. when the file exists: XYZ (1).foo, XYZ (2).foo, etc. (the separator being a space). This involves one more loop to check whether other file names exist and to increment the counter (though even the random option ought to check whether the new filename exists, remote as that may be).
3) a date/timestamp, e.g. if XYZ.foo exists -- XYZ-20190304-1420.foo
4) do a little of what Dropbox does, e.g. XYZ (conflicted copy 2019-03-04).foo

Option 2 has the advantage of being familiar to most average users, since Windows uses this.

I kind of like 3 and 4.

Note: whatever suffix is added at the end of the name, the file extension must be preserved, for (I hope) obvious reasons.
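
To make option 2 concrete, here is a minimal sketch of the counter loop in Python. The function name `counter_name` and the use of a local directory are assumptions made for illustration; this is not code from the pull request.

```python
import os
import tempfile


def counter_name(directory: str, filename: str) -> str:
    """Return filename if it is free in directory, else "stem (n).ext" for the first free n."""
    # splitext separates the extension, so "XYZ.foo" -> ("XYZ", ".foo")
    stem, ext = os.path.splitext(filename)
    candidate = filename
    n = 1
    # Keep incrementing the counter until the candidate name is unused.
    while os.path.exists(os.path.join(directory, candidate)):
        candidate = f"{stem} ({n}){ext}"
        n += 1
    return candidate


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, "XYZ.foo"), "w").close()
        open(os.path.join(d, "XYZ (1).foo"), "w").close()
        print(counter_name(d, "XYZ.foo"))  # XYZ (2).foo
```

Note that `os.path.splitext` treats only the last dot as the extension separator, and that checking for a free name and then creating the file are two separate steps, so concurrent uploads would need extra care.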

Any thoughts, guys?

phoon commented 5 years ago

The new commit a72f804 uses a timestamp suffix to rename the file. I think this is a good way to solve the problem, since the user can use the timestamp to tell when the file was updated.
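
As an illustration only (this is not the code in commit a72f804), a timestamp suffix that preserves the extension could be sketched in Python as follows. The name `timestamp_name` is hypothetical, and the `YYYYMMDD-HHMM` format is taken from the example in option 3 above.

```python
import os
from datetime import datetime


def timestamp_name(filename: str, now: datetime) -> str:
    """Insert "-YYYYMMDD-HHMM" between the stem and the extension."""
    stem, ext = os.path.splitext(filename)
    return f"{stem}-{now.strftime('%Y%m%d-%H%M')}{ext}"


if __name__ == "__main__":
    print(timestamp_name("XYZ.foo", datetime(2019, 3, 4, 14, 20)))  # XYZ-20190304-1420.foo
```

With minute resolution, two conflicting uploads within the same minute would get the same name, so the new name should probably still be checked for existence.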

csoni111 commented 5 years ago

About matching MD5 if the file exists (as @iPeven mentioned in #17), I think we should include that in this PR too. It will ensure the user doesn't accidentally upload a file twice.

phoon commented 5 years ago

Actually, SHA is more resistant to hash collisions than MD5. What bothers me a lot is that it will take a long time to calculate the hash if the file is very large. And right now I have no idea where to store it.

csoni111 commented 5 years ago

I believe MD5 is good enough to check for duplicate files. Most duplicate-file checkers use MD5. This answer on SO gives a rough probability of two different files having the same hash, which is very, very low. As for storing it, we can calculate it at run time. There is no need to store it, since it will only be computed for uploads whose file name and size both match an already existing file.
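
A rough sketch of that check in Python, assuming both the existing file and the upload are available as local paths. The helper names `md5_of_file` and `is_duplicate` are made up for this example.

```python
import hashlib
import os
import tempfile


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in fixed-size chunks so it is never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_duplicate(existing_path: str, uploaded_path: str) -> bool:
    """Cheap size comparison first; hash only when the sizes match."""
    if os.path.getsize(existing_path) != os.path.getsize(uploaded_path):
        return False
    return md5_of_file(existing_path) == md5_of_file(uploaded_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        a, b = os.path.join(d, "a.bin"), os.path.join(d, "b.bin")
        for path in (a, b):
            with open(path, "wb") as f:
                f.write(b"same content")
        print(is_duplicate(a, b))  # True
```

Reading in chunks bounds memory use but does not reduce hashing time. The size check is what avoids hashing in the common case where the two files differ.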

phoon commented 5 years ago

Done! The processing flow is as follows:

 upload a file ---(not exists)---> create and write directly
             |
              ---(exists)---> calculate the MD5 ---(MD5 is different)---> Rename the uploaded one, create&write
                                               |
                                                ---(Same MD5)---> ignore the uploaded file
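
The flow in the diagram could be sketched in Python roughly as below. This is a simplified illustration, not the code in the pull request: `store_upload` is a hypothetical name, the upload is assumed to be fully in memory as bytes, and the size comparison before hashing follows the earlier suggestion in this thread rather than the diagram itself.

```python
import hashlib
import os
import tempfile
from datetime import datetime
from typing import Optional


def md5_of_file(path: str) -> str:
    """MD5 of a file on disk, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_upload(directory: str, filename: str, data: bytes, now: datetime) -> Optional[str]:
    """Return the name the upload was stored under, or None if it was ignored."""
    target = os.path.join(directory, filename)
    if os.path.exists(target):
        same_size = os.path.getsize(target) == len(data)
        if same_size and md5_of_file(target) == hashlib.md5(data).hexdigest():
            return None  # same MD5: ignore the uploaded file
        # Different content: rename the uploaded one, keeping the extension.
        stem, ext = os.path.splitext(filename)
        filename = f"{stem}-{now.strftime('%Y%m%d-%H%M')}{ext}"
        target = os.path.join(directory, filename)
    # Not handled in this sketch: the renamed target could itself already exist.
    with open(target, "wb") as f:
        f.write(data)
    return filename


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        when = datetime(2019, 3, 4, 14, 20)
        print(store_upload(d, "XYZ.foo", b"one", when))  # XYZ.foo
        print(store_upload(d, "XYZ.foo", b"one", when))  # None
        print(store_upload(d, "XYZ.foo", b"two", when))  # XYZ-20190304-1420.foo
```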

phoon commented 5 years ago

@csoni111 Thank you for the code review. I apologize for my lack of project experience.