eXist-db / exist

eXist Native XML Database and Application Platform
https://exist-db.org
GNU Lesser General Public License v2.1
427 stars 179 forks source link

copyCollection creates orphaned documents when target collection name is the same as source #99

Open caseydawsonjordan opened 10 years ago

caseydawsonjordan commented 10 years ago

To reproduce:

1.) Create a collection with documents in it. 2.) Copy that collection to a seperate location 3.) Copy it back over the original 4.) Take a backup and see all the original files are now in the lost_and_found

IHMO the correct solution to this is to throw an exception when the target collection exists to prevent data loss. The same should be applied to moveCollection, which will simply delete the target if it exists. Without an exception being thrown this creates a situation for data loss to occur very easily without warning.

dizzzz commented 10 years ago

It is all about WebDAV? That is: by calling the WebDAV methods? Or is it in General?

shabanovd commented 10 years ago

This behavior of eXist's core. I think that exception is normal, or better to say, expecting reaction.

@adamretter @wolfgangmm , what do you think?

dizzzz commented 10 years ago

I have been thinking a bit more on this: For me, this kind of protection should be on application level, not on the 'core'. I do not want this protection out of the box, my move command....... should just be moving data.

caseydawsonjordan commented 10 years ago

Dannes I have a very hard time understanding why that would be preferable.

1.) This isn't how any other system operates, especially unix. eXist is suppose to mimic unix as closely as possible...no? 2.) By leaving this logic at the application level this creates a very real possibility that developers unintentionally create a way for end users to very easily lose data. (This is already a fact with several eXist extensions). 3.) Why not have this protection built in, and then have the application layer override it if they want?...This would give the same exact amount of control, but would eliminate the possibility of unintentional data loss....

On Mon, Dec 16, 2013 at 2:45 PM, Dannes Wessels notifications@github.comwrote:

I have been thinking a bit more on this: For me, this kind of protection should be on application level, not on the 'core'. I do not want this protection out of the box, my move command....... should just be moving data.

— Reply to this email directly or view it on GitHubhttps://github.com/eXist-db/exist/issues/99#issuecomment-30693203 .

Casey Jordan easyDITA a product of Jorsek LLC "CaseyDJordan" on LinkedIn, Twitter & Facebook (585) 348 7399 easydita.com

This message is intended only for the use of the Addressee(s) and may contain information that is privileged, confidential, and/or exempt from disclosure under applicable law. If you are not the intended recipient, please be advised that any disclosure copying, distribution, or use of the information contained herein is prohibited. If you have received this communication in error, please destroy all copies of the message, whether in electronic or hard copy format, as well as attachments, and immediately contact the sender by replying to this e-mail or by phone. Thank you.

adamretter commented 10 years ago

Casey I agree with you. This has to be recognised as a bug in the core, of course we should never be intentionally orphaning files!

The only question is whether to either:

a) Throw an Exception at (3) if the collection already exists b) Just overwrite the collection at (3) if the collection already exists c) Something else...

My preference here would be to avoid exceptions and do (c) i.e. just one-way merge the source to the destination during the copy, this is exactly what happens on a UNIX system. e.g. given the filesystem:

/1/A/hello.txt /2/A/goodbye.txt

and given the command: "cp -r /1/A /2"

Then, /2/A ends up with both hello.txt and goodbye.txt files, any files with the same name that existed in both /1/A and /2/A would have been overwritten in /2/A by the copy.

I meant to add that, if you need protection it is up to you to check whether the collection/resource exists before doing the copy/move, just like in UNIX.

caseydawsonjordan commented 10 years ago

Adam,

I see the appeal to this approach. It's certainly better than the alternative. Still makes me nervous though ;) I'd prefer to have a flag which needed to be set (or two sets of methods) so that it was very clear to the developer what the function was doing at a quick glance. It would be verbose, and a bit ugly but super safe.

It's somewhat like the age old practice of creating a new alias for 'mv' and 'cp' that does some extra checking to prevent accidentally issuing a command you would regret when working too late some night...

On Mon, Dec 16, 2013 at 3:11 PM, Adam Retter notifications@github.comwrote:

Casey I agree with you. This has to be recognised as a bug in the core, of course we should never be intentionally orphaning files!

The only question is whether to either:

a) Throw an Exception at (3) if the collection already exists b) Just overwrite the collection at (3) if the collection already exists

My preference here would be to avoid exceptions and just one-way merge the source to the destination during the copy, this is exactly what happens on a UNIX system. e.g. given the filesystem:

/1/A/hello.txt /2/A/goodbye.txt

and given the command: "cp -r /1/A /2"

Then, /2/A ends up with both hello.txt and goodbye.txt files, any files with the same name that existed in both /1/A and /2/A would have been overwritten in /2/A by the copy.

— Reply to this email directly or view it on GitHubhttps://github.com/eXist-db/exist/issues/99#issuecomment-30695738 .

Casey Jordan easyDITA a product of Jorsek LLC "CaseyDJordan" on LinkedIn, Twitter & Facebook (585) 348 7399 easydita.com

This message is intended only for the use of the Addressee(s) and may contain information that is privileged, confidential, and/or exempt from disclosure under applicable law. If you are not the intended recipient, please be advised that any disclosure copying, distribution, or use of the information contained herein is prohibited. If you have received this communication in error, please destroy all copies of the message, whether in electronic or hard copy format, as well as attachments, and immediately contact the sender by replying to this e-mail or by phone. Thank you.