DataVault / datavault

DataVault Project
MIT License

Rewrite LocalFileSystem storage plugin (was: Replace FileCopy with new class using Java nio2 libraries) (RSS022-23) #372

Open ghost opened 7 years ago

ghost commented 7 years ago

The current LocalFileSystem plugin doesn't preserve Unix links (symbolic and hard). It uses a modified version of Apache FileUtils called FileCopy which should be replaced with the Java nio2 libraries...

https://docs.oracle.com/javase/tutorial/essential/io/copy.html

Note that the reason there is a bespoke version of FileUtils is that it was changed to incorporate progress monitoring in the form of the Progress object. This will also need to be included in the new replacement class.
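For illustration, here is a minimal sketch of what an nio2-based replacement with progress reporting might look like. The `CopyProgress` class is a hypothetical stand-in for the real Progress object (its actual fields and how it talks to the broker are not shown in this thread), and the class and method names are assumptions. It relies on `Files.walkFileTree` not following symbolic links by default and on `LinkOption.NOFOLLOW_LINKS` so that links are copied as links.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

public class TreeCopyExample {

    /** Hypothetical stand-in for DataVault's Progress object (an assumption, not the real class). */
    public static class CopyProgress {
        public long byteCount;
        public long fileCount;
        public long dirCount;
    }

    /**
     * Copies the tree rooted at source into target, updating the counters as it goes.
     * walkFileTree does not follow symbolic links unless asked to, so a link is
     * passed to visitFile and copied as a link.
     */
    public static void copyTree(final Path source, final Path target, final CopyProgress progress)
            throws IOException {
        Files.walkFileTree(source, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
                // Recreate each directory (including the root) under the target.
                Files.createDirectories(target.resolve(source.relativize(dir)));
                progress.dirCount++;
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                // NOFOLLOW_LINKS copies a symbolic link itself rather than the file it points to.
                Files.copy(file, target.resolve(source.relativize(file)), LinkOption.NOFOLLOW_LINKS);
                progress.fileCount++;
                if (attrs.isRegularFile()) {
                    // Only regular files contribute to the byte count; links carry no payload.
                    progress.byteCount += attrs.size();
                }
                return FileVisitResult.CONTINUE;
            }
        });
    }
}
```

Note that this sketch only updates the counters once per file, not while a large file is being copied, and it copies hard-linked files as independent files, so hard links would still need separate handling.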

WilliamPetit commented 7 years ago

I was about to send an email, but it's probably better to put it here.

I've been looking at the DataVault code and I'm confused: the org.datavaultplatform.common.io.FileCopy constructor calls super() but the class doesn't extend anything, or am I missing something? What is this FileCopy? Is it a copy/paste of the Apache Commons IO version? It looks like it performs the copy with java.nio.channels.FileChannel.transferFrom(...), so if I understand correctly, it's just that part that needs to be changed to use Java nio2 instead?

Also, I managed to get the app running on Vagrant, but is there a way to log in for testing?

tomhigginsuom commented 7 years ago

Hi William,

FileCopy is a modified version of the Apache Commons IO code to allow the progress of the file copies (e.g. bytes, files and directories) to be reported. Here's the original commit that added the code:

https://github.com/DataVault/datavault/commit/4c7a342811f873030937f3150d68bb01cf427aa4

Essentially there's a "Progress" object that is used to periodically send events back to the broker via the message queue about how the file copy is progressing.

The actual mechanism of copying the files is unchanged from the Apache implementation (so that's what I think you'll need to look at).

There are some default usernames in the database for testing e.g. "user1" / "password1" which you can try. For a real implementation the user authentication is expected to come from web server pre-authentication (e.g. Shibboleth or CAS).

WilliamPetit commented 7 years ago

I have made the changes to use java.nio.file.Files.copy(Path source, Path target, CopyOption... options), but it still copies the file instead of the link.

I think the reason is that gov.loc.repository.bagit.impl.PreBagImpl.makeBagInPlace(...) is also using org.apache.commons.io.FileUtils to copy the files (see source).
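For reference, Files.copy follows symbolic links by default, so the link has to be preserved explicitly with LinkOption.NOFOLLOW_LINKS. A minimal sketch of that call (the class and method names are made up for the example and are not part of DataVault):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class LinkCopyExample {

    /**
     * Copies source to target. If source is a symbolic link, the link itself
     * is copied (NOFOLLOW_LINKS) rather than the contents of the file it points to.
     */
    public static void copyPreservingLink(Path source, Path target) throws IOException {
        Files.copy(source, target,
                LinkOption.NOFOLLOW_LINKS,
                StandardCopyOption.REPLACE_EXISTING);
    }
}
```

This only covers symbolic links; a hard-linked file is still copied as an ordinary, independent file.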

So, the bagit object will also have to be updated to use the nio2 function.

WilliamPetit commented 7 years ago

I've created a new class, DatavaultPreBagImpl, to replace PreBagImpl and overridden the method that used the old file copy, but the links were still not copied properly. It looks like I wasn't looking in the right place, so I've now tried to track down the issue, and it seems to start when LocalFileSystem calls getCanonicalPath(), which converts the path of the file (i.e. the link) into the path of the linked file.
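To show the difference, here is a small sketch comparing File.getCanonicalPath(), which resolves symbolic links, with Path.toAbsolutePath().normalize(), which works purely on the path string and leaves links alone. The class and method names are only for the example; whether this is the right replacement inside LocalFileSystem is an assumption.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Path;

public class PathResolutionExample {

    /** Current behaviour: symbolic links are resolved, so a link's path becomes its target's path. */
    public static String canonical(File file) throws IOException {
        return file.getCanonicalPath();
    }

    /**
     * Possible alternative: make the path absolute and remove "." and ".." elements
     * without consulting the file system, so a path that names a link still names the link.
     */
    public static Path absoluteKeepingLinks(Path path) {
        return path.toAbsolutePath().normalize();
    }
}
```

One caveat: because normalize() is purely syntactic, it can change the meaning of a path in which ".." comes directly after a symbolic link to a directory.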

WilliamPetit commented 7 years ago

I'm fixing it step by step and just found out that org.datavaultplatform.worker.operations.Tar.addFileToTar() also doesn't preserve the link...

seesmith commented 6 years ago

In JIRA as https://www.jira.is.ed.ac.uk/browse/RSS022-23