LibraryOfCongress / bagger

The Bagger application packages data files according to the BagIt specification.

Bagger changed files in transit #44

Closed BenHoutman closed 7 years ago

BenHoutman commented 7 years ago

Hello all, ("Given When Then" below) I recently downloaded Bagger 2.7.6 to transfer a number of files from an external drive to my company's archives computer (OS X 10.11.3), and eventually to our server. I've used bagit.py on the command line with success at a different job, but for this one we did not want to disturb the original drive, so I thought I'd use Bagger to bag the files in the destination folder rather than on the drive, which we'd mounted read-only. I transferred the files using Bagger and validated the bag. It checked out as valid. Unfortunately, files that played fine from the original drive and from an earlier (and successful) rsync transfer did not play from the bag. File sizes were different as well. It also appears to have overwritten the date modified on the folders in the destination. If there's anything you know of that would make it run properly, do let me know. I'm hoping it's user error.

given

when

then how do I make this happen successfully? When I tried to do this, Bagger failed to deliver full files. This is not a problem that occurs with rsync or even drag-and-drop, so something about the addition of Bagger has likely caused the problem.

Best, Ben

johnscancella commented 7 years ago

Hi Ben,

There is already an open issue regarding the file date modification (https://github.com/LibraryOfCongress/bagger/issues/16).

Does this still happen if you copy the files using rsync and then bag in place using bagger? Since you have some experience using bagit.py, try rsyncing the files and using it to bag in place to rule out that the files are bad and it is actually bagger causing the problem.

BenHoutman commented 7 years ago

[7 screenshots, taken 2017-06-16 between 10:07 am and 10:26 am]

Thanks John,

After trying your suggestion of rsyncing then using bagit.py, here is the scenario: I bagged in place the files from an rsync transfer using bagit.py and validated the bag, as you suggested. I tested some of the 'same' files from three places: the original drive (mounted read-only), the previously rsynced folder I just bagged with bagit.py, and the folder I bagged in the destination using bagger. These were the results: The .mov files from the original drive and the rsync folder played fine (after QuickTime created converted proxies), but for the most part the files bagged with bagger would not play at all. The same thing happened with the .lfa files. Where the file sizes were different, the files from the bagger transfer understandably did not work. Where they were the same, they did. I haven't done a systematic review of the files, but considerably more often than not, something, perhaps bagger, nicked off ~1-2% of each file. Bagger was the only thing knowingly added to the process.

Basically, rsync, then bagit.py worked fine, but using bagger to create a bag in destination was a disaster. I'm not ruling out that I might have made an error, but I'm reluctant to try it again. Do you have any pointers?
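A systematic review of the kind mentioned above can be scripted rather than done file by file. The following is a minimal Python sketch, not part of Bagger or bagit.py, that walks the original tree and reports every file that is missing, has a different size, or has a different checksum in the copy. The function names are hypothetical and the code assumes the copy mirrors the source's relative paths.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a file in chunks so large video files are not loaded into memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(source_root, copy_root):
    """Return (relative_path, problem) pairs for files that differ from the source."""
    source_root, copy_root = Path(source_root), Path(copy_root)
    problems = []
    source_files = sorted(p for p in source_root.rglob("*") if p.is_file())
    for source_file in source_files:
        relative = source_file.relative_to(source_root)
        copied_file = copy_root / relative
        if not copied_file.is_file():
            problems.append((relative.as_posix(), "missing"))
            continue
        source_size = source_file.stat().st_size
        copied_size = copied_file.stat().st_size
        if source_size != copied_size:
            # A size difference alone is enough to flag a truncated copy.
            problems.append((relative.as_posix(), f"size {source_size} != {copied_size}"))
        elif file_digest(source_file) != file_digest(copied_file):
            problems.append((relative.as_posix(), "checksum mismatch"))
    return problems
```

For a bag, the copy root would be the bag's `data` directory, since that is where BagIt keeps the payload. Sizes are compared before checksums so that truncated files are reported without hashing them.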

I attach screenshots of the following:
- "get info" "general" metadata for each of the 3 main folders from the original drive ("Dance Forms App," "Merce DanceForms Files," and "mercecunningham"), showing each of the 3 versions, from top to bottom: original, rsync, bagger
- "get info" "general" metadata for an .mov file in the same manner as above (original, rsync, bagger)
- a screenshot of the results from the rsync transfer of the .mov file side-by-side with the failed bagger transfer
- "get info" "general" metadata for an .lfa file in the same manner as above (original, rsync, bagger)
- a screenshot of the original .lfa file (the rsync file looked the same) side-by-side with the failed bagger transfer

Best, Ben

--

Ben Houtman, Audiovisual Archivist

Merce Cunningham Trust

130 West 56th Street, Suite 707

New York, NY 10019

(212) 255-8240 - office

(512) 767-0288 - cell

bhoutman@mercecunningham.org

www.mercecunningham.org

johnscancella commented 7 years ago

There are no attachments that I see. Are there any errors in the log files? You can find them in the logs directory where bagger is installed.

For example mine looks like this

bagger-2.7.6
├── bin
│   ├── bagger
│   ├── bagger.bat
│   ├── baggerLinter.py
│   └── discoverJavaVersions.py
├── doc
│   ├── BaggerQuickReference.doc
│   ├── BaggerUserGuide.doc
│   ├── LICENSE.txt
│   ├── NOTICE.txt
│   └── README.txt
├── keystore
│   └── bagger.ks
├── lib
│   ├── <various jars here>
└── logs
    ├── bagger.log
    └── debug.log

BenHoutman commented 7 years ago

Hi John, My apologies, I'd added the screenshots as a response in Gmail. I just added them to the original post. There are two logs, bagger and debug, but neither of them has any info from the date of the transfer, June 2nd; they're all from the date of install, May 31. Is there somewhere else I'd find a log of the transfer? Here are screenshots of the logs I have and their complete contents.

[3 screenshots, taken 2017-06-16 between 11:21 am and 11:22 am]

johnscancella commented 7 years ago

Do you have multiple versions of bagger installed? The log should be overwritten each time you use bagger. Perhaps you are looking at a different version of bagger than the one you actually used for the transfer?

BenHoutman commented 7 years ago

There's some user error! Yes, there is a version of Bagger that is still in "downloads" rather than in "applications," where I'd put it. It should have been the same generation, dragged over to applications, if that matters. Unfortunately, after I'd discovered the problems with the bag in question, I'd tested it on another file to see if it would work. That individual file transferred fine, but it looks like there was an error:

2017-06-09 11:18:50,876 ERROR [AWT-EventQueue-0] g.l.r.b.u.h.ClearBagHandler [ClearBagHandler.java:93] failed to create new bag with specified version, defaulting to using null

I don't know if that tells you anything unfortunately. Here's the whole bagger.log file from June 9:

2017-06-09 11:18:37,091 INFO [main] g.l.r.b.d.JSonBagger [JSonBagger.java:54] Using profiles from /Users/bhoutman/bagger
2017-06-09 11:18:41,907 INFO [AWT-EventQueue-0] g.l.r.b.b.i.DefaultBag [DefaultBag.java:103] gov.loc.repository.bagger.bag.impl.DefaultBag: DefaultBag.init file: null, version: 0.96
2017-06-09 11:18:41,923 INFO [AWT-EventQueue-0] g.l.r.b.u.BagView [BagView.java:176] createControl - User Home Path: /Users/bhoutman
2017-06-09 11:18:42,121 INFO [AWT-EventQueue-0] g.l.r.b.u.TagManifestPane [TagManifestPane.java:82] TagManifestPane.populateBagPane getTags: 2
2017-06-09 11:18:50,868 INFO [AWT-EventQueue-0] g.l.r.b.u.NewBagFrame [NewBagFrame.java:163] BagVersionFrame.OkNewBagHandler
2017-06-09 11:18:50,869 INFO [AWT-EventQueue-0] g.l.r.b.u.h.StartNewBagHandler [StartNewBagHandler.java:43] Creating a new bag with version: 0.97, profile:
2017-06-09 11:18:50,870 INFO [AWT-EventQueue-0] g.l.r.b.b.i.DefaultBag [DefaultBag.java:103] gov.loc.repository.bagger.bag.impl.DefaultBag: DefaultBag.init file: null, version:
2017-06-09 11:18:50,876 ERROR [AWT-EventQueue-0] g.l.r.b.u.h.ClearBagHandler [ClearBagHandler.java:93] failed to create new bag with specified version, defaulting to using null
java.lang.IllegalArgumentException: null
    at gov.loc.repository.bagit.BagFactory$Version.valueOfString(BagFactory.java:74) ~[bagit-4.12.2.jar:na]
    at gov.loc.repository.bagger.bag.impl.DefaultBag.init(DefaultBag.java:118) ~[bagger-core-2.7.6.jar:na]
    at gov.loc.repository.bagger.bag.impl.DefaultBag.<init>(DefaultBag.java:99) ~[bagger-core-2.7.6.jar:na]
    at gov.loc.repository.bagger.ui.handlers.ClearBagHandler.newDefaultBag(ClearBagHandler.java:90) [bagger-2.7.6.jar:na]
    at gov.loc.repository.bagger.ui.handlers.ClearBagHandler.clearExistingBag(ClearBagHandler.java:73) [bagger-2.7.6.jar:na]
    at gov.loc.repository.bagger.ui.handlers.StartNewBagHandler.createNewBag(StartNewBagHandler.java:45) [bagger-2.7.6.jar:na]
    at gov.loc.repository.bagger.ui.NewBagFrame$1.doExecuteCommand(NewBagFrame.java:165) [bagger-2.7.6.jar:na]
    at org.springframework.richclient.command.ActionCommand.execute(ActionCommand.java:195) [spring-richclient-support-1.0.0.jar:1.0.0]
    at org.springframework.richclient.command.ActionCommand$1.actionPerformed(ActionCommand.java:126) [spring-richclient-support-1.0.0.jar:1.0.0]
    at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2022) [na:1.8.0_131]
    at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2348) [na:1.8.0_131]
    at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402) [na:1.8.0_131]
    at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259) [na:1.8.0_131]
    at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(BasicButtonListener.java:252) [na:1.8.0_131]
    at java.awt.Component.processMouseEvent(Component.java:6533) [na:1.8.0_131]
    at javax.swing.JComponent.processMouseEvent(JComponent.java:3324) [na:1.8.0_131]
    at java.awt.Component.processEvent(Component.java:6298) [na:1.8.0_131]
    at java.awt.Container.processEvent(Container.java:2236) [na:1.8.0_131]
    at java.awt.Component.dispatchEventImpl(Component.java:4889) [na:1.8.0_131]
    at java.awt.Container.dispatchEventImpl(Container.java:2294) [na:1.8.0_131]
    at java.awt.Component.dispatchEvent(Component.java:4711) [na:1.8.0_131]
    at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4888) [na:1.8.0_131]
    at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4525) [na:1.8.0_131]
    at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4466) [na:1.8.0_131]
    at java.awt.Container.dispatchEventImpl(Container.java:2280) [na:1.8.0_131]
    at java.awt.Window.dispatchEventImpl(Window.java:2746) [na:1.8.0_131]
    at java.awt.Component.dispatchEvent(Component.java:4711) [na:1.8.0_131]
    at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:758) [na:1.8.0_131]
    at java.awt.EventQueue.access$500(EventQueue.java:97) [na:1.8.0_131]
    at java.awt.EventQueue$3.run(EventQueue.java:709) [na:1.8.0_131]
    at java.awt.EventQueue$3.run(EventQueue.java:703) [na:1.8.0_131]
    at java.security.AccessController.doPrivileged(Native Method) [na:1.8.0_131]
    at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:80) [na:1.8.0_131]
    at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:90) [na:1.8.0_131]
    at java.awt.EventQueue$4.run(EventQueue.java:731) [na:1.8.0_131]
    at java.awt.EventQueue$4.run(EventQueue.java:729) [na:1.8.0_131]
    at java.security.AccessController.doPrivileged(Native Method) [na:1.8.0_131]
    at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:80) [na:1.8.0_131]
    at java.awt.EventQueue.dispatchEvent(EventQueue.java:728) [na:1.8.0_131]
    at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:201) [na:1.8.0_131]
    at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116) [na:1.8.0_131]
    at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105) [na:1.8.0_131]
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101) [na:1.8.0_131]
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93) [na:1.8.0_131]
    at java.awt.EventDispatchThread.run(EventDispatchThread.java:82) [na:1.8.0_131]
2017-06-09 11:18:50,877 INFO [AWT-EventQueue-0] g.l.r.b.b.i.DefaultBag [DefaultBag.java:103] gov.loc.repository.bagger.bag.impl.DefaultBag: DefaultBag.init file: null, version: null
2017-06-09 11:18:50,926 INFO [AWT-EventQueue-0] g.l.r.b.u.h.StartNewBagHandler [StartNewBagHandler.java:77] bagProject:
2017-06-09 11:19:00,826 INFO [AWT-EventQueue-0] g.l.r.b.u.h.AddDataHandler [AddDataHandler.java:90] addBagData[0] v7.mov
2017-06-09 11:19:40,231 INFO [Thread-1] g.l.r.b.t.i.DefaultCompleter [DefaultCompleter.java:94] Completing bag at null
2017-06-09 11:28:33,501 INFO [Thread-1] g.l.r.b.b.i.DefaultBag [DefaultBag.java:715] Bag-Info to write: {Payload-Oxum=23936621116.1, Bagging-Date=2017-06-09, Bag-Size=22.3 GB}
2017-06-09 11:28:33,501 INFO [Thread-1] g.l.r.b.w.i.FileSystemWriter [FileSystemWriter.java:214] Writing bag
2017-06-09 11:41:18,148 INFO [AWT-EventQueue-0] g.l.r.b.u.h.SaveBagHandler [SaveBagHandler.java:109] BagView.openExistingBag: /Users/bhoutman/Desktop/bagger test folder/Bagger Test Folder
2017-06-09 11:41:18,167 INFO [AWT-EventQueue-0] g.l.r.b.b.i.DefaultBag [DefaultBag.java:103] gov.loc.repository.bagger.bag.impl.DefaultBag: DefaultBag.init file: null, version: 0.97
2017-06-09 11:41:18,185 INFO [AWT-EventQueue-0] g.l.r.b.b.i.DefaultBag [DefaultBag.java:103] gov.loc.repository.bagger.bag.impl.DefaultBag: DefaultBag.init file: /Users/bhoutman/Desktop/bagger test folder/Bagger Test Folder, version: 0.97
2017-06-09 11:41:18,432 INFO [AWT-EventQueue-0] g.l.r.b.u.BagView [BagView.java:660] Stopped the timer
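When several copies of bagger.log exist, it may help to filter them by date and level instead of reading them by eye. The sketch below is a rough Python helper, not anything shipped with Bagger. It assumes each entry begins with a timestamp shaped like the ones in the log quoted above, and it keeps stack-trace text attached to the entry that precedes it. The function names are hypothetical.

```python
import re

# Zero-width match just before a timestamp such as "2017-06-09 11:18:50,876 ".
ENTRY_START = re.compile(r"(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} )")

# Assumed entry layout: date, time, level, [thread], then the rest of the entry.
ENTRY_HEAD = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\] (?P<rest>.*)",
    re.DOTALL,
)


def split_entries(log_text):
    """Split raw log text into entries; stack traces stay with their entry."""
    return [part.strip() for part in ENTRY_START.split(log_text) if part.strip()]


def entries_matching(log_text, level=None, date=None):
    """Return parsed entries filtered by level (e.g. "ERROR") and/or date (YYYY-MM-DD)."""
    matches = []
    for entry in split_entries(log_text):
        head = ENTRY_HEAD.match(entry)
        if head is None:
            continue  # text that does not follow the assumed layout
        if level is not None and head["level"] != level:
            continue
        if date is not None and head["date"] != date:
            continue
        matches.append(head.groupdict())
    return matches
```

Calling `entries_matching(text, date="2017-06-02")` on each candidate log would show whether any of them covers the day of the transfer, and `level="ERROR"` pulls out entries like the `ClearBagHandler` one. Splitting on the timestamp rather than on newlines means the helper also copes with a log pasted as one long line.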

johnscancella commented 7 years ago

I believe it isn't bagger, but the underlying library that bagger uses. Would you mind trying to create a bag using the Java command-line implementation? You can find it here: https://github.com/LibraryOfCongress/bagit-java/releases/download/v4.12.2/bagit-4.12.2.zip or use Homebrew to install it.

You should then be able to run the command bagit create <dest> <folder(s) you want to bag>.

BenHoutman commented 7 years ago

Thanks John, I tested the suggested bagit java command line scenario on a dummy file and it seemed to work fine: file sizes are the same and everything opens. The bag isn't valid, but that's because of a .DS_Store file (again, user error). Is it possible to tell what went wrong, and how I ended up using the wrong library? Best, Ben
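A stray .DS_Store file can be spotted before validating. This is a small Python sketch, independent of any BagIt tool, that lists every .DS_Store file under a folder. The function name is hypothetical, and the code only reports the files; it does not delete anything.

```python
from pathlib import Path


def find_ds_store(root):
    """Return every .DS_Store file under root, sorted by path."""
    root = Path(root)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.name.lower() == ".ds_store"
    )
```

Running it on the bag directory shows whether Finder has added files. If such a file sits in the payload but is not listed in the manifest, that is one plausible reason for the bag to fail validation.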

johnscancella commented 7 years ago

You should test it on the same files that were causing you problems with bagger, otherwise it isn't a valid test.

BenHoutman commented 7 years ago

Hi John, I attempted to run the command line version of bagit java (bagit create) and encountered the same problem I had before: incomplete files that wouldn't run, and the bag checked out as "true" (bagit verify valid) when it was anything but. Again, these files were fine with rsync and a subsequent "bag in place." Any pointers on how to proceed? Best, Ben

johnscancella commented 7 years ago

I would proceed with using rsync and bagit-python
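Whichever tool creates the bag, a bag that validates only shows that the payload matches its own manifest. It does not show that the payload matches the original drive, which is consistent with the truncated files above still validating. The following Python sketch, not part of bagit-python, checks the original files against the checksums recorded in a payload manifest. It assumes the manifest is named `manifest-<algorithm>.txt` as in the BagIt specification, that each line is a checksum followed by a path, and that payload paths are `data/` plus the path relative to the source root. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def read_manifest(manifest_path):
    """Parse a payload manifest into {payload path: checksum}."""
    entries = {}
    text = Path(manifest_path).read_text(encoding="utf-8")
    for line in text.splitlines():
        if not line.strip():
            continue
        checksum, payload_path = line.split(None, 1)
        entries[payload_path.strip()] = checksum.lower()
    return entries


def check_source_against_manifest(source_root, manifest_path):
    """Hash files in the ORIGINAL tree and compare with the bag's recorded checksums."""
    source_root = Path(source_root)
    # "manifest-md5.txt" -> "md5"; BagIt algorithm names match hashlib's here.
    algorithm = Path(manifest_path).stem.split("-", 1)[1]
    mismatches = []
    for payload_path, expected in read_manifest(manifest_path).items():
        # Assumption: payload path is "data/" plus the source-relative path.
        relative = payload_path[len("data/"):] if payload_path.startswith("data/") else payload_path
        source_file = source_root / relative
        if not source_file.is_file():
            mismatches.append((payload_path, "not found in source"))
            continue
        digest = hashlib.new(algorithm)
        with open(source_file, "rb") as handle:
            for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected:
            mismatches.append((payload_path, "differs from source"))
    return mismatches
```

An empty result means every file listed in the manifest hashes the same on the source drive. Because it only reads the source, the check works with a drive mounted read-only.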