exponential-decay / demystify

Engine for analysis of Siegfried export files and DROID CSV. The tool has three purposes: to break the export into its components and store them within a SQLite database; to create additional columns to augment the output where useful; and to query the SQLite database, outputting results in a readable form useful for analysis by researchers and archivists within digital preservation departments in memory institutions. The tool will find duplicates, unidentified files, blacklisted objects, character encoding issues, and more.
http://www.openplanetsfoundation.org/blogs/2014-06-03-analysis-engine-droid-csv-export
zlib License

ImportError on one of the internationalstrings modules when trying to run analysis #41

Closed andreakb closed 5 years ago

andreakb commented 5 years ago

Hi,

I just tried to run the analysis on a Siegfried CSV export file, and I got the following error message:

akb@debian:~$ python git-repos/droid-siegfried-sqlite-analysis-engine-master/droidsqliteanalysis.py --export ResurectionMen.csv
Traceback (most recent call last):
  File "git-repos/droid-siegfried-sqlite-analysis-engine-master/droidsqliteanalysis.py", line 10, in <module>
    from libs.DroidAnalysisClass import DROIDAnalysis
  File "/home/akb/git-repos/droid-siegfried-sqlite-analysis-engine-master/libs/DroidAnalysisClass.py", line 7, in <module>
    import MsoftFnameAnalysis
  File "/home/akb/git-repos/droid-siegfried-sqlite-analysis-engine-master/libs/MsoftFnameAnalysis.py", line 8, in <module>
    from internationalstrings import AnalysisStringsEN as IN_EN
ImportError: No module named internationalstrings

It looks to me like the MsoftFnameAnalysis.py module is trying to import a module that doesn't exist in the repo?

Thank you!

andreakb commented 5 years ago

NVM! This was my fault. I'll delete this now. Sorry!!!

ross-spencer commented 5 years ago

I got all excited I had something to do there for a second!!

(Hope the report works well for you! - I'm considering improvements, like packaging it properly for download but I'm not sure there are many folks using it atm)

andreakb commented 5 years ago

Hahaaa, Sorry! the other thing I'm trying to figure out here is: I have an export from SF, but when I try to run the report, the message says Unknown export type. I have a SF in csv format with sha1 checksums... should I be exporting to something else? No rush! Thank you!

ross-spencer commented 5 years ago

Oh yeah, that's a compatibility thing. I should document that better, but it doesn't work with the SF CSV output because SF wasn't quite outputting easy to manage 2D data, so I use SF's YAML, and then DROID's CSV. To get DROID's CSV from SF there is a switch:

$ sf -h
Usage of sf:
  -csv
        CSV output format
  -droid
        DROID CSV output format

I'm very excited that you're playing with format reports again!!!

andreakb commented 5 years ago

Thanks for that! I am excited too! Just wait until my brain is at 100% capacity again! I created a sf export in YAML with sha1 (bc when I tried the -droid flag, I got the same error) and now I am getting a new error that I can't suss out:

akb@debian:~/git-repos/droid-siegfried-sqlite-analysis-engine-master$ python droidsqliteanalysis.py --export resmen
Traceback (most recent call last):
  File "droidsqliteanalysis.py", line 132, in <module>
    main()
  File "droidsqliteanalysis.py", line 115, in main
    handleDROIDCSV(args.export, True, args.txt, blacklist, args.rogues, args.heroes)
  File "droidsqliteanalysis.py", line 79, in handleDROIDCSV
    analysisresults = handleDROIDDB(dbfilename, blacklist, rogues, heroes)
  File "droidsqliteanalysis.py", line 71, in handleDROIDDB
    analysisresults = analysis.runanalysis(rogueanalysis)
  File "/home/akb/git-repos/droid-siegfried-sqlite-analysis-engine-master/libs/DroidAnalysisClass.py", line 717, in runanalysis
    analysisresults = self.queryDB() # primary db query functions
  File "/home/akb/git-repos/droid-siegfried-sqlite-analysis-engine-master/libs/DroidAnalysisClass.py", line 633, in queryDB
    self.analysisresults.bof_distance, self.analysisresults.eof_distance = self.analysebasis()
  File "/home/akb/git-repos/droid-siegfried-sqlite-analysis-engine-master/libs/DroidAnalysisClass.py", line 428, in analysebasis
    bof, eof, filesize = self.getoffs(offs, int(length), int(filesize))
  File "/home/akb/git-repos/droid-siegfried-sqlite-analysis-engine-master/libs/DroidAnalysisClass.py", line 453, in getoffs
    pos = int(basis[0])
ValueError: invalid literal for int() with base 10: '[[014][224733132]]'

ross-spencer commented 5 years ago

I don't know what it was - SF had changed its output slightly, but give it a go now with the new commit, should be a-okay!

richardlehane commented 5 years ago

yes sorry - this bug was me, the basis field had a little update last year (I wasn't expecting anyone to be foolhardy enough to try to parse it :))
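
For anyone else parsing the basis field, here is a minimal sketch of a more tolerant approach than calling int() on the raw text. It assumes a basis entry carries bracketed "offset length" pairs such as [[0 14] [224733 132]]; that layout is a guess and is not confirmed anywhere in this thread, and parse_basis_pairs is a hypothetical helper, not a function from this repo.

```python
import re

# Assumed shape of a basis entry: bracketed "offset length" pairs, e.g.
# "byte match at [[0 14] [224733 132]]". This layout is an assumption.
PAIR = re.compile(r"\[(\d+)\s+(\d+)\]")


def parse_basis_pairs(basis):
    """Return (offset, length) tuples found in a basis string.

    Hypothetical helper: strings that do not contain any recognisable
    pair yield an empty list instead of raising ValueError.
    """
    return [(int(offset), int(length)) for offset, length in PAIR.findall(basis)]
```

With this approach a string the pattern does not recognise, including the '[[014][224733132]]' value from the traceback above, gives an empty list that the caller can handle, rather than stopping the whole report.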

ross-spencer commented 5 years ago

You know me Richard, I'm nothing but thorough in my foolhardiness!!! :laughing:

(It was nice to go back into the code actually, I think it's pushed its way closer to the top of my TODO to do some spring cleaning here.)

andreakb commented 5 years ago

Thank you!! It works beautifully with the YAML sf export! However, when I tried with the siegfried droid export option, I get the same Unknown export error. My commands were:

sf -droid -hash sha1 /media/sf_Digital_Collections/ResurectionMen-Staging/ > resmen.csv

and

python droidsqliteanalysis.py --export resmen.csv

ross-spencer commented 5 years ago

Fixed now too, @andreakb. I must not have tested that much once I started separating the code out. SF just doesn't escape its CSV fields, but that bit is only needed for the identification. I fixed that here: https://github.com/exponential-decay/droid-siegfried-sqlite-analysis-engine/commit/94b98d6078620132a53a08fd87adacd84e129ba1
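
To illustrate the kind of check involved, here is a rough sketch of identifying a DROID-style CSV export from its header line whether or not the fields are quoted. It is not the code from the commit above. The leading column names (ID, PARENT_ID, URI) and the function name looks_like_droid_csv are assumptions made for the example.

```python
import csv
import io

# Assumed leading columns of a DROID-style CSV header (hypothetical constant).
DROID_HEADER_PREFIX = ["ID", "PARENT_ID", "URI"]


def looks_like_droid_csv(first_line):
    """Return True if the header line starts with the assumed DROID columns.

    Parsing the line with the csv module means quoted ("ID","PARENT_ID")
    and unquoted (ID,PARENT_ID) headers compare equal.
    """
    row = next(csv.reader(io.StringIO(first_line)), [])
    return [cell.strip() for cell in row[:3]] == DROID_HEADER_PREFIX
```

Comparing parsed cells rather than the raw header string is the design choice here: a literal string comparison would treat quoted and unquoted headers as different export types.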

ross-spencer commented 5 years ago

While I've got your ear @andreakb how important do you think it is that this code be possible to download and install without dependencies? It doesn't seem to get much use, and for those that do use it, maybe it is better to be able to update dependencies dynamically? Also, lots of the code can be pulled apart into separate packages - last night's refactor overhauled the filename analysis component, so I'm going to stick that in a repo of its own today (keep this version going too) and then that can be run standalone.

andreakb commented 5 years ago

Ah interesting idea! I think the one dependency I had to download was a python library (now of course I can't remember which one & it's not in my history!). I think if it's not too much more work on your end, and you have time for it, it may go some way to lower the barriers to using the analysis tool. I think, if you were wanting to do another roll out of the tool at some point, and introduce more archivists/digital collections types to the analysis tool, it could be something to add.