exponential-decay / demystify

Engine for analysis of Siegfried export files and DROID CSV. The tool has three purposes: to break the export into its components and store them in a SQLite database; to create additional columns that augment the output where useful; and to query the SQLite database, outputting results in a readable form useful for analysis by researchers and archivists within digital preservation departments in memory institutions. The tool will find duplicates, unidentified files, blacklisted objects, character encoding issues, and more.
http://www.openplanetsfoundation.org/blogs/2014-06-03-analysis-engine-droid-csv-export
zlib License

Gnarly data still causes problems (DROID CSV) #92

Open ross-spencer opened 2 years ago

ross-spencer commented 2 years ago

Some DROID CSVs can still interrupt processing. We currently handle this with an error log, but we may want to find a way of proactively identifying the issue when reading the CSV, or of escaping values on insert when there is a problem. An example export containing quote characters in file and folder names:

"ID","PARENT_ID","URI","FILE_PATH","NAME","METHOD","STATUS","SIZE","TYPE","EXT","LAST_MODIFIED","EXTENSION_MISMATCH","SHA256_HASH","FORMAT_COUNT","PUID","MIME_TYPE","FORMAT_NAME","FORMAT_VERSION"
"3354","3314","file:/home/ross-spencer/git/test-data/src/sampledata/TestTransfers/badNames/objects/%22quote%22.txt","/home/ross-spencer/git/test-data/src/sampledata/TestTransfers/badNames/objects/""quote"".txt","""quote"".txt","Extension","Done","5","File","txt","2018-08-09T23:42:07","false","6667b2d1aab6a00caa5aee5af8ad9f1465e567abf1c209d15727d57b3e8f6e5f","1","x-fmt/111","text/plain","Plain Text File",""
"3396","3314","file:/home/ross-spencer/git/test-data/src/sampledata/TestTransfers/badNames/objects/%22quote%22/","/home/ross-spencer/git/test-data/src/sampledata/TestTransfers/badNames/objects/""quote""","""quote""","","Done","","Folder","","2018-08-09T23:42:07","false","","","","","",""
"3409","3396","file:/home/ross-spencer/git/test-data/src/sampledata/TestTransfers/badNames/objects/%22quote%22/%40at.txt","/home/ross-spencer/git/test-data/src/sampledata/TestTransfers/badNames/objects/""quote""/@at.txt","@at.txt","Extension","Done","5","File","txt","2018-08-09T23:42:07","false","6667b2d1aab6a00caa5aee5af8ad9f1465e567abf1c209d15727d57b3e8f6e5f","1","x-fmt/111","text/plain","Plain Text File",""
"3399","3396","file:/home/ross-spencer/git/test-data/src/sampledata/TestTransfers/badNames/objects/%22quote%22/control.txt","/home/ross-spencer/git/test-data/src/sampledata/TestTransfers/badNames/objects/""quote""/control.txt","control.txt","Extension","Done","5","File","txt","2018-08-09T23:42:07","false","6667b2d1aab6a00caa5aee5af8ad9f1465e567abf1c209d15727d57b3e8f6e5f","1","x-fmt/111","text/plain","Plain Text File",""
"3435","3314","file:/home/ross-spencer/git/test-data/src/sampledata/TestTransfers/badNames/objects/%22quote.txt","/home/ross-spencer/git/test-data/src/sampledata/TestTransfers/badNames/objects/""quote.txt","""quote.txt","Extension","Done","5","File","txt","2018-08-09T23:42:07","false","6667b2d1aab6a00caa5aee5af8ad9f1465e567abf1c209d15727d57b3e8f6e5f","1","x-fmt/111","text/plain","Plain Text File",""
"3408","3314","file:/home/ross-spencer/git/test-data/src/sampledata/TestTransfers/badNames/objects/%27%20quote.txt","/home/ross-spencer/git/test-data/src/sampledata/TestTransfers/badNames/objects/' quote.txt","' quote.txt","Extension","Done","5","File","txt","2018-08-09T23:42:07","false","6667b2d1aab6a00caa5aee5af8ad9f1465e567abf1c209d15727d57b3e8f6e5f","1","x-fmt/111","text/plain","Plain Text File",""
"3426","3314","file:/home/ross-spencer/git/test-data/src/sampledata/TestTransfers/badNames/objects/%27quote%27.txt","/home/ross-spencer/git/test-data/src/sampledata/TestTransfers/badNames/objects/'quote'.txt","'quote'.txt","Extension","Done","5","File","txt","2018-08-09T23:42:07","false","6667b2d1aab6a00caa5aee5af8ad9f1465e567abf1c209d15727d57b3e8f6e5f","1","x-fmt/111","text/plain","Plain Text File",""
"3382","3314","file:/home/ross-spencer/git/test-data/src/sampledata/TestTransfers/badNames/objects/%27quote%27/","/home/ross-spencer/git/test-data/src/sampledata/TestTransfers/badNames/objects/'quote'","'quote'","","Done","","Folder","","2018-08-09T23:42:07","false","","","","","",""
"3386","3382","file:/home/ross-spencer/git/test-data/src/sampledata/TestTransfers/badNames/objects/%27quote%27/%40at.txt","/home/ross-spencer/git/test-data/src/sampledata/TestTransfers/badNames/objects/'quote'/@at.txt","@at.txt","Extension","Done","5","File","txt","2018-08-09T23:42:07","false","6667b2d1aab6a00caa5aee5af8ad9f1465e567abf1c209d15727d57b3e8f6e5f","1","x-fmt/111","text/plain","Plain Text File",""
"3387","3382","file:/home/ross-spencer/git/test-data/src/sampledata/TestTransfers/badNames/objects/%27quote%27/control.txt","/home/ross-spencer/git/test-data/src/sampledata/TestTransfers/badNames/objects/'quote'/control.txt","control.txt","Extension","Done","5","File","txt","2018-08-09T23:42:07","false","6667b2d1aab6a00caa5aee5af8ad9f1465e567abf1c209d15727d57b3e8f6e5f","1","x-fmt/111","text/plain","Plain Text File",""
"3441","3314","file:/home/ross-spencer/git/test-data/src/sampledata/TestTransfers/badNames/objects/quote%20%22quote%22%20quote.txt","/home/ross-spencer/git/test-data/src/sampledata/TestTransfers/badNames/objects/quote ""quote"" quote.txt","quote ""quote"" quote.txt","Extension","Done","5","File","txt","2018-08-09T23:42:07","false","6667b2d1aab6a00caa5aee5af8ad9f1465e567abf1c209d15727d57b3e8f6e5f","1","x-fmt/111","text/plain","Plain Text File",""
"3113","3096","file:/home/ross-spencer/git/test-data/src/sampledata/TestTransfers/zippedWithBadNames/objects/%22quote%22.zip","/home/ross-spencer/git/test-data/src/sampledata/TestTransfers/zippedWithBadNames/objects/""quote"".zip","""quote"".zip","Signature","Done","344","File","zip","2018-08-09T23:42:07","false","101798294713909b757fae88896ea9ae2795645ac29ea3457a1632eeeb0ee8ea","1","x-fmt/263","application/zip","ZIP Format",""
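
One possible way to approach both ideas (checking rows while reading, and escaping on insert) is sketched below. It is only an illustration, not how demystify is implemented: the `filedata` table, the `load_rows` function, and the shortened three-column sample are hypothetical. It assumes the CSV is read with Python's `csv` module, which decodes doubled quotes (`""`) itself, and that values are passed to SQLite as bound parameters rather than being built into the SQL string.

```python
import csv
import io
import sqlite3

# Hypothetical, shortened sample: three columns only, with truncated paths.
SAMPLE = '''"ID","FILE_PATH","NAME"
"3354","/objects/""quote"".txt","""quote"".txt"
"3408","/objects/' quote.txt","' quote.txt"
'''


def load_rows(csv_text, conn):
    """Insert rows into a hypothetical table and return a list of problems."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS filedata (id TEXT, file_path TEXT, name TEXT)"
    )
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        # Proactive check: DictReader stores surplus fields under the key None
        # and fills missing fields with the value None.
        if None in row or None in row.values():
            problems.append((reader.line_num, "unexpected number of fields"))
            continue
        try:
            # Bound parameters mean quotes in names need no manual escaping.
            conn.execute(
                "INSERT INTO filedata (id, file_path, name) VALUES (?, ?, ?)",
                (row["ID"], row["FILE_PATH"], row["NAME"]),
            )
        except sqlite3.Error as err:
            problems.append((reader.line_num, str(err)))
    conn.commit()
    return problems


conn = sqlite3.connect(":memory:")
errors = load_rows(SAMPLE, conn)
names = [r[0] for r in conn.execute("SELECT name FROM filedata ORDER BY id")]
```

With this approach, a malformed row is recorded together with its line number and skipped, so a single bad record need not interrupt the rest of the import.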