exponential-decay / demystify

Engine for analysis of Siegfried export files and DROID CSV. The tool has three purposes: to break the export into its components and store them within a SQLite database; to create additional columns to augment the output where useful; and to query the SQLite database, outputting results in a readable form useful for analysis by researchers and archivists within digital preservation departments in memory institutions. The tool will find duplicates, unidentified files, blacklisted objects, character encoding issues, and more.
http://www.openplanetsfoundation.org/blogs/2014-06-03-analysis-engine-droid-csv-export
zlib License

Enable classification in results #106

Closed ross-spencer closed 8 months ago

ross-spencer commented 1 year ago

Classification is now being exported by Siegfried, and soon by DROID. We need to capture this in the report here.

SQL:

select classification as class,
count(*) as count
from iddata
group by class
order by count desc;

Example result (OPF format corpus):

class                           count
------------------------------  -----
Text (Mark-up)                  1057 
None                            194  
Page Description                169  
Video                           62   
Image (Raster)                  45   
Word Processor                  36   
Aggregate                       35   
Text (Structured)               27   
Spreadsheet                     20   
Dataset                         9    
Presentation                    6    
Database                        4    
Image (Vector), Text (Mark-up)  1    
Image (Vector)                  1    
GIS, Text (Mark-up)             1    
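
A minimal sketch of running the query above from Python with the standard-library sqlite3 module. The in-memory database, the one-column iddata table, the sample rows, and the function name classification_counts are all assumptions for illustration; the real iddata table has more columns and this is not the project's actual code.

```python
import sqlite3

# The query from this issue, unchanged apart from indentation.
CLASSIFICATION_QUERY = """
    select classification as class,
           count(*) as count
    from iddata
    group by class
    order by count desc;
"""


def classification_counts(conn):
    """Return (class, count) rows, most frequent classification first."""
    return conn.execute(CLASSIFICATION_QUERY).fetchall()


# Hypothetical stand-in for a real database: one column, a few sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("create table iddata (classification text)")
conn.executemany(
    "insert into iddata (classification) values (?)",
    [
        ("Video",),
        ("Text (Mark-up)",),
        ("Text (Mark-up)",),
        ("Text (Mark-up)",),
        ("Video",),
        ("None",),
    ],
)

# Print a simple two-column listing similar to the example result.
for name, total in classification_counts(conn):
    print(f"{name:<32}{total}")
```

Pointing sqlite3.connect at a database file instead of ":memory:" would run the same query against real data.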

This might be a good opportunity to rework the queries in AnalysisQueriesClass.py again and create some helpers to return them.
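
One possible shape for such a helper, as a rough sketch only: a function that builds the "count rows per value" query for a given column. The names count_by_query and GROUPABLE_COLUMNS are hypothetical and are not taken from AnalysisQueriesClass.py.

```python
# Hypothetical whitelist: column names cannot be bound as SQL parameters,
# so only columns listed here may be interpolated into the query text.
GROUPABLE_COLUMNS = {"classification"}


def count_by_query(column, alias):
    """Build a grouped-count query over iddata for one whitelisted column."""
    if column not in GROUPABLE_COLUMNS:
        raise ValueError(f"unsupported column: {column!r}")
    if not alias.isidentifier():
        raise ValueError(f"alias must be a plain identifier: {alias!r}")
    return (
        f"select {column} as {alias}, count(*) as count "
        f"from iddata group by {alias} order by count desc;"
    )


# Rebuilds the classification query from this issue as a single line.
print(count_by_query("classification", "class"))
```

Restricting the column to a fixed set keeps the helper from assembling arbitrary SQL from caller input.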

I think this change should bump the version number whenever it is implemented.

ross-spencer commented 8 months ago

Example output via Demystify-Lite.

[image: example output from Demystify-Lite]