To extract archive formats and identify their contents, the Wikidata identifier needs a mapping to the various archive types. x-fmt/263 (ZIP) doesn't map yet.
Wikidata signatures can be run in two modes: with PRONOM and without. Without PRONOM we might be missing a greater number of mappings/signatures too, so those will be worth looking at in time.
NB. Taking PRONOM as a decent canonical reference for understanding Siegfried's capability, the PRONOM-compatible archive formats are identified as follows:
---
siegfried : 1.8.0
scandate : 2020-09-08T21:37:23-04:00
signature : default.sig
created : 2020-09-08T21:32:15-04:00
identifiers :
- name : 'wikidata'
details : 'wikidata-definitions-0.0.4 (2020-09-08, DROID_SignatureFile_V96.xml, container-signature-20200124.xml)'
---
filename : '/home/ross-spencer/git/richardlehane/siegfried/cmd/sf/testdata/wikidata/archives/fmt-289-signature-id-305.warc'
filesize : 832
modified : 2020-07-05T13:53:49-04:00
errors :
matches :
- ns : 'wikidata'
id : 'Q7978505'
format : 'Web ARChive'
URI : 'http://www.wikidata.org/entity/Q7978505'
mime : 'application/warc'
basis : 'extension match warc; byte match at 0, 832'
source : 'PRONOM (Official (fmt/289))'
warning :
---
filename : '/home/ross-spencer/git/richardlehane/siegfried/cmd/sf/testdata/wikidata/archives/fmt-410-signature-id-580.arc'
filesize : 205
modified : 2020-07-05T13:53:49-04:00
errors :
matches :
- ns : 'wikidata'
id : 'Q27824065'
format : 'Internet Archive ARC, version 1.1'
URI : 'http://www.wikidata.org/entity/Q27824065'
mime : 'application/x-internet-archive'
basis : 'extension match arc; byte match at [[0 129] [149 56]]'
source : 'PRONOM (Official (fmt/410))'
warning :
---
filename : '/home/ross-spencer/git/richardlehane/siegfried/cmd/sf/testdata/wikidata/archives/x-fmt-219-signature-id-525.arc'
filesize : 205
modified : 2020-07-05T13:53:49-04:00
errors :
matches :
- ns : 'wikidata'
id : 'Q27824060'
format : 'Internet Archive ARC, version 1.0'
URI : 'http://www.wikidata.org/entity/Q27824060'
mime : 'application/x-internet-archive'
basis : 'extension match arc; byte match at [[0 129] [149 56]]'
source : 'PRONOM (Official (x-fmt/219))'
warning :
---
filename : '/home/ross-spencer/git/richardlehane/siegfried/cmd/sf/testdata/wikidata/archives/x-fmt-263-signature-id-200.zip'
filesize : 65572
modified : 2020-07-05T13:53:49-04:00
errors :
matches :
- ns : 'wikidata'
id : 'Q26211840'
format : 'ZIP archive file format, ISO/IEC 21320–1:2015'
URI : 'http://www.wikidata.org/entity/Q26211840'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q26211975'
format : 'ZIP archive file format, version 2.1'
URI : 'http://www.wikidata.org/entity/Q26211975'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q26211965'
format : 'ZIP archive file format, version 2.5'
URI : 'http://www.wikidata.org/entity/Q26211965'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q26211915'
format : 'ZIP archive file format, version 6.2.1'
URI : 'http://www.wikidata.org/entity/Q26211915'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q26211936'
format : 'ZIP archive file format, version 5.2'
URI : 'http://www.wikidata.org/entity/Q26211936'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q10394822'
format : 'ZIP archive file format, version 6.3.2'
URI : 'http://www.wikidata.org/entity/Q10394822'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q26211978'
format : 'ZIP archive file format, version 1.1'
URI : 'http://www.wikidata.org/entity/Q26211978'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q26211983'
format : 'ZIP archive file format, version 1.0'
URI : 'http://www.wikidata.org/entity/Q26211983'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q26211905'
format : 'ZIP archive file format, version 6.2.2'
URI : 'http://www.wikidata.org/entity/Q26211905'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q26211957'
format : 'ZIP archive file format, version 4.5'
URI : 'http://www.wikidata.org/entity/Q26211957'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q26211940'
format : 'ZIP archive file format, version 5.1'
URI : 'http://www.wikidata.org/entity/Q26211940'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q26211891'
format : 'ZIP archive file format, version 6.3.0'
URI : 'http://www.wikidata.org/entity/Q26211891'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q5532250'
format : 'General Transit Feed Specification'
URI : 'http://www.wikidata.org/entity/Q5532250'
mime :
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q26211927'
format : 'ZIP archive file format, version 6.2.0'
URI : 'http://www.wikidata.org/entity/Q26211927'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q26211874'
format : 'ZIP archive file format, version 6.3.1'
URI : 'http://www.wikidata.org/entity/Q26211874'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q26211977'
format : 'ZIP archive file format, version 2.0'
URI : 'http://www.wikidata.org/entity/Q26211977'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q26211954'
format : 'ZIP archive file format, version 4.6'
URI : 'http://www.wikidata.org/entity/Q26211954'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q26211958'
format : 'ZIP archive file format, version 2.7'
URI : 'http://www.wikidata.org/entity/Q26211958'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q26211931'
format : 'ZIP archive file format, version 6.1.0'
URI : 'http://www.wikidata.org/entity/Q26211931'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q26211948'
format : 'ZIP archive file format, version 5.0'
URI : 'http://www.wikidata.org/entity/Q26211948'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
---
filename : '/home/ross-spencer/git/richardlehane/siegfried/cmd/sf/testdata/wikidata/archives/x-fmt-265-signature-id-265.tar'
filesize : 156
modified : 2020-07-05T13:53:49-04:00
errors :
matches :
- ns : 'wikidata'
id : 'Q283579'
format : 'tar'
URI : 'http://www.wikidata.org/entity/Q283579'
mime : 'application/x-tar'
basis : 'extension match tar; byte match at 0, 156 (signature 2/2)'
source : 'PRONOM (Official (x-fmt/265))'
warning :
---
filename : '/home/ross-spencer/git/richardlehane/siegfried/cmd/sf/testdata/wikidata/archives/x-fmt-266-signature-id-201.gz'
filesize : 3
modified : 2020-07-05T13:53:49-04:00
errors :
matches :
- ns : 'wikidata'
id : 'Q10287816'
format : 'GZIP'
URI : 'http://www.wikidata.org/entity/Q10287816'
mime : 'application/gzip'
basis : 'extension match gz; byte match at 0, 3 (signature 1/2); byte match at 0, 3 (signature 2/2)'
source : 'Gary Kessler''s File Signature Table (source date: 2017-08-07) PRONOM (Official (x-fmt/266))'
warning :
And then without PRONOM attached to the Wikidata identifier:
---
siegfried : 1.8.0
scandate : 2020-09-08T21:43:39-04:00
signature : default.sig
created : 2020-09-08T21:43:23-04:00
identifiers :
- name : 'wikidata'
details : 'wikidata-definitions-0.0.4 (2020-09-08)'
---
filename : '/home/ross-spencer/git/richardlehane/siegfried/cmd/sf/testdata/wikidata/archives/fmt-289-signature-id-305.warc'
filesize : 832
modified : 2020-07-05T13:53:49-04:00
errors :
matches :
- ns : 'wikidata'
id : 'Q84037847'
format : 'WARC 1.1'
URI : 'http://www.wikidata.org/entity/Q84037847'
mime : 'application/warc'
basis : 'extension match warc'
source :
warning :
- ns : 'wikidata'
id : 'Q7978505'
format : 'Web ARChive'
URI : 'http://www.wikidata.org/entity/Q7978505'
mime : 'application/warc'
basis : 'extension match warc'
source :
warning :
---
filename : '/home/ross-spencer/git/richardlehane/siegfried/cmd/sf/testdata/wikidata/archives/fmt-410-signature-id-580.arc'
filesize : 205
modified : 2020-07-05T13:53:49-04:00
errors :
matches :
- ns : 'wikidata'
id : 'Q28600246'
format : 'FreeArc ARC'
URI : 'http://www.wikidata.org/entity/Q28600246'
mime :
basis : 'extension match arc'
source :
warning :
- ns : 'wikidata'
id : 'Q28600250'
format : 'Oracle database backup format'
URI : 'http://www.wikidata.org/entity/Q28600250'
mime :
basis : 'extension match arc'
source :
warning :
- ns : 'wikidata'
id : 'Q28600238'
format : 'ARC'
URI : 'http://www.wikidata.org/entity/Q28600238'
mime :
basis : 'extension match arc'
source :
warning :
- ns : 'wikidata'
id : 'Q27824065'
format : 'Internet Archive ARC, version 1.1'
URI : 'http://www.wikidata.org/entity/Q27824065'
mime : 'application/x-internet-archive'
basis : 'extension match arc'
source :
warning :
- ns : 'wikidata'
id : 'Q296496'
format : 'ARC'
URI : 'http://www.wikidata.org/entity/Q296496'
mime : 'application/x-ia-arc; application/octet-stream'
basis : 'extension match arc'
source :
warning :
- ns : 'wikidata'
id : 'Q27824060'
format : 'Internet Archive ARC, version 1.0'
URI : 'http://www.wikidata.org/entity/Q27824060'
mime : 'application/x-internet-archive'
basis : 'extension match arc'
source :
warning :
---
filename : '/home/ross-spencer/git/richardlehane/siegfried/cmd/sf/testdata/wikidata/archives/x-fmt-219-signature-id-525.arc'
filesize : 205
modified : 2020-07-05T13:53:49-04:00
errors :
matches :
- ns : 'wikidata'
id : 'Q28600246'
format : 'FreeArc ARC'
URI : 'http://www.wikidata.org/entity/Q28600246'
mime :
basis : 'extension match arc'
source :
warning :
- ns : 'wikidata'
id : 'Q28600250'
format : 'Oracle database backup format'
URI : 'http://www.wikidata.org/entity/Q28600250'
mime :
basis : 'extension match arc'
source :
warning :
- ns : 'wikidata'
id : 'Q28600238'
format : 'ARC'
URI : 'http://www.wikidata.org/entity/Q28600238'
mime :
basis : 'extension match arc'
source :
warning :
- ns : 'wikidata'
id : 'Q27824065'
format : 'Internet Archive ARC, version 1.1'
URI : 'http://www.wikidata.org/entity/Q27824065'
mime : 'application/x-internet-archive'
basis : 'extension match arc'
source :
warning :
- ns : 'wikidata'
id : 'Q296496'
format : 'ARC'
URI : 'http://www.wikidata.org/entity/Q296496'
mime : 'application/x-ia-arc; application/octet-stream'
basis : 'extension match arc'
source :
warning :
- ns : 'wikidata'
id : 'Q27824060'
format : 'Internet Archive ARC, version 1.0'
URI : 'http://www.wikidata.org/entity/Q27824060'
mime : 'application/x-internet-archive'
basis : 'extension match arc'
source :
warning :
---
filename : '/home/ross-spencer/git/richardlehane/siegfried/cmd/sf/testdata/wikidata/archives/x-fmt-263-signature-id-200.zip'
filesize : 65572
modified : 2020-07-05T13:53:49-04:00
errors :
matches :
- ns : 'wikidata'
id : 'Q26211948'
format : 'ZIP archive file format, version 5.0'
URI : 'http://www.wikidata.org/entity/Q26211948'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q26211978'
format : 'ZIP archive file format, version 1.1'
URI : 'http://www.wikidata.org/entity/Q26211978'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q26211975'
format : 'ZIP archive file format, version 2.1'
URI : 'http://www.wikidata.org/entity/Q26211975'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q5532250'
format : 'General Transit Feed Specification'
URI : 'http://www.wikidata.org/entity/Q5532250'
mime :
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q26211874'
format : 'ZIP archive file format, version 6.3.1'
URI : 'http://www.wikidata.org/entity/Q26211874'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q26211940'
format : 'ZIP archive file format, version 5.1'
URI : 'http://www.wikidata.org/entity/Q26211940'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q26211977'
format : 'ZIP archive file format, version 2.0'
URI : 'http://www.wikidata.org/entity/Q26211977'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q26211840'
format : 'ZIP archive file format, ISO/IEC 21320–1:2015'
URI : 'http://www.wikidata.org/entity/Q26211840'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q26211957'
format : 'ZIP archive file format, version 4.5'
URI : 'http://www.wikidata.org/entity/Q26211957'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q26211927'
format : 'ZIP archive file format, version 6.2.0'
URI : 'http://www.wikidata.org/entity/Q26211927'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q26211965'
format : 'ZIP archive file format, version 2.5'
URI : 'http://www.wikidata.org/entity/Q26211965'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q26211936'
format : 'ZIP archive file format, version 5.2'
URI : 'http://www.wikidata.org/entity/Q26211936'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q26211958'
format : 'ZIP archive file format, version 2.7'
URI : 'http://www.wikidata.org/entity/Q26211958'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q26211915'
format : 'ZIP archive file format, version 6.2.1'
URI : 'http://www.wikidata.org/entity/Q26211915'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q10394822'
format : 'ZIP archive file format, version 6.3.2'
URI : 'http://www.wikidata.org/entity/Q10394822'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q26211905'
format : 'ZIP archive file format, version 6.2.2'
URI : 'http://www.wikidata.org/entity/Q26211905'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q26211891'
format : 'ZIP archive file format, version 6.3.0'
URI : 'http://www.wikidata.org/entity/Q26211891'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q26211931'
format : 'ZIP archive file format, version 6.1.0'
URI : 'http://www.wikidata.org/entity/Q26211931'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q26211954'
format : 'ZIP archive file format, version 4.6'
URI : 'http://www.wikidata.org/entity/Q26211954'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
- ns : 'wikidata'
id : 'Q26211983'
format : 'ZIP archive file format, version 1.0'
URI : 'http://www.wikidata.org/entity/Q26211983'
mime : 'application/zip'
basis : 'extension match zip'
source :
warning :
---
filename : '/home/ross-spencer/git/richardlehane/siegfried/cmd/sf/testdata/wikidata/archives/x-fmt-265-signature-id-265.tar'
filesize : 156
modified : 2020-07-05T13:53:49-04:00
errors :
matches :
- ns : 'wikidata'
id : 'Q283579'
format : 'tar'
URI : 'http://www.wikidata.org/entity/Q283579'
mime : 'application/x-tar'
basis : 'extension match tar'
source :
warning :
---
filename : '/home/ross-spencer/git/richardlehane/siegfried/cmd/sf/testdata/wikidata/archives/x-fmt-266-signature-id-201.gz'
filesize : 3
modified : 2020-07-05T13:53:49-04:00
errors :
matches :
- ns : 'wikidata'
id : 'Q10287816'
format : 'GZIP'
URI : 'http://www.wikidata.org/entity/Q10287816'
mime : 'application/gzip'
basis : 'extension match gz; byte match at 0, 3'
source : 'Gary Kessler''s File Signature Table (source date: 2017-08-07)'
warning :
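To make the gaps easier to spot in output like the above, here is a minimal sketch that lists the files with no `byte match` in any `basis` line, along with how many candidate matches each one got. It assumes the `sf` report has been saved as plain text in the layout shown here; the function names and the sample paths (`example.zip`, `example.tar`) are hypothetical and not part of Siegfried.

```python
import re

# Matches "key : value" lines, with an optional leading "- " list marker.
FIELD = re.compile(r"^(?:-\s*)?(\w+)\s*:\s*(.*)$")


def unquote(value):
    """Strip the single quotes sf puts around strings ('' is an escaped ')."""
    value = value.strip()
    if len(value) >= 2 and value[0] == "'" and value[-1] == "'":
        return value[1:-1].replace("''", "'")
    return value


def parse_records(report):
    """Collect the filename and every basis string for each file in a report."""
    records = []
    current = None
    for raw in report.splitlines():
        line = raw.strip()
        if line == "---":
            # Document separator: the next filename starts a new record.
            current = None
            continue
        found = FIELD.match(line)
        if not found:
            continue
        key, value = found.group(1), unquote(found.group(2))
        if key == "filename":
            current = {"filename": value, "bases": []}
            records.append(current)
        elif key == "basis" and current is not None:
            current["bases"].append(value)
    return records


def extension_only(report):
    """Return {filename: match count} for files with no byte match at all."""
    return {
        rec["filename"]: len(rec["bases"])
        for rec in parse_records(report)
        if rec["bases"] and not any("byte match" in b for b in rec["bases"])
    }


# Hypothetical, shortened sample in the same layout as the output above.
SAMPLE = """---
filename : 'example.zip'
matches :
- ns : 'wikidata'
basis : 'extension match zip'
- ns : 'wikidata'
basis : 'extension match zip'
---
filename : 'example.tar'
matches :
- ns : 'wikidata'
basis : 'extension match tar; byte match at 0, 156 (signature 2/2)'
"""

if __name__ == "__main__":
    for name, count in extension_only(SAMPLE).items():
        print(f"{name}: {count} extension-only matches")
```

Run against the two reports in this issue, a check like this should flag the ZIP file in both modes, and the WARC, ARC and tar files only in the run without PRONOM.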