ffdev-info / wikidp-issues

An issues repository for resolving issues in Wikidata around the records relating to Digital Preservation
GNU General Public License v3.0

Text ID value isn't configured in Wikidata identifier #26

Open ross-spencer opened 2 years ago

ross-spencer commented 2 years ago

Description of problem

I believe SF will try to identify the encoding of a text file if SF finds a matching PUID/identifier etc. The text ID is recorded in the config package, e.g. for PRONOM or MIMEInfo. Let's investigate the impact of Wikidata not having this ID and see what we need to do to include it.

ross-spencer commented 1 year ago

@richardlehane does this issue make sense to you? If so, I'm tempted to translate it to an issue on the SF wiki and then work on that at some point soon.

richardlehane commented 1 year ago

This is something I think you could implement for Wikidata.

The text matcher isn't triggered by PUID or other identifier: it will run as the last matcher unless all the identifiers report they are already satisfied (e.g. because a byte match or container match has been successfully made).

For the PRONOM identifier, this means that it will run if A) you have no match at all (this is how it can identify e.g. a README file with no extension as text) or B) you have an extension or MIME match for a text format (in which case it will validate whether it is actually text or not).
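
To make the two cases above concrete, here is a rough sketch of the decision as described in this comment. It is not Siegfried's actual code or API: `Hints`, `identifier_satisfied`, `should_run_text_matcher` and the `TEXT_ID` placeholder are all hypothetical names invented for illustration, and the "only non-text hints" branch is an assumption inferred from cases A and B rather than something stated above.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical placeholder for the ID an identifier registers for plain text.
TEXT_ID = "text-format-id"


@dataclass
class Hints:
    """Results gathered before the text matcher would run (illustrative only)."""
    byte_or_container_match: bool = False
    name_or_mime_ids: List[str] = field(default_factory=list)


def identifier_satisfied(hints: Hints, text_id: str) -> bool:
    """Return True if this identifier needs no further matching."""
    if hints.byte_or_container_match:
        return True  # a byte or container match was already made
    if not hints.name_or_mime_ids:
        return False  # case A: no match at all, so text is still worth trying
    if text_id in hints.name_or_mime_ids:
        return False  # case B: extension/MIME suggests text, so validate it
    return True  # assumption: only non-text hints, so text matching is skipped


def should_run_text_matcher(satisfied_flags: List[bool]) -> bool:
    """The text matcher runs last unless every identifier is already satisfied."""
    return not all(satisfied_flags)
```

In this sketch an identifier has to know its own text ID to recognise case B, which is roughly the gap the issue describes for Wikidata. How Siegfried behaves when that ID is missing is not covered here and is what the issue proposes to investigate.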