Conal-Tuohy / VMCP-upconversion

Ferdinand von Mueller's correspondence upconversion from MS Word to TEI XML
Apache License 2.0

Finding a set of files without a given style #39

Closed LucasHorseshoeBend closed 7 years ago

LucasHorseshoeBend commented 7 years ago

We have, as of today, 5302 files set at final. They should all have something styled as number, but the analysis shows only 5298 with that style. I can't immediately see a way of finding the four files without it; do you know of a method?

Conal-Tuohy commented 7 years ago

Good question! I am not sure whether the current setup provides a method to find files without a style. I may have to change something in the indexer...
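Outside the indexer, one way to approach this is a simple filter over a per-file list of styles. The following is only a sketch, not a description of how the indexer works: it assumes a hypothetical mapping from file name to the set of paragraph style names used in that file, and the file names and the `files_missing_style` helper are made up for illustration.

```python
def files_missing_style(styles_by_file, style):
    """Return the sorted names of files whose set of styles lacks `style`."""
    return sorted(
        name for name, styles in styles_by_file.items() if style not in styles
    )


# Hypothetical sample data: file name -> style names found in that file.
styles_by_file = {
    "letter-001.docx": {"number", "letter", "standard"},
    "letter-002.docx": {"letter", "standard"},   # no "number" style
    "letter-003.docx": {"number", "standard"},   # no "letter" style
}

missing_number = files_missing_style(styles_by_file, "number")
missing_letter = files_missing_style(styles_by_file, "letter")
print(missing_number)
print(missing_letter)
```

Building the mapping itself (extracting style names from each Word file) is not shown here and would depend on how the existing pipeline reads the documents.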

LucasHorseshoeBend commented 7 years ago

It's a bit more complicated than "no style": these "fields" might be in the wrong style! I've come across a couple when editing that had number in some other style, manually adjusted to give the correct font size and position! It looks to me as if there are four "final" files that do not have number styled as such, and (perhaps the same) four that do not have anything styled as letter.

LucasHorseshoeBend commented 7 years ago

There were some where number was styled standard but your analytics allowed those to be picked up. So my guess is that they are coded as one of the other standard styles, styles that need not be present in all letters.

LucasHorseshoeBend commented 7 years ago

I have now found the 8 files that were missing number or letter styles by using successive combinations of facets to isolate each offending file in a set small enough to inspect; the biggest such set was 5 files, so it became manageable. The time taken for each search depended on how lucky I was in choosing facets, but it was less than 30 minutes for each one. So don't try to finesse the indexer.
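For anyone repeating this, the narrowing step can be sketched roughly as follows. This is one reading of the approach, not a record of the exact searches: it assumes the faceted interface reports, for each facet value, how many final files match and how many of those also have the required style. The facet values, the counts and the `facets_with_shortfall` helper are all hypothetical.

```python
def facets_with_shortfall(total_counts, styled_counts):
    """Return facet values where fewer files have the style than exist in total.

    The result maps each such facet value to the number of files that lack
    the style, so the smallest affected sets can be inspected by hand.
    """
    return {
        facet: total - styled_counts.get(facet, 0)
        for facet, total in total_counts.items()
        if total > styled_counts.get(facet, 0)
    }


# Hypothetical counts per facet value (a year facet is used as an example).
total_counts = {"1875": 120, "1876": 95, "1877": 5}
styled_counts = {"1875": 120, "1876": 95, "1877": 4}

shortfalls = facets_with_shortfall(total_counts, styled_counts)
print(shortfalls)
```

A facet value with a shortfall and a small total, such as the 5-file set in this made-up data, is the one worth opening and checking file by file.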

I think you can probably close this issue now.