Conal-Tuohy / VMCP-upconversion

Ferdinand von Mueller's correspondence upconversion from MS Word to TEI XML
Apache License 2.0

Strange behavior in Addressee facet #45

Closed LucasHorseshoeBend closed 3 years ago

LucasHorseshoeBend commented 6 years ago

Another issue for when you are funded again for vmcp work.

I have been using the addressee facet to help clean up errors in the correspondent field.

When I facet by "addressee" within "final", it returns "the" as the addressee for some files (today, 172 files). But when I click on the "the" facet link, it puts all of the final files (today, 6151) in the selected set.

This must be an algorithm problem. There are cases which would show up that way. An example file that will be reported in this way is http://vmcp.conaltuohy.com/xtf/view?docId=tei/1860-9/1867/67-02-00d-final.xml The fact that it is reported in this faceted set is not the issue: we could attend to it and others like it if we could easily find them.

Another oddity: the addressee reported as "[?]" is shown with a count of one, but when I click on it the set contains 18 items: the correct one, http://vmcp.conaltuohy.com/xtf/view?docId=tei/1890-6/1894/94-07-00g-final.xml and 17 which are reported as "[...]", such as http://vmcp.conaltuohy.com/xtf/view?docId=tei/1860-9/1865/65-09-00a-final.xml Yet the "[...]" facet itself lists only 3 items, including http://vmcp.conaltuohy.com/xtf/view?docId=tei/1860-9/1865/65-09-00a-final.xml

All this suggests that there is more than one condition where the faceting algorithm is behaving oddly. These facets are extremely helpful in cleaning the files, but I need to have confidence in them.
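
To make the mismatch concrete, here is a minimal sketch of the check I have in mind: the count shown next to a facet value should equal the number of documents returned when that value is selected. The record layout, the function names, and the sample data are all hypothetical; none of this is taken from XTF or from this repository's code. The `broken_select` function only imitates the symptom (a selection on "the" returning everything) and is not a claim about the actual cause.

```python
from collections import Counter


def facet_counts(docs):
    """Count documents per exact addressee value (what the facet list displays)."""
    return Counter(doc["addressee"] for doc in docs)


def select_facet(docs, value):
    """Return the documents whose addressee equals the value exactly."""
    return [doc for doc in docs if doc["addressee"] == value]


def inconsistent_facets(docs, select=select_facet):
    """Map each facet value to (displayed count, selected count) where the two differ."""
    mismatches = {}
    for value, count in facet_counts(docs).items():
        selected = len(select(docs, value))
        if selected != count:
            mismatches[value] = (count, selected)
    return mismatches


# Hypothetical sample records, not real VMCP data.
sample = [
    {"id": "a", "addressee": "the"},
    {"id": "b", "addressee": "[?]"},
    {"id": "c", "addressee": "[...]"},
    {"id": "d", "addressee": "Example Person"},
]


def broken_select(docs, value):
    """Imitate the reported symptom: selecting "the" returns every document."""
    if value == "the":
        return list(docs)
    return select_facet(docs, value)


print(inconsistent_facets(sample))                 # {}
print(inconsistent_facets(sample, broken_select))  # {'the': (1, 4)}
```

A report like this, run over the real index, would list every facet value that cannot be trusted, which is what I need in order to rely on the facets for cleaning.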

Conal-Tuohy commented 3 years ago

I believe this is fixed now. There's none with "the" and only one with "[...]" which on inspection is a letter whose source is unknown. Is it OK to close this issue now?

LucasHorseshoeBend commented 3 years ago

Dear Conal

Close issue.

The number reported as "[...]" is not consistent: I get more than one, but only two among the final ones; some will disappear with more work on those not yet finalised.

I will discuss with Rod whether we should say "Unknown", which would be more transparent than "[...]" but might have other consequences I have not considered.

Best wishes Arthur

On 17 Feb 2021, at 06:25, Conal Tuohy notifications@github.com wrote:

I believe this is fixed now. There's none with "the" and only one with "[...]" which on inspection is a letter whose source is unknown. Is it OK to close this issue now?

LucasHorseshoeBend commented 3 years ago

Dear Conal

Close issue.

I had four reported as "[...]", all not final. I will discuss with Rod whether we should say "Unknown", which would be more transparent than "[...]" but might have other consequences I have not considered.

Best wishes Arthur