Closed Captainkirkdawson closed 4 years ago
Similarly, would a search for GRANT or GRAENT not be expected to generate a UCF match for a record with GRA_NT? It does not.
Have I missed a line in this thread? How is the search process meant to know that the ? on the end of a name in the dataset should trigger an "all possible combinations of anything like the specified search name" result?
All the ? on a name in the dataset means to me is that the transcriber does not have 100% confidence in their transcription but that GRANT is their best stab, indicated as GRANT?. I would expect the search to just ignore the ? character and rather than only returning GRANT?, to return GRANT (if that was searched for) in the first section of results, and any GRANT? names in the 'Possibles' section.
Eric, this has nothing to do with the use of the ? that appeared as a grammatical construct in my original post, which I have now removed for clarity! The thread is about what we expect to happen for transcriptions that include UCF. After the latest set of mods by Ben we do as you suggest: ignore the ? in the search but include it in the search result, i.e. a record of GRANT? is retrieved by a search for Grant.
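The behaviour described here (ignore a trailing ? when searching, but keep it in the displayed result) can be sketched roughly as follows. This is an illustration only: the function names and the split into "exact" and "possible" sections are assumptions taken from the suggestion earlier in the thread, not the actual FreeREG code.

```python
def normalise(name: str) -> str:
    """Lower-case a name and drop a trailing '?' uncertainty marker."""
    return name.strip().rstrip("?").lower()


def search(query: str, records: list[str]) -> dict[str, list[str]]:
    """Return matching records, split into certain hits and '?'-flagged possibles."""
    wanted = normalise(query)
    hits = [r for r in records if normalise(r) == wanted]
    return {
        # Records are returned exactly as transcribed, '?' included.
        "exact": [r for r in hits if not r.endswith("?")],
        "possible": [r for r in hits if r.endswith("?")],
    }


results = search("Grant", ["GRANT", "GRANT?", "GRAHAM"])
print(results)
```

With these sample records, a search for Grant finds both GRANT and GRANT?, with the uncertain transcription kept in its own section.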
My question was to do with expectation for k*k or Gra_nt or she_{2,3}er.
For k*k I would have expected a search for kik or Kirk or Kluck to return a possible UCF match of the k*k transcription. The latest code generates the UCF match for kk and kik but not for kirk or kluck. Clearly the latter two should have been shown, as * means 1 or more. But we clearly also generate it for 0. Is that a valid expectation? As far as the spec is concerned it is not, but should we change the spec?
For Gra_nt I would expect Grannt or Graent to show the record, and it does, as _ is one character. But unlike * it does not match on 0, i.e. Grant does not suggest that UCF. It is not required to according to the spec.
she_{2,3}er does exactly as we would expect from the spec, i.e. she--er or she---er generate a possible match for that record; sheer, she-er and she----er do not. That is the spec, but I am asking about expectation rather than spec.
In summary, _ and _{2,3} work according to spec, but the question is what people's expectations would be. The * as a UCF is not handled correctly and will be raised as a separate story.
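To make the spec as described in this thread concrete, here is a minimal sketch that translates a UCF name into a regular expression: `_` is exactly one unknown character, `_{2,3}` is two to three, and `*` is one or more. The helper names `ucf_to_regex` and `matches` are hypothetical, and the sketch follows the stated spec rather than what the deployed code did at the time.

```python
import re


def ucf_to_regex(ucf: str) -> re.Pattern:
    """Translate a UCF name into a regex, following the spec as stated in the thread."""
    parts = []
    i = 0
    while i < len(ucf):
        ch = ucf[i]
        if ch == "_":
            # '_' is exactly one unknown character unless followed by {m} or {m,n}.
            quantifier = re.match(r"\{\d+(,\d+)?\}", ucf[i + 1:])
            if quantifier:
                parts.append("." + quantifier.group(0))
                i += 1 + quantifier.end()
                continue
            parts.append(".")
        elif ch == "*":
            # Spec reading: one or more unknown characters.
            parts.append(".+")
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.IGNORECASE)


def matches(ucf: str, search_name: str) -> bool:
    """True if the whole search name is a possible reading of the UCF name."""
    return ucf_to_regex(ucf).fullmatch(search_name) is not None


for name in ["sheer", "she-er", "she--er", "she---er", "she----er"]:
    print(name, matches("she_{2,3}er", name))
```

Under this reading, Kirk and Kluck are possible matches for k*k, while kk is not, and Grant is not a possible match for Gra_nt.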
To ask testing group for this?
I think this story needs to be a documentation story explaining how the new UCF search works.
Agree on that, but there was also the question on expectation. Should * be 0-n (currently 0-1) and should _ be 0-1? I believe we agreed that * should be 0-n.
I also feel that _ should be 0-1
I feel that * should be 1-n, though can be persuaded that it should be 0-n. _ should be exactly 1, in my opinion.
We'd want to review the UCF instructions to resolve this.
In my view as a Transcriber, _ is defined as meaning positively one character only.
And * means anything (number or letter), any quantity of characters but not none. How that maps to your syntax of 0-n or 0-1 I leave you to translate.
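The competing readings in this discussion differ only in the quantifier each symbol becomes. The sketch below compares two of them side by side on sample names; the labels, the `INTERPRETATIONS` table and the helper names are illustrative assumptions, and the `{m,n}` form is deliberately left out to keep it short.

```python
import re

# Two of the readings argued for in the thread, written as regex fragments.
INTERPRETATIONS = {
    "* is 1-n, _ is exactly 1": {"*": ".+", "_": "."},
    "* is 0-n, _ is 0-1": {"*": ".*", "_": ".?"},
}


def build(ucf: str, mapping: dict[str, str]) -> re.Pattern:
    """Build a regex for a UCF name under one interpretation."""
    body = "".join(mapping.get(ch, re.escape(ch)) for ch in ucf)
    return re.compile(body, re.IGNORECASE)


def compare(ucf: str, candidates: list[str]) -> dict[str, list[str]]:
    """List which candidate search names match under each interpretation."""
    return {
        label: [c for c in candidates if build(ucf, mapping).fullmatch(c)]
        for label, mapping in INTERPRETATIONS.items()
    }


star = compare("k*k", ["kk", "kik", "Kirk", "Kluck"])
underscore = compare("Gra_nt", ["Grant", "Graent", "Graeent"])
print(star)
print(underscore)
```

The practical difference is whether the zero-character case (kk for k*k, Grant for Gra_nt) counts as a possible match.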
I have written up a document based on the latest documentation at https://docs.google.com/a/freeukgenealogy.org.uk/document/d/14iIoZtEjfN_CgDUwc6X9qjwUQIxsnHK568QseblTNRc/edit?usp=sharing
And would appreciate a review and discussion of whether or not to post it for transcribers. In addition, I've added wording about UCF to the about this search screen when it's in use.
Ben, I like your approach and the way it could work. My suggestions are to do with the way that we tell people. Instead of saying "Myopicvicar" (a term which I am not keen on, as it implies a short-sighted or short-term-planning member of the clergy), say "Server" or "Website". I think that "Simple UCF" (those entries with both types of brackets) should be processed as you suggest, but all the "Wildcard" versions just do not get processed. This makes a clear distinction. The "Wildcard UCF" will only be found when specifying County and Place. Or you could even have an "Advanced Search" option where just the "Wildcard UCF" are displayed when a wildcard such as "T*" is used in the search parameters. Do you think you should think up a different word for "Wildcard UCF"? I understand the word "Wildcard" to be the Asterisk or Question Mark that you enter into your Search parameters. Agreed that in UCF they have the same function, but they should not be confused. ED
Ben
I agree with EricD that the use of wildcard in 2 very different contexts is extremely confusing.
I also agree with him that we should not say MyopicVicar; no one knows what that is or means.
We should restrict the use of wildcard to its meaning in the search, not within the UCF, where * and _ are UCF special characters.
I think the document would be better going from the simple to the more complex. As it is, it tends to jump around.
K
@benwbrum @edickens @SteveBiggs This is what I propose to add to the Transcriber Help (the Researcher Help will take some more thinking!).
Please do not use square brackets, [ ], in any of the Forename or Surname fields, unless you need to use the brackets as part of our UCF.
For example, you may be tempted to enter something like "Willam [sic]" or even "[Willam]", just as you would in a transcription made for your personal use. For the FreeREG database to be easily searchable by a researcher, you need to put "Willam" in the Forename field and then something like "Forename: Willam [sic]" in the Notes field. (Ideally, the comment would go in a Transcriber Notes field, but we do not have this field yet.)
For details of how square brackets and our UCF affect search results, see ... (link to info in Researcher Help — once written!).
Good idea. In fact nothing else should be added to names, for example "Snr." otherwise the search thinks it is a second name. All goes in the notes.
I was going to say the same thing as Eric - the name fields must only contain the proper name(s) with no title, rank, subscript, etc - these must all go in the Notes.
Thanks. The Help already covers the general idea of 'name only', but I will review wording/placement of instructions when I make the updates.
@AlOneill Are we happy with this? If so @benwbrum and @Captainkirkdawson to review possible performance issues prior to implementation..
The rules of the UCF and how we react were first documented by Ben and have been updated in the following document: https://docs.google.com/document/d/14iIoZtEjfN_CgDUwc6X9qjwUQIxsnHK568QseblTNRc/edit#
A test file https://test3.freereg.org.uk/freereg1_csv_files/5e45afbbe9379074c4382daf?locale=en contains a number of different UCF and could be useful in guiding any testing.
@Captainkirkdawson please tell me which parish this file relates to so that I can conduct tests (using the Unique Name feature to identify what I should and shouldn't find, and data on types of records and dates).
@PatReynolds if you follow that link you will see that it takes you to SOMRUNBA (Captainkirk) in Parish Register of St Peter in Runnington of Somerset
Update document to address comments by @PatReynolds. Also added a section on how dates containing UCF will be treated in searches containing a date range https://docs.google.com/document/d/14iIoZtEjfN_CgDUwc6X9qjwUQIxsnHK568QseblTNRc/edit#
Thanks, Kirk that is excellent! I got a bit lost in dates, you can tell, but otherwise great. I've suggested changing the language from talking about 'the researcher' to talking to 'you'. And a suggestion on 'nearby places' (if nearby places doesn't work as I think it does, we need to say that 'nearby places' cannot be selected).
@Captainkirkdawson @PatReynolds I've made a start. It would help me if the unresolved comments — mainly about Dates — in Kirk's document could be dealt with. Thanks!
@AlOneill Have updated the text in my document. Records with dates that contain UCF characters will not be included in the results if a date range is applied to a search. Records that contain UCF characters will be retrieved in a search without a date range.
I've made a few suggested changes for clarification and have a question about dates:
Why is UCF not used in a date range search? For example, '162[38]' must be between 1620 and 1630, so why can't such a date range search return it?
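As an illustration of what this question proposes (not of how FreeREG behaves, since records with UCF dates are excluded from date range searches), a bracketed year can be expanded into every year it could stand for and each candidate checked against the range. The function names are hypothetical, and the sketch handles only square-bracket alternatives within a year, not `_` or `*`.

```python
import re
from itertools import product


def possible_years(ucf_year: str) -> list[int]:
    """Expand a year such as '162[38]' into every year it could stand for."""
    # Each token is either a bracketed set of alternative digits or a single digit.
    tokens = re.findall(r"\[(\d+)\]|(\d)", ucf_year)
    choices = [alternatives if alternatives else digit for alternatives, digit in tokens]
    return [int("".join(combo)) for combo in product(*choices)]


def range_check(ucf_year: str, start: int, end: int) -> dict[str, bool]:
    """Report whether the year certainly, or only possibly, falls in the range."""
    years = possible_years(ucf_year)
    inside = [start <= year <= end for year in years]
    return {"certain": bool(years) and all(inside), "possible": any(inside)}


print(possible_years("162[38]"))
print(range_check("162[38]", 1620, 1630))
print(range_check("162[38]", 1625, 1630))
```

Here '162[38]' expands to 1623 and 1628, so it is certainly within 1620 to 1630 but only possibly within 1625 to 1630, which is the kind of distinction a date range search would have to decide how to present.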
Thanks @Captainkirkdawson
@all As part of describing the possible misuse of square brackets, I intend to ask researchers to report such problems — this may result in a deluge of error reports, but I don't think we can dodge the issue!
As I work on the text it occurs to me that some of the subtleties of UCF may be lost for anyone who relies on a screen-reader. (Punctuation is not voiced, typically.) Will have to test.
@Captainkirkdawson In the section on the misuse of square brackets, I am a little puzzled by this example as I thought wildcard searches applied only to surnames —
Is the solution to make it an example about the surname, "*JOHN*" ?
Happy for you to make that change
@Captainkirkdawson Ah, just checked that nothing has changed (which it hasn't on t3): there must be 2 letters before a *, so will drop that (surname) example.
Draft Help page ready for review.
There is probably room for improvement, but I reckon the essential information is there.
Will create a new issue to check that info and results are accessible for screen-reader users.
The section entitled "Interpreting symbols in names and dates", three-quarters of the way down the Help and referenced from the sidebar as "Symbols in Your Results", will never be read. It makes the point that if you search a specific place within a county, we are able to show you any results that could match what you are looking for: we search initially for exact matches and then for any records containing UCF characters that could also match the search name. That is, a specific place search will now have an extra section with those extra results. You will NOT get them in a county-wide search. This needs to be identified in the paragraphs on Name Variations, with links to this new section (at least in my opinion). As written, those sections simply say use wildcard or Soundex. The content itself is fine.
@Captainkirkdawson Fair point — I think I understand what you mean! Will review wrt your comments.
On reflection, cross-references are also needed for dates. And likely needed for Unique Names listing — I seem to remember that UCF is shown in these lists — but will check.
Moving back to In progress.
Cross-referencing added for Names, Dates and Unique Names.
@AlOneill I am happy for this to be finalized and made ready for deployment.
Help page now ready for deployment.
Deployed to production on 20 June 2020
Priority 21 (1 4 10 6) What are the rules for a ucf match? I would have expected the following to be suggested as a match
<SearchName _id: 58c47930231040110c23cbbf, first_name: "james", lastname: "she{2,3}er", origin: "transcript", role: "g", gender: "m", type: "p">
"name has wildcard" "SHEER" /she.{2,3}er/ "did not add"