benini / scid

Other
43 stars 12 forks source link

Bugfix comma removed for lastname firstname #158

Closed ukscid closed 7 months ago

ukscid commented 9 months ago

The case "useNAMEL_" did not work because the comma in lname

benini commented 8 months ago

Can you please explain with an example the bug? Can you please try to not change the behavior of existing functions? If possible I would like to not have to track all the usage of the function ::pinfo::splitName and check if this modifications are ok.

ukscid commented 8 months ago

The resolver has the options {useNAMEL} and {useNAMEF} or {useNAMEL+} and {useNAMEF+}

this should generate firstname_lastname and lastname_firstname or firstname+lastname and lastname+firstname

splitName creates

"firstname lastname" and "lastname,firstname"

so splitname works as designed, but this breaks the logic in

if { [string index $searchterm 7] eq "L" } { set psname $lname } else { set psname $fname }
regsub -all " " $psname [string index $searchterm 8] psname

because then "lastname,firstname" does not convert to "lastname_firstname" and the resolver does not hit the right url. The first commit should fix this situation. I can't see why "firstname lastname" and "lastname firstname" should handle different in splitname.

The second commit tries to read the name from the spellcheck and use this. This should increase the probability to hit the right link, For example: without spellcheck a name like "Carslen, M" does not hit the wikipedia link https://de.wikipedia.org/wiki/Magnus_Carlsen An other option may be to return the spellcheck name in {sc_name info -htext $player}

splitname are only used in pinfo.tcl I hope this help you to decide if this changes are valid.

benini commented 8 months ago

I quickly looked at the code, but I stil don't get it. ::pinfo::ReplaceIDTags try to infer an URL from the player name. I understand that usually you have the name in 2 format: Magnus Carlsen or Carlsen, Magnus and the comma signals that the lastname is first. But it became more complex when there are more than 2 words: Maxime Vachier-Lagrave Vachier-Lagrave, Maxime or Hans Moke Niemann Niemann, Hans Moke

Can you please give me some tests that cover the various corner cases and that explain the problem? set pinfo "Some player info for Carlsen, Magnus" set expected "Something with URL" set result [::pinfo::ReplaceIDTags $pinfo] if {$result ne $expected} error $pinfo

ukscid commented 8 months ago

Here the test case: Use set idlink(B) {{useNAMEL_} {https://players.chessbase.com/de/player/%ID%} {} {CB}} in resolvers.dat. This say: create a ID with Lastname Firstname with _ as delimiter

The chessbase webpage needs: "https://players.chessbase.com/de/player/lastname_firstname" For Magnus Carlsen the URL is https://players.chessbase.com/de/player/Carlsen_Magnus The testset is five different spellings for Magnus Carlsen. The correct URLs are marked with an * below.

First test with the actual splitname results in Carlsen, Magnus -> https://players.chessbase.com/de/player/Carlsen,Magnus Magnus Carlsen -> https://players.chessbase.com/de/player/Magnus,Carlsen Carlsen Magnus -> https://players.chessbase.com/de/player/Carlsen,Magnus Carlsen, M. -> https://players.chessbase.com/de/player/Carlsen,M. Carlsen M. -> https://players.chessbase.com/de/player/Carlsen,M. There are no right URL. The delimiter _ is not used. It is ok that testcases without the full name have the wrong URL.

Second test: for the first ignore/remove the changes in ::pinfo::ReplaceIDTags and test the change from set lname "[string range $playerName 0 [expr $count - 1]],[string range $playerName [expr $count + $countlen] end]" to set lname "[string range $playerName 0 [expr $count - 1]] [string range $playerName [expr $count + $countlen] end]" in splitname.

Now the results are Carlsen, Magnus -> https://players.chessbase.com/de/player/Carlsen_Magnus Magnus Carlsen -> https://players.chessbase.com/de/player/Magnus_Carlsen Carlsen Magnus -> https://players.chessbase.com/de/player/Carlsen_Magnus Carlsen, M. -> https://players.chessbase.com/de/player/Carlsen_M. Carlsen M. -> https://players.chessbase.com/de/player/Carlsen_M.

In my opinion this should be the right result for the resolver, and we get 2 correct URLs. It is ok, that testcases without the full name or the form "firstname lastname" have the wrong URL.

Third test: Now we add the use of the spellcheck file in ::pinfo::ReplaceIDTags. With both changes, the result is Carlsen, Magnus -> https://players.chessbase.com/de/player/Carlsen_Magnus Magnus Carlsen -> https://players.chessbase.com/de/player/Carlsen_Magnus Carlsen Magnus -> https://players.chessbase.com/de/player/Carlsen_Magnus Carlsen, M. -> https://players.chessbase.com/de/player/Carlsen_Magnus Carlsen M. -> https://players.chessbase.com/de/player/Carlsen_Magnus *

All inputs generate the wanted URL. The hit rate increases when the spellcheck file have unambiguous names. It may be discussed wether this behaviour is a wanted result for the input.

You are right: all this can not handle names like Niemann, Hans Moke This depends from the wanted URL, for example: Wikipedia works: https://en.wikipedia.org/wiki/Hans_Moke_Niemann Chessbase does not work: https://players.chessbase.com/de/player/Niemann_Hans_Moke Chessbase wants: "https://players.chessbase.com/de/player/Niemann_Hans Moke" (keeps the second blank)

Names like "Vachier-Lagrave, Maxime" are not a problem as long as the delimiter is not an "-".

ukscid commented 7 months ago

trimEngineName results in "carlsen,magnus". Some websites are case sensitive (e.q. Wikipedia: https://en.wikipedia.org/wiki/Magnus_Carlsen ) Would it be ok, to make the resolver string more complex to use it in ::pinfo::formatName ? And rules for 3 names could also be calculated. Other option: use the spellcheck name. This has the correct uppercase.

benini commented 7 months ago

Usually the aim is to reduce complexity, not to increase it. I would try something like:

proc formatName{fname lname swap_order delimiter} {
  if {$swap_order eq "L"} { swap fname with lname }
  return "[string totitle $fname]$delimiter[string totitle $lname]"
}

if {[regexp {useNAME([FL])(.)?$} $searchterm -> swap_order delimiter]} {
  splitName ...
  formatName ...
}

and then list what is missing. If the matter is just about lowercases we can add something like useNAMEf.

ukscid commented 7 months ago

The proposed solution does not work for names like: "Praggnanandhaa R" The trimEngineName function turns it into "praggnanandhaar". This can not be used for first name and surname. trimEngineName was not intended to differentiate between first and last name. I have implemented and checked in the formatName function as suggested. This works well for "normal" names. What should the design of the solution look like now? Modify trimEngineName?

The result of trimEngineName depends on the installed photos. If no photos installed then "pra" is the result. Possible solution: create new proc getSpellCheckname use this in playerinfo and trimEngineName move trimEngineName from updatePlayerPhotos to getPhoto

benini commented 7 months ago

Please try if these changes work.

ukscid commented 7 months ago

All tests passed! Interesting solution, instead of two functions, one function returns both values.