Dixin / Etymology

Chinese etymology research website. ASP.NET Core architecture for SPA. See http://hanziyuan.net or http://ChineseEtymology.org

Test Character Cases #83

Closed: UncleHanzi closed this issue 6 years ago.

UncleHanzi commented 6 years ago

Special cases of characters in the Etymology table

Case 𪛖: Extension B character not in the Etymology table. Input works OK, but search crashes!

Case 𠀀: Extension B character in the Etymology table. Search finds and displays the correct data.

Case 㐦: Extension A character not in the Etymology table. Search crashes!
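
Both crashes above involve valid input that has no row in the Etymology table, while 𠀀, which has a row, works. Here is a minimal sketch of the expected behavior. It assumes a plain in-memory dictionary keyed by character. `ETYMOLOGY`, `search` and `utf16_units` are hypothetical names, not the site's actual ASP.NET Core code. A missing key should give an empty result, not an exception. The sketch also shows why Extension B input needs care: 𪛖 (U+2A6D6) is one code point but two UTF-16 code units, which is how .NET and JavaScript strings store it.

```python
from typing import Optional

# Hypothetical stand-in for the Etymology table: character -> record.
ETYMOLOGY = {
    "\U00020000": {"note": "sample record"},  # 𠀀 (Extension B), present in the table
}


def utf16_units(text: str) -> int:
    """Count UTF-16 code units (characters above U+FFFF need two)."""
    return len(text.encode("utf-16-le")) // 2


def search(query: str) -> Optional[dict]:
    """Look up one character; return None when it has no row instead of failing."""
    if len(query) != 1:  # Python counts code points, so 𪛖 has length 1
        raise ValueError("expected exactly one character")
    return ETYMOLOGY.get(query)  # dict.get returns None for a missing key


print(len("\U0002A6D6"), utf16_units("\U0002A6D6"))  # 1 2
print(search("\U0002A6D6"))  # None: 𪛖 is not in the table, no crash
print(search("\u3426"))      # None: 㐦 (Extension A) is not in the table
```

Whatever the real storage is, "not found" should be an ordinary result that the UI can display.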

Case 䜩: Simplified character in the Etymology table. OK.

Case 讌: Traditional character in the Etymology table. OK.

Case 孃: Old Traditional. OK.

Case ㇏: Some “Chinese characters” come from Unicode ranges other than Unihan, Extension A and Extension B, for example strokes. I have included a list of Unicode ranges that should be covered, such as CJK Strokes (U+31C0 to U+31EF).
㇏ is not detected as a legitimate input character.
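
The following is a minimal sketch of an input check that accepts the ranges named in this issue. The boundaries are the standard Unicode block ranges. The author's full list of ranges is not reproduced in this thread, so `ACCEPTED_RANGES` (a hypothetical name) is only a starting point and should be extended from that list.

```python
from typing import Optional

# Hypothetical whitelist of accepted ranges (inclusive), using standard Unicode
# block boundaries. Extend it with the rest of the issue author's list.
ACCEPTED_RANGES = [
    ("CJK Unified Ideographs", 0x4E00, 0x9FFF),
    ("CJK Unified Ideographs Extension A", 0x3400, 0x4DBF),
    ("CJK Unified Ideographs Extension B", 0x20000, 0x2A6DF),
    ("CJK Strokes", 0x31C0, 0x31EF),
]


def block_of(ch: str) -> Optional[str]:
    """Return the name of the accepted range containing ch, or None."""
    cp = ord(ch)  # ord() takes exactly one code point, even above U+FFFF
    for name, lo, hi in ACCEPTED_RANGES:
        if lo <= cp <= hi:
            return name
    return None


def is_valid_input(ch: str) -> bool:
    """True if ch is a single code point inside one of the accepted ranges."""
    return len(ch) == 1 and block_of(ch) is not None


for c in ["㇏", "㐦", "𪛖", "臺", "a"]:
    print(c, block_of(c))
```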

Case: Simplified column starting with “p”. I see you took care of the references to characters in the Simplified column that start with “p”, which means “part of a character”. I have other special markers, such as “c” for Cantonese, but I can take care of those on the data side. 飤 starts with “p” and its simplified form shows blank; OK, I will fix all these problems on the data side, so don't worry about them. Other examples: 踫 (“r”) problem, 宁 (“z”), 軚 (“c”) problem.
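
The marker convention described above can be sketched as a small parser. This is a hypothetical illustration. It assumes a marker is a single lowercase ASCII letter at the very start of the cell, followed immediately by the referenced text. Only “p” (part of a character) and “c” (Cantonese) have meanings stated in this issue. The parser reports “r” and “z” as unknown rather than guessing what they mean.

```python
from typing import NamedTuple, Optional

# Marker meanings stated in the issue; other letters (e.g. "r", "z") are unknown.
KNOWN_MARKERS = {"p": "part of a character", "c": "Cantonese"}


class SimplifiedCell(NamedTuple):
    marker: Optional[str]  # leading lowercase ASCII letter, or None
    text: str              # the rest of the cell


def parse_simplified(cell: str) -> SimplifiedCell:
    """Split an optional one-letter marker off a Simplified-column value (assumed format)."""
    if cell and "a" <= cell[0] <= "z":
        return SimplifiedCell(cell[0], cell[1:])
    return SimplifiedCell(None, cell)


def marker_meaning(cell: str) -> Optional[str]:
    """Meaning of the cell's marker, "unknown marker" if unlisted, or None if unmarked."""
    marker = parse_simplified(cell).marker
    if marker is None:
        return None
    return KNOWN_MARKERS.get(marker, "unknown marker")
```

Keeping the marker separate from the text lets the display choose what to show for each kind of reference, rather than rendering a blank.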

Case 㙜: Old Traditional with more than one character crashes. Under Old Traditional, I sometimes have more than one character. I think you should just show all the Old Traditional forms I have, which may be anywhere from 0 to 5.
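
Here is a minimal sketch that returns every Old Traditional form rather than assuming there is exactly one. The storage format is an assumption. It treats the forms as concatenated in one cell, optionally separated by spaces or commas, and each form as a single code point. Iterating over a Python string yields whole code points, so Extension B forms stay intact. Code that splits a UTF-16 string by index would have to keep surrogate pairs together.

```python
from typing import List

# Hypothetical separators that may appear between forms in the cell:
# space, tab, ASCII comma, fullwidth comma, ideographic comma.
SEPARATORS = {" ", "\t", ",", "\uff0c", "\u3001"}


def old_traditional_forms(cell: str) -> List[str]:
    """Return all Old Traditional forms in a cell, in order (possibly none).

    Assumes each form is one code point; iterating a str yields code points,
    so Extension B characters are never split in half.
    """
    return [ch for ch in cell if ch not in SEPARATORS]


def render_old_traditional(cell: str) -> str:
    """Text for the Old Traditional display: every form, or an empty string."""
    return " ".join(old_traditional_forms(cell))
```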

Case 臺: 1-n simplified. OK. The simplified form of 臺 is 台. Although 台 has a 1-n relation to traditional characters, the input 臺 specifies it uniquely, so 臺 is OK.

Case 台 / 臺: 1-n simplified only returns the first match. 台 has a 1-to-4 relation to traditional characters. When a character is derived from 台 itself, I have just 台 in the Simplified column and show the etymology of 台. If it is derived from something else, such as 臺, I have 台1, 台2, or 台3 in the Simplified column. I will open a separate bug for this.
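
Here is a minimal sketch that collects all traditional characters for a simplified input, using the numbered-suffix convention described above (台1, 台2, 台3). The table shape and the function names are hypothetical: the table maps each traditional character to its raw Simplified-column value. In the usage below, the variant numbers are made up for illustration and do not come from the real data.

```python
import re
from typing import Dict, List, Optional, Tuple

# A Simplified-column value: one character, optionally followed by a variant number.
_VARIANT = re.compile(r"(.)(\d*)")


def split_variant(cell: str) -> Optional[Tuple[str, Optional[int]]]:
    """Split "台2" into ("台", 2) and "台" into ("台", None); None if not this shape."""
    m = _VARIANT.fullmatch(cell)
    if m is None:
        return None
    base, num = m.groups()
    return base, (int(num) if num else None)


def traditional_for(simplified: str, table: Dict[str, str]) -> List[str]:
    """All traditional characters whose Simplified value is `simplified` or `simplified` + digits."""
    hits = []
    for trad, cell in table.items():
        parsed = split_variant(cell)
        if parsed is not None and parsed[0] == simplified:
            hits.append((parsed[1] or 0, trad))  # unnumbered (derived from 台 itself) sorts first
    return [trad for _, trad in sorted(hits)]


# Hypothetical rows; the variant numbers are illustrative, not the real data.
rows = {"台": "台", "臺": "台1", "檯": "台2", "颱": "台3", "讌": "䜩"}
print(traditional_for("台", rows))  # ['台', '臺', '檯', '颱']
```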

Case 綎: “/⿰纟廷” problem. My simplified characters come from the 2013 government standard of 8105 characters. For some reason it includes some bizarre characters whose simplified forms currently have no Unicode code point. These characters are written as ideographic description sequences such as “⿰纟廷”, and I currently prefix them with “/” so I can easily search for them. I think the Simplified display should show “⿰纟廷” when the input is the traditional character 綎, which does exist.
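
Here is a minimal sketch of the requested display rule, based on the convention described above. A cell that starts with “/” followed by an Ideographic Description Character holds a description sequence. That block is U+2FF0 to U+2FFF, and it contains ⿰. The rule drops the slash and shows the sequence as-is. The function name is hypothetical. Any other value, including other slash-prefixed ones, is returned unchanged.

```python
# Ideographic Description Characters block; for example, ⿰ is U+2FF0.
IDC_FIRST, IDC_LAST = 0x2FF0, 0x2FFF


def display_simplified(cell: str) -> str:
    """Text to show in the Simplified display for a raw Simplified-column value."""
    if len(cell) > 1 and cell[0] == "/" and IDC_FIRST <= ord(cell[1]) <= IDC_LAST:
        return cell[1:]  # drop the search marker, keep the description sequence
    return cell  # ordinary characters and other values are shown unchanged


print(display_simplified("/⿰纟廷"))  # ⿰纟廷 (shown for input 綎)
```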

Case 臤: “/t S”. Special problem; I will think about this.

Dixin commented 6 years ago

Added automated test cases. Will also add these to the wiki documentation.