Open Jacobinski opened 5 months ago
There is an interesting unit test, called test_issue_78
, in the code which seems to be checking for something similar to my issue. Here is the bug-report associated with the test: https://github.com/luoliyan/chinese-support-redux/issues/78.
It's not clear to me that this ever was "correctly" fixed. Running a Git Blame on the test shows that it was added in https://github.com/luoliyan/chinese-support-redux/commit/7a88d51a1ff663a01e05244bead2b7eabd0b670d which seems to be a hack to workaround the issue.
This issue can be reproduced by adding the following new unit test to the class FormatPinyin(Base)
in test_behavior.py
.
def test_issue_55(self):
note = {'Hanzi': '可能', 'Pinyin': 'kěnéng'}
expected = (
'<span class="tone3">kě</span>'
'<span class="tone2">néng</span> '
'<!-- ke neng -->'
)
reformat_transcript(note, 'pinyin', 'pinyin')
self.assertEqual(note['Pinyin'], expected)
reformat_transcript(note, 'pinyin', 'pinyin')
self.assertEqual(note['Pinyin'], expected)
When tested using $ make test
, the test fails with error:
======================================================== FAILURES =========================================================
_______________________________________________ FormatPinyin.test_issue_55 ________________________________________________
self = <tests.test_behavior.FormatPinyin testMethod=test_issue_55>
def test_issue_55(self):
note = {'Hanzi': '可能', 'Pinyin': 'kěnéng'}
expected = (
'<span class="tone3">kě</span>'
'<span class="tone2">néng</span> '
'<!-- ke neng -->'
)
reformat_transcript(note, 'pinyin', 'pinyin')
> self.assertEqual(note['Pinyin'], expected)
E AssertionError: '<span class="tone3">kěn</span><span class="tone2">éng</span> <!-- ken eng -->' != '<span class="tone3">kě</span><span class="tone2">néng</span> <!-- ke neng -->'
E - <span class="tone3">kěn</span><span class="tone2">éng</span> <!-- ken eng -->
E ? - -
E + <span class="tone3">kě</span><span class="tone2">néng</span> <!-- ke neng -->
E ? + +
tests/test_behavior.py:60: AssertionError
Another discovery:
If you go into test_transcribe.py
and add a new test for 可能
in class Transcribe(Base)
the test suite passes.
However, if you add a test for the same word into class SplitTranscript(Base)
then the test suite fails
Based on the above comment, if we look at the fact that the transcribe(['可能'], 'pinyin', 'simp') == ['kě néng']
test passes, and the split_transcript(' kěnéng', 'pinyin') == ['kě néng']
test fails, it seems to suggest that the issue lies in split_transcript
.
It seems like what this function does it attempt to use regular expressions to split the pinyin word kěnéng
into multiple pieces based on initials and finals. Unfortunately, kěn éng
and kě néng
are both valid splits of the word, so it's up to luck if the correct split is chosen for any given word.
Perhaps we can avoid attempting to split the pinyin based on a regular expression and instead always rely on the Hanzi
field as a source of truth?
This is actually kind of tricky to work around since we cannot always rely on the hanzi
field as a source of truth. In many cases, there are multiple valid pinyins for a given hanzi (ex. 大都 has two translations: dàdōu means "for the most part"; dàdū: "metropolis") and if the extension chooses the wrong pinyin, the user should be able to overwrite it without the hanzi acting a source-of-truth. And if the user overwrites it, we can no longer rely on the hanzi
field as a source-of-truth.
I think that the simplest way to work-around the issue for now it to treat the hanzi
as a source-of-truth unless it is changed by the user. In this way, we will avoid having reformat_transcript()
overwrite the translation, avoiding my reported bug. If the user changes the pinyin
, the new value can act as a source-of-truth and we can run reformat_transcript()
against it. This will still have issues for pinyin
with multiple valid regular expression splits, but hopefully it will avoid the number of occurrences of this bug. Additionally, users should be able to manually put spaces in their pinyins to avoid this issue.
Please find my implementation of this fix in https://github.com/Gustaf-C/anki-chinese-support-3/pull/56
Describe the bug
For certain words, such as 宾馆 and 可能, the extension will generate a correctly colored set of pinyin initially, but if the user clicks onto the "Color" field in the card (maybe other fields reproduce this too?), the pinyin coloring will incorrectly change. Please find two demonstration videos attached below.
https://github.com/Gustaf-C/anki-chinese-support-3/assets/14207293/7c20bde3-d591-4abd-ab51-7cd42cc4d04d
https://github.com/Gustaf-C/anki-chinese-support-3/assets/14207293/81cf701b-99c7-4306-9446-e25a28a609ce
Reproduction
Chinese Support 3
add-on and disable all other add-ons. Perform the standard setup procedures from the README document in this GitHub repo to add aChinese (Basic)
card-type to your Anki.Chinese (Basic)
card, enable Chinese Support V3, then copy-and-paste可能
into theHanzi
field. It should auto-populate the other card fields.Pinyin
HTML and verify that it is<span class="tone3">kě</span><span class="tone3">yǐ</span> <!-- ke yi -->
Pinyin
HTML expanded, click on theColor
field and observe that thePinyin
HTML incorrectly changes to<span class="tone3">kěn</span><span class="tone2">éng</span> <!-- ken eng -->
Expected behavior
The
Pinyin
does not change into an incorrect version of itself.Specs