Open garfieldnate opened 3 years ago
@garfieldnate I missed this message! Sorry about that!
Is there anything I can do at this time? Looks like you have stuff going on here https://github.com/garfieldnate/uniunihan-db
Thanks for noticing :D I obviously have a workaround already, but I do still think that a --dictionary option would make unihan-etl more useful. No worries if you can't get to it, as my workaround is fine for me. Thanks for the great library!
@garfieldnate We can add it, and also make it available via the Python API.
In the most recent unihan_etl the code I pasted above fails with this error. Not sure if my usage of the API is wrong or if there's an issue in the library.
.venv/lib/python3.11/site-packages/unihan_etl/process.py:531: in export
data = expand_delimiters(data)
.venv/lib/python3.11/site-packages/unihan_etl/process.py:406: in expand_delimiters
char[field] = expansion.expand_field(field, char[field])
.venv/lib/python3.11/site-packages/unihan_etl/expansion.py:416: in expand_field
return expansion_func(fvalue)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
value = [{'radical': 5, 'simplified': False, 'strokes': 10}, "213''.0"]
def _expand_kRSGeneric(value):
pattern = re.compile(
r"""
(?P<radical>[1-9][0-9]{0,2})
(?P<simplified>\'?)\.
(?P<strokes>-?[0-9]{1,2})
""",
re.X,
)
for i, v in enumerate(value):
> m = pattern.match(v).groupdict()
E AttributeError: 'NoneType' object has no attribute 'groupdict'
.venv/lib/python3.11/site-packages/unihan_etl/expansion.py:332: AttributeError
@garfieldnate Thank you!
Does wiping cache and the DB file and rerunning change anything?
That was a really fast response :D
This is actually my bad; the latest unihan_etl already has a fix for this in place, and I mistakenly thought I had updated.
The issue is a typo in the kRSUnicode field for 亀: https://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=%E4%BA%80. It has two apostrophes, which does not follow the syntax specified in the standard. unihan_etl has already updated its parsing to allow the second apostrophe.
I did have to update my code for some unihan_etl changes, but nothing crazy.
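For anyone reading the traceback above, here is a minimal sketch of why the value 213''.0 trips the pattern shown there, and how a pattern that tolerates a second apostrophe handles it. The strict pattern is copied from the traceback; the relaxed pattern and the `parse_rs` helper are hypothetical illustrations, not the actual change made in unihan_etl.

```python
import re

# Pattern as it appears in the traceback: at most one apostrophe is allowed.
STRICT = re.compile(
    r"""
    (?P<radical>[1-9][0-9]{0,2})
    (?P<simplified>\'?)\.
    (?P<strokes>-?[0-9]{1,2})
    """,
    re.X,
)

# Hypothetical relaxed variant (an assumption, not the library's code):
# allow up to two apostrophes before the dot.
RELAXED = re.compile(
    r"""
    (?P<radical>[1-9][0-9]{0,2})
    (?P<simplified>\'{0,2})\.
    (?P<strokes>-?[0-9]{1,2})
    """,
    re.X,
)


def parse_rs(value, pattern=RELAXED):
    """Parse one radical-stroke value, raising a clear error on a mismatch."""
    m = pattern.match(value)
    if m is None:
        # Checking for None avoids the AttributeError seen in the traceback.
        raise ValueError(f"unrecognized radical-stroke value: {value!r}")
    groups = m.groupdict()
    return {
        "radical": int(groups["radical"]),
        "simplified": bool(groups["simplified"]),
        "strokes": int(groups["strokes"]),
    }


print(STRICT.match("213''.0"))  # None: the second apostrophe blocks the match
print(parse_rs("213''.0"))
print(parse_rs("5.10"))
```

Checking the match object before calling `groupdict()` also turns a malformed value into a readable error rather than an AttributeError on None.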
@garfieldnate Thank you for the added information. I created an issue so that anyone who runs into this problem knows that updating fixes it!
I have found that I always need to convert the data into a dictionary (instead of the default list) when I'm using it. Because of this, I decided to always store the file in dictionary format. My method for doing so is a bit hacky, and it would be great to have a --structure <dict|list> or even --dictionary parameter to do this within unihan_etl.
Here's my current code. It relies on the undocumented "python" formatting option:
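The snippet referred to above is not reproduced in this thread. As a rough, library-independent sketch of the list-to-dictionary conversion being requested: the code below assumes each exported record is a dict carrying the character under a "char" key. The `records_to_dict` helper and the sample records are hypothetical and are not part of unihan_etl.

```python
def records_to_dict(records, key="char"):
    """Re-key a list of per-character records by the value stored under `key`."""
    result = {}
    for record in records:
        k = record[key]
        if k in result:
            # Duplicates would silently overwrite data, so fail loudly instead.
            raise ValueError(f"duplicate entry for {k!r}")
        # Drop the key field from the value, since it is now the dict key.
        result[k] = {f: v for f, v in record.items() if f != key}
    return result


# Hypothetical sample records in the assumed list format.
sample = [
    {"char": "一", "ucn": "U+4E00", "kTotalStrokes": "1"},
    {"char": "丁", "ucn": "U+4E01", "kTotalStrokes": "2"},
]

by_char = records_to_dict(sample)
print(by_char["丁"]["ucn"])
```

With the data keyed this way, a lookup by character is a single dictionary access instead of a scan over the whole list, which is the motivation for storing the file in dictionary form.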