Open judyboon opened 5 years ago
Similar issue with some Serbian characters:
tc.text_analytics.count_ngrams(tc.SArray(['Tuširanje']), method='character')
fails with
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc5 in position 1: unexpected end of data
apparently because the character š (two bytes in UTF-8, 0xC5 0xA1) is not handled correctly.
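One possible reading of that traceback is that the character n-grams are cut at byte boundaries rather than at character boundaries. The sketch below is plain Python, not TuriCreate's implementation: the helper names `byte_ngrams`, `char_ngrams` and `first_decode_error` are hypothetical, and it assumes an n-gram size of 2. It shows that slicing the UTF-8 bytes of the string splits š in half and produces the same kind of decode error, while slicing by code points does not.

```python
from collections import Counter

WORD = "Tu\u0161iranje"  # "Tuširanje", with š written as the code point U+0161


def byte_ngrams(text, n=2):
    """Slice the UTF-8 encoding of `text` into overlapping n-byte windows."""
    data = text.encode("utf-8")
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def char_ngrams(text, n=2):
    """Count overlapping n-grams over Unicode code points instead of bytes."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def first_decode_error(grams):
    """Return the message of the first UnicodeDecodeError, or None if all decode."""
    for gram in grams:
        try:
            gram.decode("utf-8")
        except UnicodeDecodeError as exc:
            return str(exc)
    return None


# Byte-level windows cut the two-byte sequence for š (0xC5 0xA1) in half.
error_message = first_decode_error(byte_ngrams(WORD))
print(error_message)

# Code-point-level windows keep š intact.
counts = char_ngrams(WORD)
print(dict(counts))
```

Until the library handles this input, counting over code points as in `char_ngrams` may serve as a workaround for small inputs, though it does not reproduce any of the library's other options.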
Interestingly,
tc.text_analytics.count_ngrams( tc.SArray([u'Tuširanje']), method='word')
does not fail, but returns an empty dict:
dtype: dict
Rows: 1
[{}]
This issue still reproduces with TuriCreate 6.4 (on macOS 10.15 and Python 3.7).
I observed an error when using
text_analytics.count_words
to process an SArray. The SArray looks as follows:
When running
it gives me the following error.
TC version: 5.1
Python version: 3.6.5