By-pass the tokenizer

By-pass the tokenizer... Well, you can do that by using the ufal.udpipe module directly:

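>>> import os,ufal.udpipe,udchinese.udchinese
>>> # Load the UDPipe model file from the udchinese package directory
>>> m=ufal.udpipe.Model.load(os.path.join(udchinese.udchinese.PACKAGE_DIR,"ud-chinese.udpipe"))
>>> # Input format is "conllu", so UDPipe's own tokenizer is skipped; tagger, parser and output use defaults
>>> udpipe=ufal.udpipe.Pipeline(m,"conllu","","","")
>>> # Build one CoNLL-U row per whitespace-separated token: ID, FORM, then eight "_" columns
>>> nlp=lambda x:udpipe.process("\n".join("\t".join([str(i+1),j]+["_"]*8) for i,j in enumerate(x.split()))+"\n\n")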
>>> doc=nlp("不 入 虎穴 不 得 虎子")
>>> print(doc)
1	不	不	ADV	v,副詞,否定,無界	Polarity=Neg	2	advmod	_	_
2	入	入	VERB	v,動詞,行為,移動	_	0	root	_	_
3	虎穴	虎穴	NOUN	n,名詞,固定物,地形	_	2	obj	_	_
4	不	不	ADV	v,副詞,否定,無界	Polarity=Neg	5	advmod	_	_
5	得	得	VERB	v,動詞,行為,得失	_	2	parataxis	_	_
6	虎子	虎子	NOUN	n,名詞,人,人	_	5	obj	_	_

But it seems to cause several bad effects...
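
To make it easier to see what the one-line lambda actually hands to UDPipe, here is a minimal sketch of the same string construction as a standalone function. The name `tokens_to_conllu` is hypothetical (it is not part of ufal.udpipe or udchinese), and the sketch assumes the words are already segmented. It only builds the CoNLL-U text and does not call UDPipe itself.

```python
def tokens_to_conllu(tokens):
    """Build a minimal CoNLL-U sentence from already-segmented tokens.

    Hypothetical helper, not part of ufal.udpipe or udchinese. Each row
    holds the 1-based token ID and the surface form; the remaining eight
    CoNLL-U columns are left as "_" for the tagger and parser to fill in.
    """
    rows = []
    for index, form in enumerate(tokens, start=1):
        rows.append("\t".join([str(index), form] + ["_"] * 8))
    # A blank line terminates the sentence in CoNLL-U.
    return "\n".join(rows) + "\n\n"


conllu_input = tokens_to_conllu("不 入 虎穴 不 得 虎子".split())
print(conllu_input, end="")
```

The resulting string should be usable in place of the inline expression in the session above, i.e. as the argument to `udpipe.process(...)`.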

KoichiYasuoka / UD-Chinese

By-pass the tokenizer #1