-
When udpipe is run without --parse, it sets the HEAD fields to _, which does not conform to the [CoNLL-U format specification](http://universaldependencies.org/format.html) -- IMHO it should be set to …
-
Word embeddings are increasingly popular and provide nice accuracy gains in most parsers nowadays. I agree that we want to keep things simple (and hence there is a cost to allowing additional resource…
-
Hi,
I am looking for some example udapy code to convert a conllu file into a version in which certain words (like the Spanish al in http://universaldependencies.org/format.html#words-tokens-and-emp…
-
A sentence that works in SICK-SANE is
+# text = People are walking
+1 People people NOUN NNS _ 3 nsubj _ NNS|07942152-n|GroupOfPeople=
+2 are be VERB VBP _ 3 aux _ VBP|02604760-v|Entity+
+3 …
-
Great tool, thanks for making it available.
Is there some way of obtaining the hyperparameter settings that were used for the individual pre-trained models available from Lindat?
-
In #273, it was suggested that each sentence in CoNLL-U should have its ID encoded in header (comment) in a standardized way, e.g. `# sent_id = 123`. This issue is about the format of the ID itself (i…
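
As an illustration of the `# sent_id = 123` header convention mentioned above, here is a minimal sketch that collects sentence IDs from CoNLL-U text. The function name `sentence_ids` and the regular expression are assumptions made for this example; the sketch treats the ID as an opaque run of non-whitespace characters and does not take a position on what the ID format should be.

```python
import re

# Assumed pattern: a comment line of the form "# sent_id = <ID>",
# where <ID> is any run of non-whitespace characters.
SENT_ID_RE = re.compile(r"^#\s*sent_id\s*=\s*(\S+)\s*$")


def sentence_ids(conllu_text):
    """Return the sent_id values found in comment lines, in document order."""
    ids = []
    for line in conllu_text.splitlines():
        match = SENT_ID_RE.match(line)
        if match:
            ids.append(match.group(1))
    return ids


if __name__ == "__main__":
    sample = "# sent_id = 123\n# text = People are walking\n"
    print(sentence_ids(sample))
```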
-
The method `finishDocument()` doesn't exist, so the Python example doesn't work.
[This line](https://github.com/ufal/udpipe/blob/master/bindings/python/examples/udpipe_model.py#L62) needs to be remov…
-
Just to clarify (I am not going to put it in the proposal but we will have to decide it later):
Are we going to require that people do word segmentation in Chinese (and Japanese, Thai etc. if these l…
-
Is it possible to run just the tokenizer without the segmenter? Of course, if the sentence gets divided into more segments I can merge them (calling `addWord()` on the first Ufal::UDPipe::Sentence s…
-
CoNLL-U format currently does not specify how to represent all space characters of the original plain text. It only specifies the `SpaceAfter=No` feature denoting that the current token is not followe…
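
To make the role of `SpaceAfter=No` concrete, here is a minimal sketch of rebuilding surface text from the token lines of one CoNLL-U sentence. The function name `detokenize` is an assumption for this example. The sketch can only choose between a single space and no space, which is exactly the limitation described above: other whitespace in the original text cannot be recovered from this feature alone.

```python
def detokenize(conllu_sentence):
    """Rebuild surface text from one CoNLL-U sentence using SpaceAfter=No.

    Multiword-token ranges (IDs like "1-2") supply the surface form, and the
    syntactic words they cover are skipped. Empty nodes (IDs like "3.1") are
    ignored.
    """
    pieces = []
    skip_until = 0  # last word ID covered by the current multiword token
    for line in conllu_sentence.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        tok_id, form, misc = cols[0], cols[1], cols[9]
        if "." in tok_id:
            continue  # empty node, no surface text
        if "-" in tok_id:
            skip_until = int(tok_id.split("-")[1])
        elif int(tok_id) <= skip_until:
            continue  # word already covered by a multiword token
        pieces.append(form)
        if "SpaceAfter=No" not in misc.split("|"):
            pieces.append(" ")
    return "".join(pieces).rstrip()


if __name__ == "__main__":
    rows = [
        ["1", "Hello", "_", "_", "_", "_", "_", "_", "_", "SpaceAfter=No"],
        ["2", ",", "_", "_", "_", "_", "_", "_", "_", "_"],
        ["3", "world", "_", "_", "_", "_", "_", "_", "_", "SpaceAfter=No"],
        ["4", "!", "_", "_", "_", "_", "_", "_", "_", "_"],
    ]
    print(detokenize("\n".join("\t".join(r) for r in rows)))
```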