Lguyogiro / low-resource-polysynthetic-morphology

Morphological segmentation
Data examples #1

Closed lwahomura closed 4 years ago

lwahomura commented 4 years ago

Could you please provide some test data? I wanted to try your segmenter, though it'd be easier if there was some data to see the required format.

Lguyogiro commented 4 years ago

Hello! Here is the link to the data from the original paper that this repo is replicating: The source code from the paper is there also, but to be honest I found it a little challenging to use as is, which is why I started this repo!

The format is one word per line, with space-separated characters. In the target data, "!" marks a morpheme boundary.

For source data:

o i n k i p a n t i
t l a w a l
t i w e
s k a k o k w i

target:

o ! i n ! k ! i p a n t i
t l a w a l
t ! i w e
s ! k ! a k o k w i
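
For anyone preparing their own data, here is a minimal sketch of how lines in this format could be produced and read back. It is not code from this repo or from the original paper; the function names (`to_source_line`, `to_target_line`, `parse_target_line`) are hypothetical, and it assumes each word is already available as a list of morphemes.

```python
BOUNDARY = "!"  # morpheme boundary marker, as described in the comment above


def to_source_line(morphemes):
    """Join the morphemes into one word and space-separate its characters."""
    return " ".join("".join(morphemes))


def to_target_line(morphemes):
    """Space-separate the characters and put BOUNDARY between morphemes."""
    return f" {BOUNDARY} ".join(" ".join(m) for m in morphemes)


def parse_target_line(line):
    """Recover the list of morphemes from a target-format line."""
    morphemes = [[]]
    for token in line.split():
        if token == BOUNDARY:
            morphemes.append([])  # start a new morpheme
        else:
            morphemes[-1].append(token)
    return ["".join(chars) for chars in morphemes]


word = ["o", "in", "k", "ipanti"]
print(to_source_line(word))  # o i n k i p a n t i
print(to_target_line(word))  # o ! i n ! k ! i p a n t i
```

A word with no internal boundary, such as `["tlawal"]`, produces the same line for source and target.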

lwahomura commented 4 years ago

Thanks a lot! This turned out to be very helpful, and the metrics got much better!

Lguyogiro commented 4 years ago

Great, I'm glad to hear that! Sorry that the code is not thoroughly documented; let me know if there's anything I can help clarify.