Punjabi Gurmukhi: diacritical characters interpreted in text-to-lexemes input as word break

In the text-to-lexemes input, diacritical characters in the Gurmukhi script appear to be treated as word breaks when Punjabi words are entered, whereas words without these characters work fine. The tool also drops diacritical characters at the end of a word when the word is passed to Wikidata to create a new lexeme. Some examples:

ਸੋਂਣਾ gets split into ਸੋ and ਣਾ (note that ਂ disappears after ਸੋ)
ਕ਼ਾਨੂੰਗੋ gets split into ਕ਼, ਗੋ, ਨੂ (interestingly, the first diacritical mark on each segment is retained, but the ਾ after ਕ਼ and the ੰ on ਨੂ are lost)
ਡੁੱਬਣੀ gets split into ਡੁ and ਬਣੀ (with ੱ disappearing)

There are some character combinations where this does not happen. For example, ਕ੍ਰੋਧੀ works fine.
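
For comparison, here is a minimal Python sketch of a tokenizer that keeps Gurmukhi diacritics attached to their base letters. It is not taken from the Ordia code base, and how text-to-lexemes actually splits its input is not established here; the function names `is_word_char` and `tokenize` are hypothetical. The sketch assumes that any character whose Unicode general category is a letter (L*), a combining mark (M*), or a number (N*) belongs to a word, so a run of several consecutive marks does not end the word.

```python
import unicodedata


def is_word_char(ch):
    """Return True for letters (L*), combining marks (M*), and numbers (N*).

    Assumption for this sketch: everything else (spaces, punctuation, ...)
    separates words.
    """
    return unicodedata.category(ch)[0] in "LMN"


def tokenize(text):
    """Split text into words without breaking at combining marks."""
    tokens = []
    current = []
    for ch in text:
        if is_word_char(ch):
            current.append(ch)
        elif current:
            # A non-word character closes the word collected so far.
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


# The words from this report, including the one that already works.
samples = ["ਸੋਂਣਾ", "ਕ਼ਾਨੂੰਗੋ", "ਡੁੱਬਣੀ", "ਕ੍ਰੋਧੀ"]
for word in samples:
    print(word, tokenize(word))
```

With this approach each sample word should come back as a single token with all of its diacritics intact, and a space-separated string of the samples should yield one token per word.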

fnielsen/ordia, issue #144