Diacritical characters in the Gurmukhi script appear to be treated as word breaks when Punjabi words are entered into the text-to-lexemes input, while words without these characters work fine. The text-to-lexemes tool also omits diacritical characters at the end of a word when the word is passed to Wikidata to create a new lexeme. Some examples:
ਸੋਂਣਾ gets split into ਸੋ and ਣਾ (note that ਂ disappears after ਸੋ)
ਕ਼ਾਨੂੰਗੋ gets split into ਕ਼, ਗੋ, ਨੂ (interestingly, the first diacritical mark on each segment is retained, but the ਾ after ਕ਼ and the ੰ on ਨੂ are lost)
ਡੁੱਬਣੀ gets split into ਡੁ and ਬਣੀ (with ੱ disappearing)
There are some character combinations where this does not happen. For example, ਕ੍ਰੋਧੀ works fine.
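One pattern that fits all four examples: every failing word contains a base letter followed by two combining marks in a row, whereas ਕ੍ਰੋਧੀ never has more than one mark after a letter. The Python sketch below is not the tool's actual code. It is a hypothetical tokenizer (the `tokenize` function and its `max_marks` parameter are made up for illustration) showing that a rule of "a letter plus at most one mark" would produce these fragments, apart from the order in which they are listed, while allowing any number of marks keeps the words whole.

```python
import unicodedata


def is_letter(ch):
    # Unicode general categories L* (Gurmukhi consonants and vowels are Lo).
    return unicodedata.category(ch).startswith("L")


def is_mark(ch):
    # Unicode general categories M* (Mn/Mc cover Gurmukhi vowel signs,
    # nukta, bindi, tippi, addak and virama).
    return unicodedata.category(ch).startswith("M")


def tokenize(text, max_marks=None):
    """Split text into words built from letters and their combining marks.

    max_marks is a hypothetical knob: the number of consecutive marks
    accepted after a letter (None means unlimited). Any character that is
    not accepted ends the current word and is dropped.
    """
    words, current, marks_seen = [], "", 0
    for ch in text:
        if is_letter(ch):
            current += ch
            marks_seen = 0
        elif is_mark(ch) and current and (max_marks is None or marks_seen < max_marks):
            current += ch
            marks_seen += 1
        else:
            if current:
                words.append(current)
            current, marks_seen = "", 0
    if current:
        words.append(current)
    return words


# The four words from the report, spelled out as code points.
EXAMPLES = [
    "\u0a38\u0a4b\u0a02\u0a23\u0a3e",                # ਸੋਂਣਾ
    "\u0a15\u0a3c\u0a3e\u0a28\u0a42\u0a70\u0a17\u0a4b",  # ਕ਼ਾਨੂੰਗੋ
    "\u0a21\u0a41\u0a71\u0a2c\u0a23\u0a40",          # ਡੁੱਬਣੀ
    "\u0a15\u0a4d\u0a30\u0a4b\u0a27\u0a40",          # ਕ੍ਰੋਧੀ
]

for word in EXAMPLES:
    print(word, tokenize(word, max_marks=1), tokenize(word))
```

If the tool's tokenizer turns out to behave like the `max_marks=1` case, accepting any run of combining marks after a letter would be one possible fix. This sketch does not model the separate problem of trailing diacritics being dropped when a lexeme is created on Wikidata.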