buda-base / lucene-bo

Lucene analyzer for Tibetan
Apache License 2.0

Request for new normalization rule for the verb འབྱོན་ to account for over-corrections in our catalog #39

Closed: JannTibetan closed this issue 10 months ago

JannTibetan commented 2 years ago

Please allow བྱོན་ to be read as འབྱོན་, and vice versa.

I just saw a reference to a text title written as sku skye myur 'byon gyi de nyid gsal byed. It caught my attention, so I looked it up on BUDA. Zero results were returned, which surprised me. Then I searched with a different stem of the verb འབྱོན་ (i.e., བྱོན་) and found the text I wanted; we have 4 versions of it. To complicate matters further, the title as spelled in the text itself has འབྱོན་, but BDRC's catalog records corrected it to བྱོན་ without noting both spellings, so users searching for the original spelling are at a disadvantage.
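For illustration only, here is a minimal Java sketch of the kind of rule being requested: both spellings are folded to one indexed form, so a query with either stem matches records with the other. It assumes the text is split into syllables on the tsheg (U+0F0B) and that folding འབྱོན to བྱོན is the desired direction; the class `ByonNormalizer` and its `normalize` method are hypothetical and are not lucene-bo's actual API or the fix that was eventually made.

```java
import java.util.Map;

// Hypothetical sketch, not lucene-bo's real filter: map variant verb stems
// to one canonical syllable so both spellings index to the same term.
public class ByonNormalizer {

    // Assumed mapping: variant syllable -> canonical syllable.
    private static final Map<String, String> SYLLABLE_VARIANTS = Map.of(
        "འབྱོན", "བྱོན"
    );

    private static final String TSHEG = "\u0F0B"; // Tibetan intersyllabic tsheg

    // Splits on the tsheg, replaces any variant syllable, and rejoins.
    // The -1 limit keeps a trailing empty piece so a final tsheg survives.
    public static String normalize(String text) {
        String[] syllables = text.split(TSHEG, -1);
        for (int i = 0; i < syllables.length; i++) {
            syllables[i] = SYLLABLE_VARIANTS.getOrDefault(syllables[i], syllables[i]);
        }
        return String.join(TSHEG, syllables);
    }

    public static void main(String[] args) {
        String catalogSpelling = "མྱུར་བྱོན་";  // spelling in the catalog record
        String textSpelling = "མྱུར་འབྱོན་";    // spelling in the text itself
        System.out.println(normalize(catalogSpelling).equals(normalize(textSpelling)));
    }
}
```

Running `main` prints `true`. Applying the same mapping at both index and query time is what makes the rule symmetric ("and vice versa"); in a real Lucene analyzer this would more likely live in a token filter or char filter than in a standalone helper.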

JannTibetan commented 2 years ago

*We have 3 versions of this work: http://purl.bdrc.io/resource/WA0XLB34248D51809

JannTibetan commented 2 years ago

Actually, we do have 4 versions of the text, but one of them appears in the Sungbum of a different author, and our catalog record notes that it is likely the Gangtang text.

JannTibetan commented 2 years ago

@karmagongde Please add a variant name to the work record that represents the མྱུར་འབྱོན་ spelling. Thanks!

eroux commented 2 years ago

Thanks Jann! I'm going to update the text index quite soon, so if you think of other normalizations, don't hesitate (we rarely update it; it's a big operation).

karmagongde commented 2 years ago

Hi @eroux, where can I see the outline numbers of works on BUDA? To make these corrections, I need to check the outline number of each individual work. On tbrc.org, all the outline numbers can be found in the URL of a particular work.

karmagongde commented 2 years ago

Hi @JannTibetan, I corrected the typo in the outline node title "sku skye myur 'byon gyi de nyid gsal byed".

JannTibetan commented 2 years ago

> Thanks Jann! I'm going to update the text index quite soon, so if you think of other normalizations, don't hesitate (we rarely update it; it's a big operation).

OK, I'll try to think of other normalizations we might have missed.

JannTibetan commented 2 years ago

> Hi @JannTibetan, I corrected the typo in the outline node title "sku skye myur 'byon gyi de nyid gsal byed".

Thanks @karmagongde!

eroux commented 2 years ago

Hi @karmagongde, thanks for the change! The tbrc.org outline node IDs are not currently displayed on BUDA; I'll make them accessible to admin users.

eroux commented 10 months ago

Fixed (not deployed yet).