Open yoshiask opened 1 year ago
Dear @yoshiask, adding more dialect variants is on the immediate agenda of the BBAW/DDGLC/FUBerlin team. This requires a lot of manual scholarly work: matching lists of dialect spellings. I will discuss with the project members to what extent we can organize and welcome outside collaborators for this task.
Just adding on: I'm a graduate student in linguistic anthropology with a serious interest in Coptic. I've been working on expanding my Bohairic reading ability & would be very glad to help add Bohairic (& other dialect) content to the dictionary if you all decide to accept volunteer outside collaborators.
@BobOffer-Westort - thanks for offering to help! I don't know how soon the lexicon project would be able to integrate outside contributions, but if you would like to contribute to our Bohairic corpus development, we could definitely use help at Coptic Scriptorium. In particular we are working on developing data to train segmentation tools for Bohairic, similar to what we have for Sahidic here:
https://gucorpling.org/coptic-nlp/
If this sounds interesting just let me know, there's lots to do.
This is very interesting. I don't have any real experience in programming of any kind, let alone NLP. If this is something where working on texts and adding markup is what's called for, I'm happy to learn and contribute what I can. If you need NLP background, I don't have the requisite skill set.
-Bob
Oh, I definitely meant markup work! Specifically we are starting with segmenting texts, beginning with simpler Bible chapters (synoptic gospels etc.). If you would feel comfortable turning this:
ⲁⲛⲟⲕ ⲉⲧⲁⲓϯⲱⲙⲥ ⲛⲱⲧⲉⲛ ϧⲉⲛⲟⲩⲙⲱⲟⲩ ⲛⲑⲟϥ ⲇⲉ ϥⲛⲁⲉⲙⲥ ⲑⲏⲛⲟⲩ ϧⲉⲛⲟⲩⲡⲛⲉⲩⲙⲁ ⲉϥⲟⲩⲁⲃ .
Into this:
ⲁⲛⲟⲕ ⲉⲧ|ⲁ|ⲓ|ϯ-ⲱⲙⲥ ⲛⲱ|ⲧⲉⲛ ϧⲉⲛ|ⲟⲩ|ⲙⲱⲟⲩ ⲛⲑⲟϥ ⲇⲉ ϥ|ⲛⲁ|ⲉⲙⲥ|ⲑⲏⲛⲟⲩ ϧⲉⲛ|ⲟⲩ|ⲡⲛⲉⲩⲙⲁ ⲉ|ϥ|ⲟⲩⲁⲃ .
That would already be very helpful! And we credit this kind of work on our site for each text of course. If you're interested feel free to e-mail me off GitHub and I can say more about this.
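For anyone trying this by hand, a small sanity check can catch accidental typos: the segmented line should contain exactly the same letters as the original, in the same order. The sketch below is only an illustration, not part of the Coptic Scriptorium tooling. It assumes that `|` and `-` are the only boundary marks added, and that spacing may change between the two versions (as with ⲉⲙⲥ|ⲑⲏⲛⲟⲩ in the example above). The function names `letters_only`, `same_letters`, and `units` are hypothetical.

```python
import re

# Assumption: "|" and "-" are the only boundary marks a segmenter adds,
# and whitespace may differ between the raw and the segmented line.
SEPARATORS = re.compile(r"[|\-\s]+")


def letters_only(line: str) -> str:
    """Drop boundary marks and whitespace, keeping the bare letter sequence."""
    return SEPARATORS.sub("", line)


def same_letters(raw: str, segmented: str) -> bool:
    """True if segmentation changed only boundaries and spacing, not letters."""
    return letters_only(raw) == letters_only(segmented)


def units(segmented: str) -> list[list[str]]:
    """Split a segmented line into space-delimited groups, then on "|"."""
    return [group.split("|") for group in segmented.split()]


if __name__ == "__main__":
    # One word pair taken from the example in this thread.
    print(same_letters("ϧⲉⲛⲟⲩⲙⲱⲟⲩ", "ϧⲉⲛ|ⲟⲩ|ⲙⲱⲟⲩ"))
    print(units("ϧⲉⲛ|ⲟⲩ|ⲙⲱⲟⲩ ⲛⲑⲟϥ"))
```

Hyphens are left inside a unit by `units`, since this sketch treats them as marking a boundary within a single unit; adjust if your annotation guidelines say otherwise.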
Is your feature request related to a problem? Please describe.
The dictionary at the moment consists primarily of Sahidic words, presumably because most Coptic texts are Sahidic. Many words that are spelled differently in Bohairic, or that are unique to Bohairic, are currently missing from the dictionary.
Describe the solution you'd like
I would like to see more Bohairic words and alternate spellings in the dictionary.
Describe alternatives you've considered
If KELLIA doesn't want to add more Bohairic entries, I could fork this repo and maintain it separately (following the CC-SA and Apache licenses, of course). I would rather not maintain a fork when everyone could benefit from having everything in one place.
Additional context
I understand that most of the contributors are scholars, and I certainly am not. However, I have been researching Coptic and have been familiar with it for a while, so if allowed I am willing to contribute what I've found.