Closed klosoter closed 2 years ago
We need the SubDomain too. How would we add that to the domain attribute?
domain="DomainNumber:DomainName:SubDomainName"
or domain="SubDomainNumber:DomainName:SubDomainName"
since the subdomain number includes the domain number?
I like the idea of putting the number first.
In both cases, the number is first. I'm thinking about which number to use. Example:
Domain: "Physical Impact"
DomainNumber: 019
SubDomain: "Press"
SubDomainNumber: 019005
So, I guess it makes sense to use only the SubDomainNumber, since it contains (expands) the DomainNumber. And then, number first:
<w domain="019005:Physical Impact:Press" sdbg="..."/>
But is there a better way to add both domain names?
Ah, that makes more sense. I would think it would be best to simply use the subdomain number alone. Unless we want to have everything there, in which case you could do domain#:subdomain#, or just do domain="domain#:domainlabel" subdomain="subdomain#:subdomainlabel".
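To make the numbering concrete, here is a minimal Python sketch of the "subdomain number first" format discussed above. It assumes that a SubDomainNumber is always six digits and that its first three digits are the DomainNumber, as in the 019 / 019005 example. The function names are hypothetical and not part of any existing tooling.

```python
def split_subdomain_number(subdomain_number: str) -> tuple[str, str]:
    """Split a 6-digit SubDomainNumber into (DomainNumber, subdomain suffix).

    Assumption: the first three digits are the DomainNumber.
    """
    if len(subdomain_number) != 6 or not subdomain_number.isdigit():
        raise ValueError(f"expected a 6-digit number, got {subdomain_number!r}")
    return subdomain_number[:3], subdomain_number[3:]


def build_domain_attribute(subdomain_number: str, domain_name: str,
                           subdomain_name: str) -> str:
    """Build the value 'SubDomainNumber:DomainName:SubDomainName'."""
    split_subdomain_number(subdomain_number)  # validates the number
    return f"{subdomain_number}:{domain_name}:{subdomain_name}"


domain_number, suffix = split_subdomain_number("019005")
attribute = build_domain_attribute("019005", "Physical Impact", "Press")
```

Because the DomainNumber can be recovered from the SubDomainNumber, storing only the latter loses no information under this assumption.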
After checking the MARBLE domains for the OT, I noticed that we're only using the subdomain label now in the NT.
I suggest expanding the domain attribute by the domain label, like this:
The format then changes
from this:
domain="subdomainID:subdomainLabel"
domain="001014:Population Centers"
to this:
domain="subdomainID:domainLabel:subdomainLabel"
domain="001014:Geographical Objects and Features:Population Centers"
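The proposed change can be sketched in Python as follows. This assumes a lookup table from three-digit domain numbers to domain labels is available; `DOMAIN_LABELS` is a hypothetical stand-in that holds only the one pair from the example above, and it further assumes that "001" is the domain number belonging to subdomain 001014.

```python
# Hypothetical lookup table: domain number -> domain label.
# Only the pair from the example in this thread is filled in.
DOMAIN_LABELS = {"001": "Geographical Objects and Features"}


def expand_domain_attribute(value: str, domain_labels: dict[str, str]) -> str:
    """Turn 'subdomainID:subdomainLabel' into
    'subdomainID:domainLabel:subdomainLabel'.

    Assumption: the first three digits of subdomainID are the domain number.
    """
    subdomain_id, subdomain_label = value.split(":", 1)
    domain_label = domain_labels[subdomain_id[:3]]
    return f"{subdomain_id}:{domain_label}:{subdomain_label}"


expanded = expand_domain_attribute("001014:Population Centers", DOMAIN_LABELS)
```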
Yes, please do.
On second thought ... how about:
domain="subdomainID:domainLabel:subdomainLabel"
domain="001014:Population Centers"
Looking at SDBG, I currently see this:
<w role="v"
ref="LUK 1:2!2"
class="verb"
xml:id="n42001002002"
lemma="παραδίδωμι"
normalized="παρέδοσαν"
strong="3860"
number="plural"
person="third"
tense="aorist"
voice="active"
mood="indicative"
head="true"
domain="033017;Teach"
sdbg="παραδίδωμι;33.237;to instruct, to teach">παρέδοσαν</w>
There are two hierarchies here. 033017 corresponds to "Q Teach (33.224-33.250)" in this index (Q is the 17th letter of the alphabet):
https://www.laparola.net/greco/louwnida.php#1
Thus, 033017 contains 33.237. We effectively have three levels of domain: 33, 17, 237, where the numbering systems for levels 2 and 3 are independent and overlapping. The "33" part of "033017" and of "33.237" has the same meaning.
For today, at least, let's make this:
domain="033017"
ln="33.237"
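A minimal Python sketch of that simplification, assuming the current values are always semicolon-separated as in the LUK 1:2 example (domain="code;label", sdbg="lemma;ln;gloss"). The function names are hypothetical. The second helper only illustrates the letter mapping described above (17 corresponds to Q).

```python
def simplify_domain_attributes(domain_attr: str, sdbg_attr: str) -> dict[str, str]:
    """Reduce domain='033017;Teach' and sdbg='lemma;33.237;gloss'
    to domain='033017' and ln='33.237'."""
    domain_code = domain_attr.split(";", 1)[0]
    # Split at most twice so semicolons inside the gloss would stay intact.
    _lemma, ln_code, _gloss = sdbg_attr.split(";", 2)
    return {"domain": domain_code, "ln": ln_code}


def subdomain_letter(domain_code: str) -> str:
    """Map the last three digits of a 6-digit code to a letter (1 -> A).

    Assumption: subdomain indexes stay within 1..26.
    """
    index = int(domain_code[3:])
    return chr(ord("A") + index - 1)


simplified = simplify_domain_attributes(
    "033017;Teach", "παραδίδωμι;33.237;to instruct, to teach"
)
letter = subdomain_letter("033017")
```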
@klosoter, want me to do this or do you want to take care of it?
If you have the time, please!
It seems that some domain and ln data got lost:
Multiple entries got joined into one:
domain="010002033003" ln="10.24"
should be
domain="010002 033003" ln="10.24 33.19"
And domains with only 3 digits have gotten the value "" (possibly because they are redundant, since ln also has these digits).
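For the joined domain values, a repair could look like the Python sketch below. It assumes every domain code is exactly six digits wide, as in the example. It cannot restore ln values that were dropped (such as the missing 33.19), so the second helper, also hypothetical, only flags nodes where the number of domain codes and ln codes no longer matches.

```python
def split_joined_domains(value: str, width: int = 6) -> str:
    """Split a run of concatenated fixed-width codes into a
    space-separated list, e.g. '010002033003' -> '010002 033003'."""
    if len(value) % width != 0:
        raise ValueError(f"length of {value!r} is not a multiple of {width}")
    return " ".join(value[i:i + width] for i in range(0, len(value), width))


def counts_match(domain: str, ln: str) -> bool:
    """Return True when domain and ln hold the same number of entries."""
    return len(domain.split()) == len(ln.split())


repaired = split_joined_domains("010002033003")
needs_review = not counts_match(repaired, "10.24")
```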
See the new data file here to see how the MARBLE data we use is connected to our nodes.
See new pull request
We plan to construct the data we use from MARBLE in the following way:
We take the //LEXMeaning elements from the lexicon files and group them by LEXReference, the 'marbleId' so to speak.
We take the glosses from //LEXSense[@LanguageCode="en"]//Gloss, the Domains from //LEXDomain, and the EntryCode from @EntryCode (and possibly the SubDomain from //LEXSubDomain).
SDBG-DOMAINS1.XML
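The extraction plan above could be sketched with Python's standard `xml.etree.ElementTree`. The element and attribute names come from the plan, but the nesting in `SAMPLE` is an assumption made purely for illustration: the real lexicon files may be laid out differently, `EntryCode` is assumed to sit on `LEXMeaning`, and `REF-1` is a placeholder reference rather than a real marbleId.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumed, simplified layout; NOT the actual MARBLE file structure.
SAMPLE = """
<Lexicon>
  <LEXMeaning EntryCode="33.237">
    <LEXDomain>Teach</LEXDomain>
    <LEXSense LanguageCode="en">
      <Gloss>to instruct</Gloss>
      <Gloss>to teach</Gloss>
    </LEXSense>
    <LEXReference>REF-1</LEXReference>
  </LEXMeaning>
</Lexicon>
"""


def group_meanings_by_reference(root: ET.Element) -> dict[str, list[dict]]:
    """Collect entry code, domains and English glosses per LEXReference."""
    grouped = defaultdict(list)
    for meaning in root.iter("LEXMeaning"):
        english_senses = meaning.findall(".//LEXSense[@LanguageCode='en']")
        record = {
            "entry_code": meaning.get("EntryCode"),
            "domains": [d.text for d in meaning.iter("LEXDomain")],
            "glosses": [g.text for s in english_senses for g in s.iter("Gloss")],
        }
        for reference in meaning.iter("LEXReference"):
            grouped[reference.text].append(record)
    return dict(grouped)


grouped = group_meanings_by_reference(ET.fromstring(SAMPLE))
```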
Assuming this is all connected, we add the following attributes to our tree words:
domain="DomainNumber:DomainName"
sdbg="Lemma:EntryCode:Glosses"
When there are multiple entries/senses/domains for one word/ref, we concatenate them with "|"
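As a rough Python illustration of the attribute format and the "|" concatenation: the `Entry` class and its field names are hypothetical, and the sketch assumes that each entry is formatted on its own and that whole entries are then joined with "|". The first entry reuses the values from the LUK 1:2 example earlier in this thread; the second one is a pure placeholder.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One lexicon entry attached to a word (hypothetical structure)."""
    lemma: str
    entry_code: str
    glosses: str
    domain_number: str
    domain_name: str


def build_attributes(entries: list[Entry]) -> dict[str, str]:
    """Build the domain and sdbg attribute values, joining entries with '|'."""
    domain = "|".join(f"{e.domain_number}:{e.domain_name}" for e in entries)
    sdbg = "|".join(f"{e.lemma}:{e.entry_code}:{e.glosses}" for e in entries)
    return {"domain": domain, "sdbg": sdbg}


entries = [
    Entry("παραδίδωμι", "33.237", "to instruct, to teach", "033017", "Teach"),
    # Placeholder values only, to show the '|' separator.
    Entry("lemma2", "0.0", "gloss2", "000000", "Name2"),
]
attributes = build_attributes(entries)
```

One thing to check before adopting this: a label or gloss that itself contains ":" or "|" would make the value ambiguous to parse, so the separators may need escaping or validation.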