funderburkjim / MWvlex

Corrections re step0 #1

Open funderburkjim opened 10 years ago

funderburkjim commented 10 years ago

This 'issue' describes why certain MW changes were made today.

As described in https://github.com/funderburkjim/MWvlex/blob/master/step0/readme.txt, step0 aims to categorize the MW records that are roots. It does this by matching each record against 5 patterns (a record matching none of them is categorized as O); the leading number on each line is the frequency of occurrence of that pattern.

6396 K : record contains form <key2>.*?</root></key2>
0620 P : record contains <vlex type="preverb"></vlex>
2088 V : record contains <vlex type="root"></vlex>
1215 N : record contains <vlex>Nom.</vlex> 
0096 X : record contains <vlex type="nhw">  (nhw == not headword)
0000 O : record contains none of above patterns
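
For illustration, the categorization above could be sketched in Python roughly as follows. This is not the actual step0 code. The function name `classify`, the first-match-wins order (taken from the order of the list above), and matching only the opening `<vlex type="...">` tag for P, V and X are all assumptions made for this sketch.

```python
import re

# Patterns in the order listed above. First match wins; this precedence
# is an assumption of the sketch, not something stated in the readme.
PATTERNS = [
    ("K", re.compile(r"<key2>.*?</root></key2>")),
    # For P, V and X only the opening tag is tested (assumption).
    ("P", re.compile(r'<vlex type="preverb">')),
    ("V", re.compile(r'<vlex type="root">')),
    ("N", re.compile(r"<vlex>Nom\.</vlex>")),
    ("X", re.compile(r'<vlex type="nhw">')),
]


def classify(record: str) -> str:
    """Return the one-letter category for a record, or 'O' if none match."""
    for code, pattern in PATTERNS:
        if pattern.search(record):
            return code
    return "O"
```

With these assumptions, a record containing only `<vlex>P.</vlex>` falls through to O, which is the situation the corrections below address.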

In the process of refining the original matching criteria, several markup changes were made to MW records.

1a. 7 'kf' verbs. key2 should be recoded like 
 aMSIkf:38:<H3>:K:<key2>aMSI-<root>kf</root></key2>:
 Note all of these are from supplement.
akzilakzIkf:654.1:<H3>:O::<vlex>P.</vlex>
agocarIkf:878.2:<H3>:O::<vlex>P.</vlex>
agraRIkf:1231.1:<H4>:O::<vlex>P.</vlex>
aNgArasAtkf:1659.1:<H3>:O::<vlex>P.</vlex>
atiTIkf:2996.1:<H3>:O::<vlex>P.</vlex>
antarIkf:8273.1:<H1>:O::<vlex>P.</vlex>
grAsapAtrIkf:68581.1:<H3>:O::<vlex>P.</vlex>

1b. 1 'BU' verb. Similarly recode key2.
atiTIBU:3006.1:<H3>:O::<vlex>P.</vlex>
 Also from the supplement.
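
A minimal sketch of the key2 recoding described in 1a and 1b, assuming the headword simply ends with the root ('kf' or 'BU'). The helper name `recode_key2` is hypothetical, and the 'BU' result is inferred from "similarly"; only the aMSIkf form is shown above.

```python
def recode_key2(key1: str, root: str) -> str:
    """Build a key2 that marks the trailing root of a compound verb.

    Hypothetical helper: assumes key1 ends with the root and has a
    non-empty prefix before it.
    """
    if not key1.endswith(root) or key1 == root:
        raise ValueError(f"{key1!r} does not end in root {root!r} with a prefix")
    prefix = key1[: -len(root)]
    return f"<key2>{prefix}-<root>{root}</root></key2>"


print(recode_key2("aMSIkf", "kf"))   # the form shown in 1a
print(recode_key2("atiTIBU", "BU"))  # inferred by analogy for 1b
```

Once key2 has this form, the record matches the K pattern instead of falling into O.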

1c. Other markup errors
anunirjihAna:6537:<H1>:O::<vlex>A1.</vlex>  Add type="nhw" markup

dAmani:91634:<H3>:O::<vlex type="nhw">P.</vlex>
  markup error: P. here is ls (literary source) for Panini.
vAmoru:191029:<H3>:O::<vlex type="nhw">Nom.</vlex>
vAmorU:191030:<H3>:O::<vlex type="nhw">Nom.</vlex>
  markup error: should be <ab>Nom.</ab>  (Nominative case)
viDmA:196475.1:<H1>:O::<vlex>P.</vlex>
  markup error: Should be prefix verb
suhfdadruh:250717:<H3>:O::<vlex type="nhw">Nom.</vlex>
  markup error: should be <ab>Nom.</ab>  (Nominative case)
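
The report lines quoted in this issue are colon-separated. A small sketch for splitting one into fields follows; the field names (`key1`, `lnum`, `hcode`, `category`, `key2`, `vlex`) are guesses from the examples here, not names taken from the step0 code.

```python
from typing import NamedTuple


class ReportLine(NamedTuple):
    key1: str      # headword, e.g. dAmani
    lnum: str      # record number, e.g. 91634 or 654.1
    hcode: str     # e.g. <H3>
    category: str  # one of K, P, V, N, X, O
    key2: str      # matched <key2>...</key2>, or empty
    vlex: str      # vlex markup found in the record, or empty


def parse_report_line(line: str) -> ReportLine:
    """Split a report line into its six colon-separated fields."""
    # maxsplit=5 keeps any further colons inside the last field.
    parts = line.strip().split(":", 5)
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}: {line!r}")
    return ReportLine(*parts)


rec = parse_report_line('dAmani:91634:<H3>:O::<vlex type="nhw">P.</vlex>')
print(rec.key1, rec.category, rec.vlex)
```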

Other changes:
avavarti:18115:<H1>:O::<vlex>A1.</vlex>
  Change markup to <vlex type="nhw">A1.</vlex> 
kawakawAya:42165.1:<H2>:O::<vlex>P.</vlex> <vlex>A1.</vlex>
  add <vlex>Nom.</vlex> 
nirvfta:107358.1:<H3>:O::<vlex>P.</vlex>
  change <vlex>P.</vlex> to <vlex type="nhw">P.</vlex>
po:129230:<H1>:O::<vlex type="hwinfo">Nom.</vlex>
  change hwinfo to nhw

gasyoun commented 10 years ago

Markup changes to the original, main XML file?

funderburkjim commented 10 years ago

Right.

gasyoun commented 10 years ago

Very valuable markup. To understand its full strength I'll need a few weeks or months. It's bigger than me.