Closed aso2101 closed 7 years ago
Please test by deploying the new .xconf file (located in the data app)
If this is not already the case, presence of
I hope I’m not forgetting anything else important.
Arlo
On 15 August 2017 at 07:07, Andrew Ollett <notifications@github.com> wrote:
add `<space>` and `<milestone>` (`<lb>` is already covered)
what about
On 19 August 2017 at 17:36, Andrew Ollett <notifications@github.com> wrote:
@arlogriffiths would a `<pb>` appear in the middle of a word?
@wsalesky yes, in the situations Arlo mentioned, `<pb>` can appear in the middle of a word.
@aso2101 Okay... adding it tonight.
Added to data repository (branch: https://github.com/aso2101/satavahana-inscriptions-data/tree/issue60). Redeploy to test.
I think that when people search for terms they will want to ignore all of the text-critical markup. Thus, for example, K6 has the line `ghara<unclear>sa</unclear>` in `div[@type='edition']`, which is rendered as ghara(sa) by the ODD. But if one searches for gharasa, there are no results. In order to find the relevant passage one has to search for ghara*.

The following elements should be "ignored" (i.e., their text content should be combined with the immediately preceding or immediately succeeding text node, up to the space) for the sake of searching:
- `unclear`: e.g., `ghara<unclear>sa</unclear>` should be indexed as `gharasa` (K6)
- `supplied`: e.g., `ga<supplied reason="lost">ha</supplied>patino` should be indexed as `gahapatino` (Ku21)
- `add`: e.g., `<add place="below">gha</add>riniya` should be indexed as `ghariniya` (no examples in my corpus yet)
- `del`: e.g., `<del>gha</del>ghariniya` should be indexed as `ghariniya` (actually I think it is indexed this way anyway, so nothing needs to be done here).

These should also be combined, since these elements often occur together (e.g., `<supplied reason="lost">bha</supplied><unclear>yata</unclear>` should be indexed as `bhayata`).

If possible, the same kind of behavior should be applied to those elements even when they contain spaces (although this should happen much less frequently and I can't find any examples right now): `de<unclear>ya dha</unclear>ma` should be indexed as `deya` and `dhama`.
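To make the requested indexing behaviour concrete, here is a minimal Python sketch. It is only an illustration of the expected input and output, not the index configuration in the .xconf file. The function name `index_text`, the `DROP` set, and the `<ab>` wrapper element are all made up for this example. It assumes that the text of every inline element is joined to its neighbours and that only `<del>` content is dropped.

```python
import xml.etree.ElementTree as ET

# Hypothetical: elements whose text content is left out of the index.
DROP = {"del"}


def index_text(fragment: str) -> str:
    """Return the string that should be indexed for a TEI fragment.

    Text inside inline elements (unclear, supplied, add, ...) is joined
    directly to the surrounding text; text inside <del> is skipped.
    """
    # Wrap the fragment so that it parses as a single XML element.
    root = ET.fromstring(f"<ab>{fragment}</ab>")

    def walk(el):
        parts = []
        if el.tag not in DROP:
            parts.append(el.text or "")
            for child in el:
                parts.append(walk(child))
                # The tail belongs to the parent, so it is kept even
                # when the child element itself is dropped.
                parts.append(child.tail or "")
        return "".join(parts)

    return walk(root)


print(index_text("ghara<unclear>sa</unclear>"))
print(index_text("de<unclear>ya dha</unclear>ma").split())
```

Splitting the result on whitespace gives the individual search terms, so an element that contains a space simply contributes to two neighbouring words.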
For the elements `<choice>` and `<app>`, which occur in the corpus relatively often, I am a bit more uncertain. I plan on moving 'inline' apparatus elements to an external apparatus for all of the inscriptions, so theoretically `<app>` should not match anything in `div[@type='edition']`. But I think that when one includes the apparatus in the search (I will post a separate issue for this) then all of the elements inside `<app>` should be considered potential hits (i.e., `<lem>`, `<rdg>`, and `<note>`), although the behaviour of the inline elements (`<unclear>` and `<supplied>`) should be the same as noted above.

For the element `<gap>`, I am not sure what to do. Right now, I think that `<gap>` just screws up any searches, in the sense that `pu<gap/>ṇa` will probably not match the terms pu, puṇa, puteṇa (which is probably what this stands for), etc. Would it be possible for `<gap>` elements to be treated as quasi-wildcards, so that a search term like "puteṇa" would match `pu<gap/>ṇa`?

Possibly @arlogriffiths will have something more to add.
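One way to picture the quasi-wildcard idea for `<gap>` is the Python sketch below. It is an assumption about how such matching could work, not a description of how the search index behaves. The function name `gap_pattern` and the `<ab>` wrapper are made up for this example, and only `<gap>` elements at the top level of a single word are handled.

```python
import re
import xml.etree.ElementTree as ET


def gap_pattern(fragment: str) -> "re.Pattern[str]":
    """Build a regex in which each top-level <gap/> matches any run of characters."""
    # Wrap the fragment so that it parses as a single XML element.
    root = ET.fromstring(f"<ab>{fragment}</ab>")
    parts = [re.escape(root.text or "")]
    for child in root:
        if child.tag == "gap":
            # Hypothetical rule: a gap stands for zero or more characters.
            parts.append(".*")
        else:
            # Other inline elements contribute their text unchanged.
            parts.append(re.escape("".join(child.itertext())))
        parts.append(re.escape(child.tail or ""))
    return re.compile("".join(parts))


pattern = gap_pattern("pu<gap/>ṇa")
print(bool(pattern.fullmatch("puteṇa")))
print(bool(pattern.fullmatch("pu")))
```

Under this rule the search terms puteṇa and puṇa would both match `pu<gap/>ṇa`, while pu on its own would not unless prefix matching were added as well.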