JessedeDoes closed 6 years ago
Oops: this is because there is no white space between the tags in the source file
Actually, the word order is confused here: the content of the [name] tag appears after the rest of the sentence. Maybe my indexing specification is to blame?
# default namespace)

# What element starts a new document?
# (the only absolute XPath; the rest is relative)
documentPath: //TEI|//TEI.2

# Annotated, CQL-searchable fields (also called "complex fields").
# We usually have just one, named "contents".
annotatedFields:

  contents:

    # How to display the field in the interface (optional)
    displayName: Contents

    # How to describe the field in the interface (optional)
    description: Contents of the documents.

    # What element (relative to document) contains this field's contents?
    # (if omitted, entire document is used)
    containerPath: .//body

    # What are our word tags? (relative to container)
    wordPath: .//w|.//pc   # (body does not apply to OpenSonar, but serves as an illustration)

    # Punctuation between word tags (relative to container)
    punctPath: .//text()[not(ancestor::w or ancestor::pc)]   # = "all text nodes (under containerPath) not inside a [w] or [pc] element"

    # What annotation can each word have? How do we index them?
    # (annotations are also called "(word) properties" in BlackLab)
    # (valuePaths relative to word path)
    # NOTE: forEachPath is NOT allowed for annotations, because we need to know all annotations before indexing,
    #       and with forEachPath you could run into an unknown new annotation mid-way through.
    annotations:
    - name: word
      valuePath: .
    - name: lemma
      valuePath: "@lemma"
    - name: pos
      valuePath: "@pos"
    - name: morfcode
      valuePath: "@type"

    # XML tags within the content we'd like to index
    # (relative to container)
    inlineTags:
    - path: .//s   #call: openSonarSentence   # to call a plugin method for this tag
    - path: .//p
    - path: .//name

# FoLiA's native metadata
metadata:
  containerPath: //listBibl[@type='metadata']
  fields:
  - forEachPath: bibl/interpGrp/interp
    namePath: ../@type   # interpGrp/@type
    valuePath: .         # interp/@value
Sorry, included yaml is a mess
Strange. I suspect the difference in nesting level of word tags, combined with how vtd-xml returns matches, is to blame. We'll investigate.
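To make the suspicion concrete, here is a minimal Python sketch of the hypothesis only; it is not BlackLab's or vtd-xml's actual code. It assumes that word matches end up grouped by nesting level instead of being kept in document order. The function names and the shortened, namespace-free sample sentence are made up for illustration.

```python
import xml.etree.ElementTree as ET

WORD_TAGS = ("w", "pc")

# Shortened, namespace-free version of the sentence in this issue (illustrative only).
SAMPLE = (
    "<s>"
    "<w>Aangezien</w><w>de</w>"
    "<name><w>Wet</w><w>Administratieve</w></name>"
    "<w>op</w><w>1</w>"
    "</s>"
)

def words_in_document_order(sentence):
    # Element.iter() walks the tree depth-first, which is document order.
    return [el.text for el in sentence.iter() if el.tag in WORD_TAGS]

def words_grouped_by_depth(sentence):
    # Hypothetical faulty strategy: direct children first, nested words afterwards.
    direct = [el for el in sentence if el.tag in WORD_TAGS]
    direct_ids = {id(el) for el in direct}
    nested = [
        el for el in sentence.iter()
        if el.tag in WORD_TAGS and id(el) not in direct_ids
    ]
    return [el.text for el in direct + nested]

sentence = ET.fromstring(SAMPLE)
print(words_in_document_order(sentence))
# ['Aangezien', 'de', 'Wet', 'Administratieve', 'op', '1']
print(words_grouped_by_depth(sentence))
# ['Aangezien', 'de', 'op', '1', 'Wet', 'Administratieve']
```

The second list shows the reported symptom: the words inside [name] end up after the rest of the sentence.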
https://portal.clarin.inl.nl/atocorp/j.de.does@umail.leidenuniv.nl:EindhovenTest3/search/hits?number=20&first=0&patt=%5Bpos%3D%22SPEC.%2Adeel.%2A%22%5D
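For readers who do not want to decode the URL by hand, a small Python snippet (standard library only) recovers the search pattern from the query string of the link above:

```python
from urllib.parse import parse_qs

# Query string copied from the search URL above.
QUERY = "number=20&first=0&patt=%5Bpos%3D%22SPEC.%2Adeel.%2A%22%5D"

params = parse_qs(QUERY)
pattern = params["patt"][0]
print(pattern)  # [pos="SPEC.*deel.*"]
```

So the query matches every token whose pos annotation matches the regular expression SPEC.*deel.*, for example the SPEC(deeleigen) words inside the [name] tag.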
XML:
[s xmlns="http://www.tei-c.org/ns/1.0" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:ivdnt="http://www.ivdnt.org/xslt/namespaces" xmlns:tei="http://www.tei-c.org/ns/1.0"]
  [w pos="VG(onder)" lemma="aangezien" type="710" xml:id="w.0"]Aangezien[/w]
  [w pos="LID(bep,stan)" lemma="de" type="370" xml:id="w.1"]de[/w]
  [name key="wet-administratieve-rechtspraak-overheidsbeschikkingen" resp="namenKlus"]
    [w pos="SPEC(deeleigen)" xml:id="w.2.part.0" type="010"]Wet[/w]
    [w pos="SPEC(deeleigen)" xml:id="w.2.part.1" type="010"]Administratieve[/w]
    [w pos="SPEC(deeleigen)" xml:id="w.2.part.2" type="010"]Rechtspraak[/w]
    [w pos="SPEC(deeleigen)" xml:id="w.2.part.3" type="010"]Overheidsbeschikkingen[/w]
  [/name]
  [w pos="VZ(init)" lemma="op" type="600" xml:id="w.3"]op[/w]
  [w pos="TW(hoofd,prenom,stan)" lemma="1" type="470" xml:id="w.4"]1[/w]
  [name key="juli" resp="namenKlus"]
    [w pos="N(eigen,ev,stan)" xml:id="w.5.part.0" type="010"]juli[/w]
  [/name]
  [w pos="ADJ(prenom,basis,met-e)" lemma="a.s." type="103" xml:id="w.6"]a.s.[/w]
  [w pos="VZ(init)" lemma="in" type="600" xml:id="w.7"]in[/w]
  [w pos="N(soort,ev,e-nom,stan,x-basis)" lemma="werking" type="000" xml:id="w.8"]werking[/w]
  [w pos="WW(pv,e-hulp-of-koppel,tgw,3,ev)" lemma="zullen" type="273" xml:id="w.9"]zal[/w]
  [w pos="WW(inf,e-intrans,vrij)" lemma="treden" type="200" xml:id="w.10"]treden[/w]
  [pc xml:id="w.11" pos="LET()"],[/pc]
  [w pos="WW(pv,e-hulp-of-koppel,tgw,3,ev)" lemma="kunnen" type="273" xml:id="w.12"]kan[/w]
  [w pos="VNW(aanw,det,stan,prenom)" lemma="dit" type="370" xml:id="w.13"]dit[/w]
  [w pos="N(soort,ev,e-nom,stan,x-basis)" lemma="artikel" type="000" xml:id="w.14"]artikel[/w]
  [w pos="WW(inf,e-intrans,vrij)" lemma="vervallen" type="200" xml:id="w.15"]vervallen[/w]
  [pc xml:id="w.16" pos="LET()"].[/pc]
[/s]