annotation / mondriaan

Proeftuin: a dozen or so Mondriaan letters in a dataset of text + annotations
MIT License

Search functionality #18

Open Beatrice-nava opened 1 year ago

Beatrice-nava commented 1 year ago

1) Full text search (currently in the collection of letters, but we will add the writings)

2) Facet search

Search results should be sortable by

pboot commented 1 year ago

Something to take into account from the start is perhaps that the search facility will also be used for documents other than letters: the writings of course but also the introductions, biography, etc. Most of the above facets will not be applicable for these documents, but there should be a sort of super-facet: which type of document are we searching in or for.
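
A minimal sketch of how such a super-facet could sit alongside the letter-specific facets. Everything here is hypothetical: the record layout, the `docType` name, the sample documents and the `search` function are assumptions for illustration, not part of the actual search facility.

```python
# Hypothetical records: every document has a docType (the "super-facet");
# only letters carry the letter-specific facets.
DOCUMENTS = [
    {"id": "letter-1", "docType": "letter", "facets": {"settlement": "Otterlo"}},
    {"id": "intro-1", "docType": "introduction", "facets": {}},
    {"id": "bio-1", "docType": "biography", "facets": {}},
]


def search(documents, doc_type=None, **facets):
    """Return ids of documents matching the document type and all given facets."""
    hits = []
    for doc in documents:
        # The super-facet is checked first and applies to every document.
        if doc_type is not None and doc["docType"] != doc_type:
            continue
        # A document that lacks a requested facet simply does not match.
        if all(doc["facets"].get(name) == value for name, value in facets.items()):
            hits.append(doc["id"])
    return hits


letters_only = search(DOCUMENTS, doc_type="letter")
in_otterlo = search(DOCUMENTS, settlement="Otterlo")
everything = search(DOCUMENTS)
```

With this layout, selecting a letter-only facet implicitly narrows the results to letters, while the document type remains selectable on its own.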

dirkroorda commented 1 year ago

Let's talk a little about this, e.g. the manuscript identifier.

In one of the letters we see:

<sourceDesc>
    <msDesc>
        <msIdentifier>
            <country>Nederland</country>
            <settlement>Otterlo</settlement>
            <institution>Kröller Müller Museum</institution>
            <idno>KM 123.397</idno>
            <altIdentifier><idno type="letterId">19090216y_IONG_1303</idno></altIdentifier>
            <altIdentifier><idno type="def"/></altIdentifier>
        </msIdentifier>
        <physDesc>
            <objectDesc form="correspondentiekaart"/>
            <decoDesc>
                <decoNote/>
            </decoDesc>
        </physDesc>
    </msDesc>
</sourceDesc>

Does that mean that every result in the divs in this letter should show up in the facets?

And do we ignore the objectDesc?
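
A minimal sketch of how the identifier fields above could be turned into facet values with Python's standard library. It assumes the fragment has no XML namespace, as in the snippet quoted above (real TEI files normally declare one). The function name `ms_identifier_facets` and the choice to key alternative identifiers by their `type` attribute are illustrative assumptions, not what the pipeline actually does.

```python
import xml.etree.ElementTree as ET

# The msIdentifier part of the fragment quoted above, without namespace.
SOURCE_DESC = """
<sourceDesc>
    <msDesc>
        <msIdentifier>
            <country>Nederland</country>
            <settlement>Otterlo</settlement>
            <institution>Kröller Müller Museum</institution>
            <idno>KM 123.397</idno>
            <altIdentifier><idno type="letterId">19090216y_IONG_1303</idno></altIdentifier>
            <altIdentifier><idno type="def"/></altIdentifier>
        </msIdentifier>
    </msDesc>
</sourceDesc>
"""


def ms_identifier_facets(xml_text):
    """Collect non-empty msIdentifier children as a dict of facet values."""
    root = ET.fromstring(xml_text.strip())
    ms_id = root.find("./msDesc/msIdentifier")
    facets = {}
    if ms_id is None:
        return facets
    for child in ms_id:
        if child.tag == "altIdentifier":
            # Key alternative identifiers by their type attribute (assumption).
            for idno in child.findall("idno"):
                value = (idno.text or "").strip()
                if value:
                    facets[idno.get("type", "altIdentifier")] = value
        else:
            value = (child.text or "").strip()
            if value:
                facets[child.tag] = value
    return facets


facets = ms_identifier_facets(SOURCE_DESC)
```

Empty elements such as `<idno type="def"/>` are skipped here, so they never produce an empty facet value.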

pboot commented 1 year ago

Yes and yes. (1) If they don't appear in the facets, the user can't use the facets to make further selections. (2) To Beatrice and me this didn't seem urgent. But some of this may be fine-tuned on the basis of feedback from Wietse and Leo once they see what it looks like.

dirkroorda commented 1 year ago

@pboot Shall we also pick up the sender of the letter? I assume that there are also letters in the corpus that have been sent to Mondriaan?

And even if that is not the case, if somebody later combines this dataset with other letters sent to Mondriaan, then it is nice to have the sender metadata in place. Anyway, the info is there, and it is easy to pick it up.

dirkroorda commented 1 year ago

I can also pick up the date in short form (from the when attribute) and in long form (from the element content). I have already done this.
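
For illustration, a minimal sketch of reading both forms from a `date` element with Python's standard library. The sample element, the feature names `dateShort` and `dateLong`, and the function `date_features` are invented for this example and are not taken from the corpus or from the actual conversion code.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the element content and attribute value are made up.
SAMPLE = '<correspAction><date when="1909-02-16">16 February 1909</date></correspAction>'


def date_features(xml_text):
    """Return the short (when attribute) and long (element content) date forms."""
    root = ET.fromstring(xml_text)
    date = root if root.tag == "date" else root.find(".//date")
    if date is None:
        return None
    # Join all nested text and normalise whitespace for the long form.
    long_form = " ".join("".join(date.itertext()).split())
    return {"dateShort": date.get("when"), "dateLong": long_form}


features = date_features(SAMPLE)
```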

dirkroorda commented 1 year ago

By the way, the <rs> elements refer by attribute key or ref. For persons I see ref, for artworks I see key. @Beatrice-nava Is there an intentional difference?
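
A minimal sketch of collecting the reference from either attribute, whichever one is present. The sample markup, its attribute values and the function name `rs_targets` are hypothetical; only the fact that `<rs>` carries `ref` or `key` comes from the data.

```python
import xml.etree.ElementTree as ET

# Hypothetical paragraph: the ref and key values are invented for illustration.
SAMPLE = (
    '<p>Letter to <rs type="person" ref="bio.xml#doe_j">Jane</rs> about '
    '<rs type="artwork" key="A123">the painting</rs>.</p>'
)


def rs_targets(xml_text):
    """Return (type, target) pairs, taking the target from ref or else key."""
    root = ET.fromstring(xml_text)
    targets = []
    for rs in root.iter("rs"):
        target = rs.get("ref") or rs.get("key")
        if target:
            targets.append((rs.get("type"), target))
    return targets


targets = rs_targets(SAMPLE)
```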

Beatrice-nava commented 1 year ago

Yes, because at some point we decided to use both attributes within the <rs> element:

But things will probably change, as Peter mentioned in our last meeting, as we are discussing the possibility of using the RKD database to fill our xml files. This implies that we will create an artwork.xml similar to bio.xml, etc., and will probably refer only to our database, using @ref. We will keep you updated on this!

pboot commented 1 year ago

> Shall we also pick up the sender of the letter?

Yes, that'll be useful too.

dirkroorda commented 1 year ago

All the features mentioned under 2) are in Text-Fabric now, and put through to the WATM annotations. However, as far as the rs elements are concerned, the feature values only contain the contents of the ref or key attributes.

Later we will use that to follow those references and pull data out of related files, such as artwork.xml, bio.xml, biblio.xml.
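
A rough sketch of what following such a reference might look like. The structure assumed for bio.xml here (`person` elements with an `xml:id` and a `persName` child), the `file#id` form of the reference, and the helper names are all assumptions; the real files may be organised differently.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical stand-in for bio.xml.
BIO_XML = """<listPerson>
  <person xml:id="doe_j"><persName>Jane Doe</persName></person>
</listPerson>"""


def build_index(xml_text):
    """Map every xml:id in the file to its element."""
    root = ET.fromstring(xml_text)
    return {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}


def resolve_name(ref, index):
    """Look up the part after '#' (or the whole value) and return the name."""
    ident = ref.rsplit("#", 1)[-1]
    element = index.get(ident)
    if element is None:
        return None
    name = element.find("persName")
    return "".join(name.itertext()).strip() if name is not None else None


index = build_index(BIO_XML)
resolved = resolve_name("bio.xml#doe_j", index)
```

Unresolvable references return `None` rather than raising, so dangling references can be reported separately.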

dirkroorda commented 1 year ago

@Beatrice-nava A question came up: shouldn't we add this to the metadata of the letter as well:

teiHeader > fileDesc > sourceDesc > msDesc > physDesc > objectDesc, the attribute form="correspondentiekaart"
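
If it is added, reading the value is a single path lookup. A minimal sketch, assuming a namespace-free document wrapped in a `TEI` root element; the wrapper and the function name `object_form` are illustrative only.

```python
import xml.etree.ElementTree as ET

# Skeleton document containing only the path in question, without namespace.
TEI = (
    "<TEI><teiHeader><fileDesc><sourceDesc><msDesc><physDesc>"
    '<objectDesc form="correspondentiekaart"/>'
    "</physDesc></msDesc></sourceDesc></fileDesc></teiHeader></TEI>"
)


def object_form(xml_text):
    """Return the form attribute of objectDesc, or None if it is absent."""
    root = ET.fromstring(xml_text)
    element = root.find("./teiHeader/fileDesc/sourceDesc/msDesc/physDesc/objectDesc")
    return element.get("form") if element is not None else None


form = object_form(TEI)
```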

Beatrice-nava commented 1 year ago

I think we decided to ignore it for the time being, as we basically only have letter and postcard as objectDesc, but that could change based on Wietse and Leo's feedback (I think we have already mentioned this in a previous comment, but for the moment there is no news).