acdh-oeaw / staribacher-static

https://acdh-oeaw.github.io/staribacher-static/

setup typesense #3

Closed cfhaak closed 7 months ago

cfhaak commented 8 months ago

As soon as #7 is done, you could basically start creating the search index. But maybe help with other tasks first … You could, for example, split the development of the search page between front end (instant search) and back end (Typesense index building) …

cfhaak commented 8 months ago

Anyway, have a look at the roadmap first …

fsanzl commented 8 months ago
cfhaak commented 8 months ago

For creating the full text, use Peter's pyutils function: https://github.com/acdh-oeaw/acdh-tei-pyutils/blob/517c0c8f7f6cd934fe7ec873da660ec9c763b47b/acdh_tei_pyutils/utils.py#L26
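
To illustrate what building the full text involves, here is a minimal standard-library sketch. It is not the pyutils function linked above; `extract_plain_text`, its `skip_tags` parameter, and the sample paragraph with its `<note>` are assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def extract_plain_text(node, skip_tags=()):
    """Join the text of `node` and its descendants into one string.

    Hypothetical helper: subtrees whose TEI tag name is listed in
    `skip_tags` are left out, and runs of whitespace are collapsed.
    """
    skip = {f"{{{TEI_NS}}}{name}" for name in skip_tags}
    parts = []

    def walk(element):
        if element.tag in skip:
            return
        if element.text:
            parts.append(element.text)
        for child in element:
            walk(child)
            # The tail belongs to the parent's text flow, so keep it
            # even when the child itself was skipped.
            if child.tail:
                parts.append(child.tail)

    walk(node)
    return re.sub(r"\s+", " ", "".join(parts)).strip()


# Invented sample paragraph; the <note> is only there to show skipping.
sample = """<p xmlns="http://www.tei-c.org/ns/1.0">Der Energieplan wird
  <note>editorial remark</note> spätestens Anfang des nächsten Jahres vorliegen.</p>"""

full_text = extract_plain_text(ET.fromstring(sample), skip_tags=("note",))
print(full_text)
```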

fsanzl commented 8 months ago

We need an xml:id for each document. Addressed in https://github.com/fun-with-editions/staribacher-data/issues/5

cfhaak commented 8 months ago

Nice! The search looks good so far!

fsanzl commented 8 months ago

According to the Typesense documentation, it is advisable to represent dates as Unix timestamps so that they can be compared directly with the numeric operators <, >, =, !=. Some source files contain texts covering a range of dates, while others cover a single day. I have added two int32 fields to the Typesense schema to represent the date attributes @from and @to; both take the same value if the document provides only a @when.
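
A rough sketch of that conversion, assuming ISO dates (YYYY-MM-DD) in the attributes. The function names are hypothetical, and the fixed UTC+1 offset is an assumption: it is chosen because the sample record in this thread stores 282956400 for 1978-12-20, which is midnight at UTC+1 rather than midnight UTC.

```python
from datetime import datetime, timedelta, timezone

# Assumption: timestamps are taken at local midnight with a fixed UTC+1 offset.
LOCAL_TZ = timezone(timedelta(hours=1))


def to_unix(iso_date, tz=LOCAL_TZ):
    """Convert a YYYY-MM-DD string to a Unix timestamp at midnight in `tz`."""
    day = datetime.strptime(iso_date, "%Y-%m-%d").replace(tzinfo=tz)
    return int(day.timestamp())


def date_bounds(when=None, from_=None, to=None):
    """Return a (lower, upper) timestamp pair for one document.

    A single @when yields the same value twice; otherwise both
    @from and @to are required.
    """
    if when is not None:
        stamp = to_unix(when)
        return stamp, stamp
    if from_ is None or to is None:
        raise ValueError("need either @when or both @from and @to")
    return to_unix(from_), to_unix(to)


print(date_bounds(when="1978-12-20"))
print(date_bounds(from_="1978-12-20", to="1978-12-21"))
```

A fixed offset ignores daylight saving time, so dates in summer would need a real time zone if local midnight matters.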

cfhaak commented 7 months ago

Cool! That should work!

fsanzl commented 7 months ago
{
    'project': 'STB',
    'id': 'staribacher__19781220',
    'resolver': '/staribacher__19781220.html',
    'rec_id': 'staribacher__19781220.xml',
    'title': 'Mittwoch, der 20. Dezember 1978',
    'year': '1978-12-20',
    'notbefore': 282956400,
    'notafter': 282956400,
    'persons': [
        'Bacher, Gerd',
        'Benya, Anton',
        'Broda, Christian',
        'Burian, Ferdinand',
        'Frank, Wilhelm',
        'Frauscher, Reinhard',
        'Fremuth, Walter',
        'Gratz, Leopold',
        'Heindl, Kurt',
        'Jagoda, Karl',
        'Kienzl, Heinz',
        'Kreisky, Bruno',
        'Lachs, Thomas',
        'Marsch, Gerhard',
        'Riedl, B',
        'Schwarz, Walter',
        'Tischler, Margarete',
        'Wanke, Otto',
        'Wirlandner, Stefan',
        'Zilk, Helmut'],
    'anchor_link': 'p__35',
    'full_text': 'Der Energieplan wird spätestens Anfang des nächsten Jahres vorliegen. Nach Meinung Franks sollte er unbedingt als ein Bericht an den Nationalrat gesendet werden. Das Rohstoffkonzept wird durch die Wirtschaftsforschungsunterlagen schön langsam gestaltet.'}
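
With notbefore and notafter stored as integers, a date-range query reduces to two numeric comparisons. The sketch below shows one possible way to phrase it; the schema fragment lists only the two date fields, and `overlap_filter` and `overlaps` are hypothetical helpers, not part of the project code. The filter string follows Typesense's `field:<=value` and `&&` filter syntax.

```python
# Hypothetical schema fragment: only the two date fields are shown.
date_fields = [
    {"name": "notbefore", "type": "int32"},
    {"name": "notafter", "type": "int32"},
]


def overlap_filter(start, end):
    """Build a filter_by expression for documents overlapping [start, end]."""
    return f"notbefore:<={end} && notafter:>={start}"


def overlaps(record, start, end):
    """Apply the same overlap test locally, e.g. for checking the logic."""
    return record["notbefore"] <= end and record["notafter"] >= start


record = {
    "id": "staribacher__19781220",
    "notbefore": 282956400,
    "notafter": 282956400,
}

# December 1978, with both bounds taken at UTC+1 (assumption).
december_1978 = (281314800, 283993199)

print(overlap_filter(*december_1978))
print(overlaps(record, *december_1978))
```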
fsanzl commented 7 months ago

The xml:id becomes a random-looking HTML id through <p id="{local:makeId(.)}" class="yes-index"> when transformed by editions.xsl, instead of being kept as-is.

csae8092 commented 7 months ago

> xml:id becomes a random-looking html id through <p id="{local:makeId(.)}" class="yes-index"> when transformed by editions.xsl instead of being kept as-is.

You can remove this XSL directive or, even better, rewrite it so that it only applies to tei:p elements without an @xml:id.
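
The intended rule (keep an existing @xml:id, generate an id only where none exists) can be sketched in Python for illustration. This is not the XSLT in editions.xsl; `html_id` and the fallback naming scheme are assumptions, and the sample markup is invented.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def html_id(paragraph, position):
    """Reuse the paragraph's xml:id if present, else fall back to a generated id."""
    existing = paragraph.get(XML_ID)
    if existing:
        return existing
    # Hypothetical fallback; the real stylesheet may generate ids differently.
    return f"p__generated_{position}"


body = ET.fromstring(
    '<body xmlns="http://www.tei-c.org/ns/1.0">'
    '<p xml:id="p__35">first</p><p>second</p>'
    "</body>"
)

ids = [html_id(p, i) for i, p in enumerate(body, start=1)]
print(ids)
```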

fsanzl commented 7 months ago

Sorry, I forgot to mention it above: I already sorted out the <p id> issue yesterday.

fsanzl commented 7 months ago

It seems ok to me. The font color of the refinements would be better addressed in the CSS.