arbeitsgruppe-digitale-altnordistik / Sammlung-Toole

A new look on Handrit.is data
https://arbeitsgruppe-digitale-altnordistik.github.io/Sammlung-Toole/
MIT License

sub-corpora as part of data handler #65

Closed BalduinLandolt closed 2 years ago

BalduinLandolt commented 3 years ago

Groupings, as retrieved by searches, should be stored in subcorpora.

All set-theory operations (union, intersection, ...) should be possible on subcorpora.

Metadata should be retrievable on the basis of subcorpora.
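
A minimal sketch of what this could look like, assuming a subcorpus is nothing more than a named set of manuscript IDs. The `Subcorpus` class, the `metadata_for` helper, and the shape of the metadata lookup are all made up for illustration; none of them exist in the data handler yet.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Subcorpus:
    """Hypothetical: a named, immutable set of manuscript IDs."""

    name: str
    manuscript_ids: frozenset[str] = frozenset()

    def __or__(self, other: Subcorpus) -> Subcorpus:
        # Union: manuscripts that are in either subcorpus.
        return Subcorpus(
            f"({self.name} | {other.name})",
            self.manuscript_ids | other.manuscript_ids,
        )

    def __and__(self, other: Subcorpus) -> Subcorpus:
        # Intersection: manuscripts that are in both subcorpora.
        return Subcorpus(
            f"({self.name} & {other.name})",
            self.manuscript_ids & other.manuscript_ids,
        )

    def __sub__(self, other: Subcorpus) -> Subcorpus:
        # Difference: manuscripts in this subcorpus but not in the other.
        return Subcorpus(
            f"({self.name} - {other.name})",
            self.manuscript_ids - other.manuscript_ids,
        )


def metadata_for(subcorpus: Subcorpus, metadata: dict[str, dict]) -> dict[str, dict]:
    """Return the metadata records of all manuscripts in the subcorpus.

    `metadata` is an assumed lookup from manuscript ID to a record;
    IDs without a record are skipped.
    """
    return {
        ms_id: metadata[ms_id]
        for ms_id in sorted(subcorpus.manuscript_ids)
        if ms_id in metadata
    }
```

Keeping the class immutable means every operation returns a new subcorpus, so the results of earlier searches are never changed by combining them.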

kraus-s commented 2 years ago

I can see this quickly getting out of hand if we store every search operation as a subcorpus in the data handler. How about a corpus-building pipeline that lets the results of different search operations be combined, e.g. via a little plus button at the bottom that adds the results to a corpus in the session state? That corpus could then be saved as a file/pickle and loaded back into the handler later, so it's not permanently in memory. If that sounds something like what you had in mind, I'll get started on it.
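
Roughly what I mean, as a sketch only: the session state is assumed to be dict-like (a plain `dict` stands in for it here), and `CORPUS_KEY`, `add_results`, `save_corpus` and `load_corpus` are hypothetical names, not existing functions in the handler.

```python
from __future__ import annotations

import pickle
from pathlib import Path
from typing import Iterable, MutableMapping

# Hypothetical key under which the corpus being built lives in the session state.
CORPUS_KEY = "current_corpus"


def add_results(state: MutableMapping, results: Iterable[str]) -> set[str]:
    """What the 'plus button' would do: merge search results into the corpus."""
    corpus = state.setdefault(CORPUS_KEY, set())
    corpus.update(results)
    return corpus


def save_corpus(state: MutableMapping, path: Path) -> None:
    """Write the current corpus to disk so it need not stay in memory."""
    with path.open("wb") as fh:
        pickle.dump(state.get(CORPUS_KEY, set()), fh)


def load_corpus(state: MutableMapping, path: Path) -> None:
    """Load a previously saved corpus back into the session state."""
    with path.open("rb") as fh:
        state[CORPUS_KEY] = pickle.load(fh)
```

One caveat with pickle: it should only be used for files we wrote ourselves, since unpickling untrusted data can execute arbitrary code.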

BalduinLandolt commented 2 years ago

I'm not sure we really want to make the subcorpora persistent, at least at first... And if they only last for the runtime, we don't need to worry about things getting out of hand right away. This would allow for implementing a nice prototype that we then can discuss with team meckern/product owners. ;)
(Also, I dislike the term "subcorpus" more and more. Maybe we could come up with something better... maybe "group" or so? Do you have better ideas?)


The way I envisioned it was roughly as follows:

Does this make sense to you?


in terms of architecture, I think it should be a class, that holds

The thing I'm least sure about is which entities we allow in groups. Only manuscripts? Or also persons and texts?


Long story short: let me know if you plan on working on that, or if I should! Would be nice to get this done as soon as possible...