arbeitsgruppe-digitale-altnordistik / Sammlung-Toole

A new look on Handrit.is data
https://arbeitsgruppe-digitale-altnordistik.github.io/Sammlung-Toole/
MIT License

improve person handling #64

Closed (BalduinLandolt closed 2 years ago)

BalduinLandolt commented 3 years ago

Currently, person handling is rather crude:

More should be possible with persons.

BalduinLandolt commented 2 years ago

@kraus-s I've been giving this some thought, and I think we're facing a fundamental issue here (as well as in other places): we have complex data that we want to be able to reproduce, but pandas DataFrames might not be ideal for querying this kind of data. (That's why I introduced the ms-person-matrix: to support efficient querying.)
This leaves us with all sorts of performance issues, and has essentially made us re-invent the database, as you put it at some point.

So I'm wondering if persons might be a good starting point to experiment with Python's sqlite3 module. I hope this would speed up both loading and querying the data. It might also make it possible to extract the data from the XMLs in a more memory-efficient manner (thinking of #52).
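To make the idea a bit more concrete, here is a rough sketch of how persons and their links to manuscripts could live in SQLite instead of a matrix-style dataframe. Everything in it is an assumption for illustration: the table names (`persons`, `ms_person`), the column names, and the helper functions are hypothetical and not part of the current code base, and the IDs in the usage note are placeholders rather than real Handrit.is identifiers.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only,
# not the project's actual data model.
SCHEMA = """
CREATE TABLE IF NOT EXISTS persons (
    pers_id TEXT PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS ms_person (
    ms_id   TEXT NOT NULL,
    pers_id TEXT NOT NULL REFERENCES persons(pers_id),
    PRIMARY KEY (ms_id, pers_id)
);
"""


def create_db(path=":memory:"):
    """Open (or create) the database and make sure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_persons(conn, persons):
    """Insert (pers_id, name) pairs from any iterable, e.g. a generator."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO persons (pers_id, name) VALUES (?, ?)",
            persons,
        )


def link_persons(conn, links):
    """Insert (ms_id, pers_id) pairs; pairs that already exist are skipped."""
    with conn:
        conn.executemany(
            "INSERT OR IGNORE INTO ms_person (ms_id, pers_id) VALUES (?, ?)",
            links,
        )


def manuscripts_of(conn, pers_id):
    """Return the sorted manuscript IDs linked to one person."""
    rows = conn.execute(
        "SELECT ms_id FROM ms_person WHERE pers_id = ? ORDER BY ms_id",
        (pers_id,),
    )
    return [row[0] for row in rows]


def persons_in(conn, ms_id):
    """Return the sorted names of all persons linked to one manuscript."""
    rows = conn.execute(
        "SELECT p.name FROM persons AS p "
        "JOIN ms_person AS mp ON mp.pers_id = p.pers_id "
        "WHERE mp.ms_id = ? ORDER BY p.name",
        (ms_id,),
    )
    return [row[0] for row in rows]
```

Usage would be along the lines of `conn = create_db()`, then `add_persons(conn, [("P1", "Person A")])` and `link_persons(conn, [("MS-1", "P1")])`, after which `manuscripts_of(conn, "P1")` answers the question that the ms-person-matrix answers today. Because `executemany` accepts any iterable, the rows could come straight from a generator that walks the XML files one at a time, which might help with the memory concerns.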

What do you think? Worth a try? Maybe a project for a pair-programming session?

kraus-s commented 2 years ago

Agreed. Dataframes only make it more complicated for us at this point.

BalduinLandolt commented 2 years ago

@kraus-s should we consider this done? Or should we keep it open to force ourselves to refine it even further?

kraus-s commented 2 years ago

I mean, we have basically done exactly this and I believe it works reasonably well. If we see any other issues coming up or things we want to improve, I think we can open a new issue for that.