BalduinLandolt closed this issue 2 years ago.
@kraus-s I've been giving this some thought, and I think we're facing a fundamental issue here (as well as in other places): we have complex data that we want to be able to reproduce, but pandas dataframes might not be ideal for querying this kind of data. (That's why I introduced the ms-person-matrix, to support efficient querying.)
This leaves us with all sorts of performance issues and has essentially made us re-invent the database, as you put it at some point.
So I'm wondering if persons might be a good starting point to experiment with Python's sqlite3 module. I hope this would speed up both loading times and querying. It might also be possible to extract the data from the XMLs in a more memory-efficient manner (thinking of #52).
What do you think? Worth a try? Maybe a project for a pair-programming session?
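A rough sketch of what the SQLite idea might look like, using only the standard library. Everything here is an assumption for illustration: the table and column names, the `load_xml`/`manuscripts_of`/`persons_in` helpers, and the idea that persons appear in the XML as `<persName key="...">` elements without namespaces (real TEI files would need namespace handling). The `ms_person` junction table stands in for the ms-person-matrix, and `xml.etree.ElementTree.iterparse` streams each file instead of building the full tree first, which is one possible angle on the memory concern in #52.

```python
import sqlite3
import xml.etree.ElementTree as ET
from io import BytesIO

# Hypothetical schema: persons, manuscripts, and a junction table
# that plays the role of the ms-person-matrix.
SCHEMA = """
CREATE TABLE IF NOT EXISTS persons (
    person_id TEXT PRIMARY KEY,
    name      TEXT
);
CREATE TABLE IF NOT EXISTS manuscripts (
    ms_id TEXT PRIMARY KEY
);
CREATE TABLE IF NOT EXISTS ms_person (
    ms_id     TEXT REFERENCES manuscripts(ms_id),
    person_id TEXT REFERENCES persons(person_id),
    PRIMARY KEY (ms_id, person_id)
);
-- The primary key covers lookups by ms_id; this index covers lookups by person.
CREATE INDEX IF NOT EXISTS idx_ms_person_person ON ms_person(person_id);
"""


def load_xml(conn: sqlite3.Connection, ms_id: str, source) -> None:
    """Stream one manuscript file into the database (hypothetical helper).

    Assumes persons are tagged as <persName key="..."> with no XML namespace.
    """
    conn.execute("INSERT OR IGNORE INTO manuscripts VALUES (?)", (ms_id,))
    # iterparse yields each element once its closing tag has been read,
    # so we can process and then clear it instead of keeping the whole tree.
    for _event, elem in ET.iterparse(source, events=("end",)):
        key = elem.get("key")
        if elem.tag == "persName" and key:
            name = (elem.text or "").strip()
            conn.execute("INSERT OR IGNORE INTO persons VALUES (?, ?)", (key, name))
            conn.execute("INSERT OR IGNORE INTO ms_person VALUES (?, ?)", (ms_id, key))
        elem.clear()  # drop text, attributes and children we no longer need
    conn.commit()


def manuscripts_of(conn: sqlite3.Connection, person_id: str) -> list:
    """All manuscript IDs linked to a person, sorted."""
    rows = conn.execute(
        "SELECT ms_id FROM ms_person WHERE person_id = ? ORDER BY ms_id",
        (person_id,),
    )
    return [row[0] for row in rows]


def persons_in(conn: sqlite3.Connection, ms_id: str) -> list:
    """All person IDs linked to a manuscript, sorted."""
    rows = conn.execute(
        "SELECT person_id FROM ms_person WHERE ms_id = ? ORDER BY person_id",
        (ms_id,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # a file path would persist between runs
    conn.executescript(SCHEMA)
    load_xml(conn, "ms-1", BytesIO(b'<msDesc><persName key="p1">A</persName>'
                                   b'<persName key="p2">B</persName></msDesc>'))
    load_xml(conn, "ms-2", BytesIO(b'<msDesc><persName key="p1">A</persName></msDesc>'))
    print(manuscripts_of(conn, "p1"))  # ['ms-1', 'ms-2']
```

Writing to a database file instead of `:memory:` would let the extraction run once and later sessions just connect and query, rather than rebuilding dataframes on every load.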
Agreed. Dataframes only make it more complicated for us at this point.
@kraus-s should we consider this done? Or should we keep it open to force ourselves to refine it even further?
I mean, we have basically done exactly this and I believe it works reasonably well. If we see any other issues coming up or things we want to improve, I think we can open a new issue for that.
Currently, person handling is rather crude:
More should be possible with persons.