Open f1sh1918 opened 2 months ago
Input data example:
userHash, startDate, endDate, valid
dashj21sasd32, 12.05.2024, 12.10.2028, true
Questions:
If the userHash already exists, do we update the entry, and if not, create one? Answer: yes.
If a userHash exists in the database but not in the CSV, do we remove the entry? Answer: no. We will clean up such entries in the database later, probably using some kind of scheduled job.
Do we want to have 'koblenz' in the name of the table? Like, koblenzusers?
Answer: no, we keep the generic names. But then the table must also contain the project_id column.
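The two answers above (upsert keyed on userHash, a generic table carrying a project_id column) could be sketched roughly like this. The table and column names are assumptions, not the real schema, and SQLite is used only for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical generic table: project_id scopes rows per project (e.g. Koblenz).
conn.execute("""
    CREATE TABLE user_entitlements (
        project_id TEXT NOT NULL,
        user_hash  TEXT NOT NULL,
        start_date TEXT NOT NULL,
        end_date   TEXT NOT NULL,
        valid      INTEGER NOT NULL,
        PRIMARY KEY (project_id, user_hash)
    )
""")

def upsert_entry(project_id, user_hash, start_date, end_date, valid):
    # Create the entry, or update it when (project_id, user_hash) already exists.
    conn.execute(
        """
        INSERT INTO user_entitlements VALUES (?, ?, ?, ?, ?)
        ON CONFLICT (project_id, user_hash) DO UPDATE SET
            start_date = excluded.start_date,
            end_date   = excluded.end_date,
            valid      = excluded.valid
        """,
        (project_id, user_hash, start_date, end_date, int(valid)),
    )
```

Rows missing from the CSV are deliberately left untouched, matching the answer above that cleanup happens later in a scheduled job.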
How can we check userHash validity?
Hash example: $argon2id$v=19$m=16,t=2,p=1$MTIzNDU2Nzg5QUJD$KStr3PVblyAh2bIleugv796G+p4pvRNiAON0MHVufVY
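Beyond checking the length, the encoded string has a fixed PHC-style structure we could validate. A minimal sketch; the regex and the unpadded-base64 assumption are mine, derived from the example hash above:

```python
import re

# PHC-style argon2id string: $argon2id$v=..$m=..,t=..,p=..$<salt>$<hash>
ARGON2ID_RE = re.compile(
    r"\$argon2id"
    r"\$v=\d+"
    r"\$m=\d+,t=\d+,p=\d+"
    r"\$[A-Za-z0-9+/]+"   # base64 salt, no padding (assumption)
    r"\$[A-Za-z0-9+/]+"   # base64 hash, no padding (assumption)
)

def is_valid_argon2id(value: str) -> bool:
    # Structural check only -- it cannot tell whether the hash was
    # computed from real user data.
    return ARGON2ID_RE.fullmatch(value) is not None
```

Note this only rejects malformed strings; it says nothing about whether the hash corresponds to a real user.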
What we could check:
It might be nice to know how much data we expect in the CSV. Processing 1000 lines takes about 2.82 seconds on my machine locally (before warming up); after warming up it's about 1.35 sec.
UPD: We have a requirement about the supported data volume from our Leistungsbeschreibung (specification of services) for Koblenz (translated from German):
In particular, the backend enables a one-time import of approx. 15,000 - 20,000 records or their hash values. The backend must also support the ongoing (weekly) import of 15,000 - 20,000 records.
Re-tested for 20,000 entries: 29.13 sec before warming up, 17.78 sec after warming up.
We might want to think about the performance optimization then?
Regarding hash validity checks: We could also require that the first entry is a dummy entry whose hash is derived from known dummy data. For the other hashes, I think we can only check their length. (As I've written in #1499, I think we should not add these parameters to the hashes.)
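The dummy-first-entry plus length check suggested above might look like this sketch; the agreed dummy hash and the expected length are placeholders that would have to be fixed with the data supplier:

```python
def check_csv_hashes(hashes, expected_dummy_hash, expected_len):
    """hashes: list of userHash strings in file order.

    Returns a list of error messages; an empty list means the checks passed.
    expected_dummy_hash and expected_len are assumptions, not agreed values.
    """
    errors = []
    if not hashes:
        return ["file is empty"]
    # First entry must be the well-known dummy hash.
    if hashes[0] != expected_dummy_hash:
        errors.append("first entry is not the agreed dummy hash")
    # For the remaining entries we can only check the length.
    for line_no, h in enumerate(hashes[1:], start=2):
        if len(h) != expected_len:
            errors.append(f"line {line_no}: unexpected hash length {len(h)}")
    return errors
```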
I don't think 30 sec would be a problem. The import should only run once a week, and we could require that it only runs at night, for example.
Is your feature request related to a problem? Please describe.
Create an HTTP PUT endpoint that receives user data.
Describe the solution you'd like