Open f1sh1918 opened 2 months ago
Input data example:
userHash, startDate, endDate, valid
dashj21sasd32, 12.05.2024, 12.10.2028, true
Questions:
If the userHash already exists, do we update the entry, and if not, create one? Answer: yes.
If a userHash exists in the database but not in the CSV, do we remove the entry? Answer: no. We will clean up such entries in the database later, probably using some kind of scheduled job.
Do we want to have 'koblenz' in the name of the table? Like, koblenzusers?
Answer: no, we keep the generic names. But then the table must also contain the project_id column.
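The two answers above (upsert keyed on userHash, a generic table carrying a project_id column) could be sketched roughly like this. The table and column names are assumptions, not the real schema, and SQLite is used only for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical generic table: project_id scopes rows per project (e.g. Koblenz).
conn.execute("""
    CREATE TABLE user_entitlements (
        project_id TEXT NOT NULL,
        user_hash  TEXT NOT NULL,
        start_date TEXT NOT NULL,
        end_date   TEXT NOT NULL,
        valid      INTEGER NOT NULL,
        PRIMARY KEY (project_id, user_hash)
    )
""")

def upsert_entry(project_id, user_hash, start_date, end_date, valid):
    # Create the entry, or update it when (project_id, user_hash) already exists.
    conn.execute(
        """
        INSERT INTO user_entitlements VALUES (?, ?, ?, ?, ?)
        ON CONFLICT (project_id, user_hash) DO UPDATE SET
            start_date = excluded.start_date,
            end_date   = excluded.end_date,
            valid      = excluded.valid
        """,
        (project_id, user_hash, start_date, end_date, int(valid)),
    )
```

Rows missing from the CSV are deliberately left untouched, matching the answer above that cleanup happens later in a scheduled job.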
How can we check userHash validity?
Hash example: $argon2id$v=19$m=16,t=2,p=1$MTIzNDU2Nzg5QUJD$KStr3PVblyAh2bIleugv796G+p4pvRNiAON0MHVufVY
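Beyond checking the length, the encoded string has a fixed PHC-style structure we could validate. A minimal sketch; the regex and the unpadded-base64 assumption are mine, derived from the example hash above:

```python
import re

# PHC-style argon2id string: $argon2id$v=..$m=..,t=..,p=..$<salt>$<hash>
ARGON2ID_RE = re.compile(
    r"\$argon2id"
    r"\$v=\d+"
    r"\$m=\d+,t=\d+,p=\d+"
    r"\$[A-Za-z0-9+/]+"   # base64 salt, no padding (assumption)
    r"\$[A-Za-z0-9+/]+"   # base64 hash, no padding (assumption)
)

def is_valid_argon2id(value: str) -> bool:
    # Structural check only -- it cannot tell whether the hash was
    # computed from real user data.
    return ARGON2ID_RE.fullmatch(value) is not None
```

Note this only rejects malformed strings; it says nothing about whether the hash corresponds to a real user.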
What we could check:
It might be nice to know how much data we expect in the CSV. Processing 1000 lines takes about 2.82 seconds on my machine locally (before warming up); after warming up it's about 1.35 sec.
UPD: We have a requirement about the supported data volume from our Leistungsbeschreibung (specification of services) for Koblenz (translated from German):
In particular, the backend enables a one-time import of approx. 15,000 - 20,000 records or their hash values. The backend must also support the ongoing (weekly) import of 15,000 - 20,000 records.
Re-tested for 20,000 entries: 29.13 sec before warming up, 17.78 sec after warming up.
We might want to think about the performance optimization then?
Regarding hash validity checks: We could also require that the first entry is a dummy entry whose hash is derived from known dummy data. For the other hashes, I think we can only check their length. (As I've written in #1499, I think we should not add these parameters to the hashes.)
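The dummy-first-entry plus length check suggested above might look like this sketch; the agreed dummy hash and the expected length are placeholders that would have to be fixed with the data supplier:

```python
def check_csv_hashes(hashes, expected_dummy_hash, expected_len):
    """hashes: list of userHash strings in file order.

    Returns a list of error messages; an empty list means the checks passed.
    expected_dummy_hash and expected_len are assumptions, not agreed values.
    """
    errors = []
    if not hashes:
        return ["file is empty"]
    # First entry must be the well-known dummy hash.
    if hashes[0] != expected_dummy_hash:
        errors.append("first entry is not the agreed dummy hash")
    # For the remaining entries we can only check the length.
    for line_no, h in enumerate(hashes[1:], start=2):
        if len(h) != expected_len:
            errors.append(f"line {line_no}: unexpected hash length {len(h)}")
    return errors
```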
I don't think 30 sec would be a problem. The import should only run once a week, and we could require that it only runs at night, for example.
Is your feature request related to a problem? Please describe.
Create an HTTP PUT endpoint that receives user data.
Describe the solution you'd like