UniStuttgart-VISUS / damast

Code for the DH project "Dhimmis & Muslims – Analysing Multireligious Spaces in the Medieval Muslim World" (VolkswagenFoundation)
MIT License

Duration of storage of reports in respective db #61

Closed tutebatti closed 2 years ago

tutebatti commented 2 years ago

I think we only discussed this via mail. At least I couldn't find an issue in the old, non-public repository.

As far as I know, reports are currently stored "forever". What we considered is deleting the HTML and PDF of a report from the database if the report has not been accessed in the last 3 months. After deletion, both HTML and PDF would need to be regenerated from the still-existing JSON. Is this correct?

Once we have discussed this, I would change the tag of this issue to enhancement.

mfranke93 commented 2 years ago

Yes, good point. I am not 100% sure about the timeframe (3 months seems like a lot), but that is something that could be configurable. From my side, there is no need for further discussion :)
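A minimal sketch of the access-based cleanup described above, with the retention period as a configurable value. The `Report` layout, the `render` placeholder, and the 90-day default are assumptions for illustration only, not the actual damast schema or code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple

# Assumed default; the thread only says the timeframe should be configurable.
RETENTION = timedelta(days=90)


@dataclass
class Report:
    # Hypothetical record layout, not the real database schema.
    filter_json: str              # always kept, small
    last_access: datetime
    html: Optional[bytes] = None  # evictable, can be regenerated
    pdf: Optional[bytes] = None   # evictable, can be regenerated


def render(filter_json: str) -> Tuple[bytes, bytes]:
    # Placeholder for the real (slow) report generation.
    return b"<html>report</html>", b"%PDF-report"


def evict_stale(reports: Iterable[Report], now: datetime,
                retention: timedelta = RETENTION) -> int:
    """Drop HTML and PDF of reports not accessed within `retention`.

    The JSON is never touched. Returns the number of evicted reports.
    """
    evicted = 0
    for r in reports:
        has_output = r.html is not None or r.pdf is not None
        if has_output and now - r.last_access > retention:
            r.html = None
            r.pdf = None
            evicted += 1
    return evicted


def access(report: Report, now: datetime) -> Tuple[bytes, bytes]:
    """Return (html, pdf), regenerating them from the JSON if they were evicted."""
    if report.html is None or report.pdf is None:
        report.html, report.pdf = render(report.filter_json)
    report.last_access = now  # every access resets the retention clock
    return report.html, report.pdf
```

Updating `last_access` on every read is what makes the timeframe "since last access" rather than "since creation".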

mfranke93 commented 2 years ago

I would even go so far as to suggest: if the report database exceeds a (configurable) file size, even report files newer/more popular than that are "optimized" based on last access date.

So, for example: If the report database gets larger than, say, 5GB, so many HTML and PDF report results are removed that the database content is below (for example) 4GB afterwards (hysteresis). That would make the concept even more resilient towards heavy use.
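The hysteresis idea could be sketched roughly as follows. The 5GB and 4GB watermarks are the example numbers from this comment; the `Entry` fields and the function name are hypothetical, and how the total database size is measured is left to the caller.

```python
from dataclasses import dataclass
from typing import Iterable, List

GB = 10**9
HIGH_WATER = 5 * GB  # cleanup starts only above this size (example value)
LOW_WATER = 4 * GB   # cleanup stops once the size is below this (example value)


@dataclass
class Entry:
    # Hypothetical bookkeeping record for one report.
    report_id: int
    last_access: float   # e.g. a Unix timestamp
    rendered_size: int   # bytes of HTML + PDF, 0 if already evicted


def select_evictions(entries: Iterable[Entry], total_size: int,
                     high: int = HIGH_WATER, low: int = LOW_WATER) -> List[int]:
    """Return ids of reports whose HTML/PDF should be dropped.

    Nothing happens until `total_size` exceeds `high`. Once it does,
    the least recently accessed reports are chosen until the projected
    size is below `low`.
    """
    if total_size <= high:
        return []
    victims: List[int] = []
    for e in sorted(entries, key=lambda e: e.last_access):
        if total_size < low:
            break
        if e.rendered_size == 0:
            continue  # nothing left to free for this report
        victims.append(e.report_id)
        total_size -= e.rendered_size
    return victims
```

Because the trigger and the target differ, a database hovering around the limit is not cleaned on every single new report.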

rpbarczok commented 2 years ago

@tutebatti: As far as I remember that was the result of our discussion. What I am not sure of is whether Dorothea was part of the discussion. @mfranke93: I would like to give the user an absolute timeframe for the storage of the reports (e.g. 3 months), so that they know how long their data will be stored. But I see your point of the heavy use.

mfranke93 commented 2 years ago

> I would like to give the user an absolute timeframe for the storage of the reports (e.g. 3 months), so that they know how long their data will be stored. But I see your point of the heavy use.

It's not as if "their data" is gone afterwards. It just takes a few minutes, instead of a fraction of a second, to retrieve them again. Also, if this behavior is communicated to users at the time of report generation, we have to be very explicit with the phrasing, because "stored for 3 months" is easy to understand, but "stored for 3 months after last access" is harder to get across.

tutebatti commented 2 years ago

> If the report database exceeds a (configurable) file size, even report files newer/more popular than that are "optimized" based on last access date.

This does not mean, as far as I understand, that the report will be deleted entirely. Here, too, HTML and PDF will be reproducible later. Right? So the problem you raised, @rpbarczok, should be solved.

> What I am not sure of is whether Dorothea was part of the discussion.

I have talked with her, as far as I remember. At any rate, is it feasible that the db will exceed 5GB as long as it contains json files only, @mfranke93?

mfranke93 commented 2 years ago

> This does not mean, as far as I understand, that the report will be deleted entirely. Here, too, HTML and PDF will be reproducible later. Right? So the problem you raised, @rpbarczok, should be solved.

See above.

> I have talked with her, as far as I remember. At any rate, is it feasible that the db will exceed 5GB as long as it contains json files only, @mfranke93?

I would say no. Typical report filters are ~300B, and if they contain some GeoJSON (map restrictions), they might go into the ~10kB range. At that range, it would take over ten years to reach that size even if 100 users generated a report each day. And the 5GB is an arbitrary number anyways :)
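The back-of-the-envelope estimate above can be checked with a few lines. Decimal units (1 kB = 1000 B, 1 GB = 10^9 B) and a 365.25-day year are assumptions here, and the function name is only illustrative.

```python
def years_to_reach(limit_bytes: float, bytes_per_report: float,
                   reports_per_day: float) -> float:
    """Years until JSON-only storage reaches `limit_bytes` at a constant rate."""
    reports_needed = limit_bytes / bytes_per_report
    days_needed = reports_needed / reports_per_day
    return days_needed / 365.25


# Upper-range case from the comment: ~10 kB per report, 100 reports per day.
# 5 GB / 10 kB = 500,000 reports, i.e. 5,000 days.
print(f"{years_to_reach(5e9, 10_000, 100):.1f} years")  # prints "13.7 years"
```

With the typical ~300B filters instead of ~10kB, the same calculation gives several hundred years.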

mfranke93 commented 2 years ago

For reference, this is what the lifecycle of reports would look like under the new system:

[Image: diagram of the report lifecycle under the new system]

tutebatti commented 2 years ago

Can we change this to be an enhancement now? Any objections, @rpbarczok?

rpbarczok commented 2 years ago

No

tutebatti commented 2 years ago

@mfranke93 So the change should be implemented as discussed: