UUDigitalHumanitieslab / I-analyzer

The great textmining tool that obviates all others
https://ianalyzer.hum.uu.nl
MIT License

Large CSV downloads #1474

Open JeltevanBoheemen opened 4 months ago

JeltevanBoheemen commented 4 months ago

Is your feature request related to a problem? Please describe.

Depending on the user's download limit, very large CSV files can be generated. Creation of these files is chunked, so this poses no problem for the server. However, the download itself has to complete within Apache's proxy timeout (by default: 30 seconds). No single timeout value would fix this problem, since download time depends heavily on the user's internet connection. For downloads of several hundred MB, the timeout would have to be exceedingly large.
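To make the timeout argument concrete, here is a rough back-of-the-envelope sketch. It assumes, as described above, that the whole transfer has to finish within the proxy timeout. The 500 MB file size and the bandwidth figures are illustrative assumptions, not measurements, and the function names are made up for this example.

```python
# Rough estimate only: ignores protocol overhead, latency and server load.
PROXY_TIMEOUT_S = 30  # proxy timeout mentioned in this issue


def transfer_seconds(size_mb: float, bandwidth_mbit_s: float) -> float:
    """Seconds needed to transfer size_mb megabytes at bandwidth_mbit_s megabits per second."""
    return size_mb * 8 / bandwidth_mbit_s


def max_size_mb(bandwidth_mbit_s: float, timeout_s: float = PROXY_TIMEOUT_S) -> float:
    """Largest file (in MB) that can be transferred before the timeout expires."""
    return bandwidth_mbit_s * timeout_s / 8


if __name__ == "__main__":
    size_mb = 500  # hypothetical "several hundred MB" download
    for bandwidth in (10, 50, 100):  # hypothetical connection speeds in Mbit/s
        needed = transfer_seconds(size_mb, bandwidth)
        limit = max_size_mb(bandwidth)
        print(
            f"{bandwidth:>3} Mbit/s: {size_mb} MB takes {needed:.0f} s; "
            f"a {PROXY_TIMEOUT_S} s timeout allows at most {limit:.1f} MB"
        )
```

Under these assumptions, the required timeout scales with file size divided by the user's bandwidth, which the server does not control.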

Describe the solution you'd like

Possible solutions with notes:

jgonggrijp commented 4 months ago

I think that in the distant past, a similar question came up about how much users should be able to download (in the most extreme case). If we can find notes from back then, they might shed some additional light on the discussion (especially on the option "don't serve large files").

"Several hundred MBs" could mean an entire corpus in some cases. It might also depend on the corpus whether we want to facilitate that, since publishers generally are rather protective about who gets to download their corpora.

lukavdplas commented 4 months ago

Re. licences and publishers: see also #1437. I-analyzer is already designed with the idea that access needs to be managed per corpus; I think that should extend to download rights. However, we also serve data where that isn't an issue.