MuckRock / muckrock

MuckRock's source code - Please report bugs, issues and feature requests to info@muckrock.com
https://www.muckrock.com
GNU Affero General Public License v3.0

Set max size for communications #249

Closed: blipblitz closed this issue 9 years ago

blipblitz commented 9 years ago

This is related to #198.

https://www.muckrock.com/foi/united-states-of-america-10/fy2013-foia-log-bureau-of-prisons-8376/ This request is one of those that sometimes pulls in all of the quoted text and other times misses some of the appropriate text. It throws app errors when I try to delete the extra text from past communications.

mitchelljkotler commented 9 years ago

Due to email being shitty, there's no perfect way to pull out the quoted text. Mailgun does a really good job a very large percentage of the time. I think the best we can do is some sort of monitor on communications getting too large, and pruning them down before they get out of control. We can also limit the size of these fields, but it's hard to come up with a hard constraint on the longest message.
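A rough sketch of the "monitor" idea, for illustration only. It assumes communications are available as `(id, body)` pairs; the function name `find_oversized` and the `warn_chars` threshold are hypothetical and not taken from the MuckRock codebase.

```python
def find_oversized(communications, warn_chars):
    """Return (id, length) pairs for bodies longer than warn_chars.

    `communications` is assumed to be an iterable of (id, body) pairs;
    this is a hypothetical shape, not MuckRock's actual model.
    Results are sorted with the largest body first so the worst
    offenders can be pruned before the others.
    """
    oversized = [
        (comm_id, len(body))
        for comm_id, body in communications
        if len(body) > warn_chars
    ]
    return sorted(oversized, key=lambda pair: pair[1], reverse=True)
```

Running something like this periodically would surface communications that are growing because quoted text was not stripped, before they become large enough to break a page.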

morisy commented 9 years ago

Let's set a max incoming size of saved communications at 5 MB (obviously this shouldn't apply to attachments, just the message body), and cut it off if it's longer than that. Once the tasks system is up and running, we can create a task each time something hits that limit for a second look.

If it's simpler to do it by character count, let's save 5,101,000 characters which should be enough to cover. We'll see how that works, and adjust up or down as needed.
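The character-count approach could look roughly like the sketch below. The function name `cap_body` and its return shape are assumptions made for illustration; the boolean is there so a caller could create a follow-up task whenever the limit is hit.

```python
def cap_body(body, max_chars):
    """Cut a message body off at max_chars characters.

    Returns (text, was_truncated). Hypothetical helper: the name and
    signature are illustrative, not part of the MuckRock codebase.
    """
    if max_chars < 0:
        raise ValueError("max_chars must be non-negative")
    if len(body) <= max_chars:
        return body, False
    # Slicing counts characters (code points), not bytes.
    return body[:max_chars], True
```

Note that a character limit is not the same as a byte limit: a body containing non-ASCII text can take more bytes than characters once encoded as UTF-8, so a character cap only approximates a size in MB.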

mitchelljkotler commented 9 years ago

@morisy 5 million still seems very large and will cause pages to crash. Remember this is per communication; a single request may have dozens of communications all at the max size. I'm thinking we may want something more along the lines of 100k... is there any legitimate reason to have a single communication be more than 100,000 characters?
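To make the per-communication point concrete, here is a small worked calculation. The figure of 24 communications (two dozen) is an assumed example, not a number from the thread; the two caps are the ones proposed above.

```python
def worst_case_chars(n_communications, max_chars):
    """Upper bound on body text for one request page, in characters."""
    return n_communications * max_chars


# Assumed example: a request with two dozen communications, all at the cap.
N_COMMUNICATIONS = 24

at_5m_cap = worst_case_chars(N_COMMUNICATIONS, 5_101_000)
at_100k_cap = worst_case_chars(N_COMMUNICATIONS, 100_000)

print(at_5m_cap)    # 122424000
print(at_100k_cap)  # 2400000
```

Under that assumption, the 5,101,000-character cap allows over 122 million characters of body text on a single request page, while a 100,000-character cap bounds it at 2.4 million.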

mitchelljkotler commented 9 years ago

https://www.muckrock.com/admin/foia/foiarequest/1646/ is a legit request with a communication of length 141k, but it is a somewhat hacky use case

morisy commented 9 years ago

MR1646 was a one-off experiment done years ago, and I wouldn't have a problem with it getting cut off if it means generally stronger site-wide stability.