Closed nonprofittechy closed 3 years ago
Plocket tried a test reproduction, using the test_alexhibit.yml interview. She uploaded this file:
and I confirmed that the contents match inside the docker container: it was /tmp/files/101283/file.jpg
and the same path on S3 also matched.
However, when the file was added to an ALDocumentBundle, this uploaded file doesn't appear in any of the bundles.
Restarting the server by editing and saving the config does not change the output at all.
Potentially relevant: https://blog.miguelgrinberg.com/post/how-secure-is-the-flask-user-session
Notes from Coding meeting:
Potential causes?
Looked at a couple places in the DA source code.
docassemble=# select * from uploads ou
docassemble-# where (select count(*) from uploads inr
docassemble(# where inr.indexno = ou.indexno) > 1
docassemble-# ;
indexno | key | filename | yamlfile | private | persistent
---------+-----+----------+----------+---------+------------
(0 rows)
We tried to force this to recur by running 400 interviews in a row and did not get any form leakage. Also investigated the server uptime on affected servers and all the servers restarted about 4 days ago. It's possible it was due to an unsafe shutdown caused by an automated server security patch. Going to keep this open a couple more days and then close. Hopefully we can get some fixes upstream, if it really was a database issue, to ensure that unsafe shutdowns don't leak data.
So to check, the leak would be because we've created files up to some number, say 1000. Then we save, and create some more, up to 1500. Then the server shutsdown unexpectedly, and the postgres db isn't full saved. So, S3 has existing files up to 1500, but postgres only thinks that there's 1000, so the next 500 files will conflict with existing files.
Is there any process we (or DA core) can run after a potential shutdown / upgrade to catch that discrepancy? Should be able to get the current val of the index with currval
That is my guess about how this explanation would work, yes. I think there are a number of mitigations that DA could take to avoid it presenting exactly this way--but to be fair, a database rollback is always going to have negative consequences.
Truthfully Postgres can handle an unsafe shutdown 99% of the time with no issue. So I wonder if it's really necessary to load the backup from 6 AM every time you restart the server. Maybe it should try just starting up and checking for corruption, at least optionally.
We haven't seen this problem come back--I think we can safely assume that the database changes that Jonathan made will resolve this in the future. We still need to schedule a new docker run, and should do our best to set up an external database for our production server when it's possible.
I have had a few reports and now witnessed myself a situation where the as_pdf() method displays a wrong, cached file from the server. Most recently, I ran the CDC Moratorium interview and on the preview screen it linked the PDF for the civil action cover sheet. On the final download screen, it linked the PDF for the Civil Docketing Statement. Both are interviews I ran yesterday evening.
I assume our aggressive caching of the ALDocument is to blame, although it could also be a Docassemble bug. We should try to reproduce and track this problem down. Also need to figure out if this could possibly leak private data.
For now, restarting nginx seems to fix it (by saving the config file, e.g.).
Edit: note that the download_list_html() results appeared to be correct--only the file displayed with as_pdf() was wrong.