Open lognaturel opened 4 months ago
If the upload succeeded, then I don't think that Backend itself would have returned an error response. Otherwise the database transaction should have been rolled back. I think that nginx is returning a 504 after a set amount of time without waiting for Backend to finish.
> After a minute, I saw a 504
Could it have been 2 minutes? That's how nginx is configured. Was the error message "Something went wrong: error code 504."? If so, that's another sign that it's nginx. An error from Backend would be more specific.
If it's nginx, here are a couple of ideas for how to address it:
I'm not sure that closing the modal would necessarily help, because the user could just reopen the modal or even refresh the page in order to try again. Backend would still be working on the original request. getodk/central#785 is another example of how Backend can be working on concurrent requests even after a 504 response.
> "Something went wrong: error code 504."
Yes, exactly. I’m quite sure it’s nginx, there was nothing in the service log.
Trickling a response like the backup endpoint would make a lot of sense.
I’m less sure of the implications of modifying the timeout.
> I’m less sure of the implications of modifying the timeout.
I feel like we've considered this idea before, though I don't remember why we didn't make this change. How long does it take Postgres to time out? Would it be reasonable to change the nginx timeout to match the Postgres timeout, at least for non-GET requests?
> Trickling a response like the backup endpoint would make a lot of sense.
With the backup endpoint, we trickle random data that winds up in the backup .zip file. I don't think we'd want to return random data or a .zip file from the upload endpoint. But I think the current response from the upload endpoint is `{"success":true}`, and we could trickle that out, returning a character every minute or so. Maybe that would be the easiest change to make.
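To make the trickling idea concrete, here is a minimal, language-agnostic sketch written in Python. It is not Backend's code, and it is a variation on the idea above: instead of trickling the characters of the body itself, it trickles leading whitespace (which JSON parsers ignore) and only sends the real body once the work has finished, so the body can still report a failure. The names `trickle_response`, `work`, and `interval` are hypothetical.

```python
import json
import threading
from typing import Callable, Iterator


def trickle_response(work: Callable[[], dict], interval: float = 60.0) -> Iterator[str]:
    """Yield a space every `interval` seconds while `work` runs, then the JSON body."""
    result: dict = {}

    def run() -> None:
        try:
            result["body"] = work()
        except Exception as exc:
            # The status line has already been sent by the time we get here,
            # so a failure can only be reported inside the body.
            result["body"] = {"success": False, "message": str(exc)}

    worker = threading.Thread(target=run)
    worker.start()
    while True:
        worker.join(timeout=interval)  # wait up to `interval` for the work to finish
        if not worker.is_alive():
            break
        yield " "  # keep-alive chunk; leading whitespace is still valid JSON
    yield json.dumps(result["body"], separators=(",", ":"))
```

Each yielded chunk would be written to the response as it is produced, so the proxy sees traffic before its read timeout expires. The trade-off is that the HTTP status is committed with the first chunk, so the client has to inspect the body to learn whether the upload succeeded.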
For some reason, I'm a little surprised that an upload request would take as long as you're seeing. Maybe we knew that already and I'm just forgetting. 😅 Once we allow the request to take more than 2 minutes, I'm wondering whether we should do more to signal to the user that the request really is in progress and that they definitely shouldn't refresh the page and try again. For example, after 30 seconds or a minute, we could change "Processing file..." to "Still processing...", or we could show an alert that mentions "don't refresh".
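The message change could be as simple as choosing the text from the elapsed time. The sketch below only illustrates that logic; the function name `progress_message` and the 30-second default are assumptions, and the real change would live in the frontend.

```python
def progress_message(elapsed_seconds: float, threshold: float = 30.0) -> str:
    """Return the status text to show for an upload that has run for `elapsed_seconds`."""
    if elapsed_seconds < threshold:
        return "Processing file..."
    # Past the threshold, reassure the user that the request is still in progress.
    return "Still processing..."
```

For example, `progress_message(5)` gives the initial text, while `progress_message(90)` gives the reassurance text.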
Problem description
I uploaded https://drive.google.com/file/d/1y2Z9ZwHcX60FRW5F2vxbolooj3F6-bgY/view?usp=drive_link which has 100k Entities. After a minute, I saw a 504 at the top of the modal with the append button. I exited the modal and saw my Entities were successfully created.
URL of the page
https://staging.getodk.cloud/#/projects/93/entity-lists/entities_100k/entities
Expected behavior
~In the case of a 504, I think we should close the modal if possible. Because there's no duplicate detection, a user is at high risk of uploading the same Entities twice and then they're stuck with them.~
Alternately, could the server send back something to say it's still working on it?
Central version shown in version.txt
Browser version
Around when did you see the problem (in UTC)?
Other notes (if any)