Closed acha21 closed 2 years ago
Hi @acha21, thanks for opening this issue and providing so much context. I'm going to take a try at digging through the logs and debugging it tomorrow, though I want to note you may have better luck isolating the problem on your side using the mephisto metrics tooling in the meantime.
Jumping into the logs, I recently pushed a fix into main that should resolve the bugs related to handle_updated_agent_status. The ones in the log around sqlite3.IntegrityError: UNIQUE constraint failed: workers.worker_name are much stranger.
The following section should only try to create an entry in the database if it doesn't already exist: https://github.com/facebookresearch/Mephisto/blob/main/mephisto/operations/worker_pool.py#L152-L165
I'm not sure I know how to reproduce this second part given the above.
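For illustration, here is a minimal sketch of the "create only if it doesn't already exist" pattern and of how the UNIQUE constraint error can still surface. It is not Mephisto's actual code: the `worker_id` column, the helper name `get_or_create_worker`, and the in-memory database are assumptions made for the example; only the `workers` table and `worker_name` column come from the error message. A check followed by an insert is not atomic, so a second writer can slip in between the two statements, and the sketch falls back to re-reading the row in that case.

```python
import sqlite3


def get_or_create_worker(conn: sqlite3.Connection, worker_name: str) -> int:
    """Return the id of the row for worker_name, inserting it if missing."""
    select_sql = "SELECT worker_id FROM workers WHERE worker_name = ?"
    row = conn.execute(select_sql, (worker_name,)).fetchone()
    if row is not None:
        return row[0]
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "INSERT INTO workers (worker_name) VALUES (?)", (worker_name,)
            )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        # Another writer inserted the same name between our SELECT and
        # INSERT, so the row exists now and we can read it back.
        return conn.execute(select_sql, (worker_name,)).fetchone()[0]


# Hypothetical schema: only the table and worker_name column are taken
# from the error message in the log.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE workers ("
    "worker_id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "worker_name TEXT UNIQUE NOT NULL)"
)

first = get_or_create_worker(conn, "worker_a")
second = get_or_create_worker(conn, "worker_a")  # reuses the existing row

# An unguarded duplicate insert triggers the error seen in the log.
try:
    conn.execute("INSERT INTO workers (worker_name) VALUES (?)", ("worker_a",))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

If something like this race is what is happening, it would point to two code paths registering the same worker at nearly the same time, but that is a guess rather than a diagnosis.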
That being said, we're observing a strange slowdown issue in Mephisto on the current main branch (also reported by @Alex-Gurung) where the collection rate decreases towards zero as more data is collected. While we expect some decrease of this sort, it should never actually reach zero. As such we'll be investigating this next week.
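One rough way to check whether a run shows this slowdown is to bucket submission times into fixed windows and look at the counts. The sketch below assumes you already have completion timestamps in seconds from some source of your own (for example, parsed from your logs); the function name and the sample data are made up for illustration and are not part of Mephisto.

```python
from collections import Counter
from typing import List


def completions_per_window(timestamps: List[float], window_s: float) -> List[int]:
    """Count completions in consecutive windows of window_s seconds.

    Windows start at the earliest timestamp; empty windows are reported as 0.
    """
    if not timestamps:
        return []
    start = min(timestamps)
    counts = Counter(int((t - start) // window_s) for t in timestamps)
    n_windows = max(counts) + 1
    return [counts.get(i, 0) for i in range(n_windows)]


# Made-up timestamps (seconds since launch), bucketed into one-minute windows.
example_timestamps = [0, 10, 20, 30, 70, 110, 190]
rates = completions_per_window(example_timestamps, window_s=60)
```

A series that trails off to a run of zeros while units are still outstanding would match the behavior described above; a series that only tapers would be closer to the expected decrease.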
Thank you for your quick response.
Since I am now pressed for time, I need a quick workaround for this issue. Do you think the error in the log, sqlite3.IntegrityError: UNIQUE constraint failed: workers.worker_name, is related to the issue where the collection rate decreases towards zero as more data is collected? If not, I am thinking about splitting the whole RunTask into small independent pieces as a workaround, but with that approach maximum_units_per_worker cannot be enforced across the pieces.
Is there any suggestion for me?
Launching multiple runs won't help (and would likely make the worker name issue worse). If you're on a tight deadline, I'd suggest relaunching periodically, say every few hours.
Hi @acha21, this should now be fixed in #770. I'll be moving to merge it later this week, but feel free to try the branch out sooner. Let me know if your issues are resolved afterwards!
Closing as fixed in our most recent release (1.0.3).
Hello, I am Yeonchan Ahn from South Korea. Thank you for sharing a great framework for working with MTurk.
The day before yesterday (4/26), I ran a static React-based survey using Mephisto v1.0.1 on MTurk with Heroku hobby. For about 1~2 hours I confirmed that data was being collected, but after some time it stopped working properly. To check whether the problem repeats, yesterday (4/27) I re-ran the same script with a small number of units (216) and saw the same behavior: I collected 171 results and failed to collect the rest. The following config is what I used yesterday.
I am frustrated that I can't even figure out which part causes the problem, so I am uploading the whole log from yesterday's failed run: scripts.log
Here is another piece of information that may be useful. Currently, I am running a survey to evaluate my AI systems using Mephisto v1.0.1 (a216d2d6ba739aadde2cacaa906dad5e78d6dc2f). I implemented the UI for the survey starting from a copy of examples/static_react_task. In the UI, I implemented an onboarding step based on the example, but used it only as a demo, which means every worker who submits any answer can participate in our main survey.