Closed: HiromuHota closed this issue 4 years ago.
As can be seen from the slightly different error messages (i.e., "This session is in 'committed' state" and "This session is in 'prepared' state"), this issue looks to me like a race condition.
This run (https://github.com/HiromuHota/fonduer-tutorials/runs/879864413) successfully demonstrated this issue.
I think the root cause is that a single session is accessed concurrently by multiple threads in the UDFRunner.
These are the main thread at https://github.com/HazyResearch/fonduer/blob/a3af3877ef94fd8466b24b7cad7145a13413ac68/src/fonduer/utils/udf.py#L149
and another thread at https://github.com/HazyResearch/fonduer/blob/a3af3877ef94fd8466b24b7cad7145a13413ac68/src/fonduer/utils/udf.py#L136
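To illustrate why the error text can differ between runs, here is a minimal, stdlib-only sketch. `ToySession`, `LockedSession`, and `run` are hypothetical stand-ins, not Fonduer or SQLAlchemy APIs: the toy session passes through 'prepared' and 'committed' states during a commit, so an unsynchronized `add()` from another thread could observe either one, which would match the two messages above. Guarding every access with one `threading.Lock` serializes the threads, so `add()` only ever sees the 'active' state.

```python
import threading


class ToySession:
    """Hypothetical stand-in for a SQLAlchemy Session (not the real API).

    commit() briefly passes through 'prepared' and 'committed' states;
    add() refuses to work unless the session is 'active'.
    """

    def __init__(self):
        self.state = "active"
        self.objects = []

    def add(self, obj):
        if self.state != "active":
            raise RuntimeError(f"This session is in '{self.state}' state")
        self.objects.append(obj)

    def commit(self):
        self.state = "prepared"   # phase 1 of the commit
        self.state = "committed"  # phase 2 of the commit
        self.state = "active"     # a new transaction begins


class LockedSession:
    """Serializes every call to the wrapped session with a single lock."""

    def __init__(self, session):
        self._session = session
        self._lock = threading.Lock()

    def add(self, obj):
        with self._lock:
            self._session.add(obj)

    def commit(self):
        with self._lock:
            self._session.commit()


def run(session, n=1000):
    """Main thread calls add() while a worker thread calls commit().

    Returns the error messages seen by the main thread; without a lock
    they may mention either the 'prepared' or the 'committed' state.
    """
    errors = []

    def worker():
        for _ in range(n):
            session.commit()

    t = threading.Thread(target=worker)
    t.start()
    for i in range(n):
        try:
            session.add(i)
        except RuntimeError as exc:
            errors.append(str(exc))
    t.join()
    return errors
```

Calling `run(ToySession())` without the wrapper may or may not report errors, since the race window is small under CPython's GIL; with `LockedSession` no errors are reported.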
Actually, the main thread accesses the session only in the case of Labeler and Featurizer, as shown below:
https://github.com/HazyResearch/fonduer/blob/a3af3877ef94fd8466b24b7cad7145a13413ac68/src/fonduer/supervision/labeler.py#L309-L311
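An alternative to locking is to stop sharing the session and give each thread its own. SQLAlchemy's `scoped_session` is built around this idea (its default registry is thread-local). The sketch below shows the same pattern with the standard library only; `PerThreadSessions` and `session_from_other_thread` are hypothetical names, a plain `object` stands in for a session factory, and this is not the fix adopted in Fonduer.

```python
import threading


class PerThreadSessions:
    """Hypothetical per-thread session registry (sketch only).

    Calling the registry returns a session owned by the calling thread,
    creating it on first use with the given factory.
    """

    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()

    def __call__(self):
        session = getattr(self._local, "session", None)
        if session is None:
            session = self._factory()
            self._local.session = session
        return session


def session_from_other_thread(registry):
    """Fetch a session inside a fresh thread and hand it back."""
    result = []
    t = threading.Thread(target=lambda: result.append(registry()))
    t.start()
    t.join()
    return result[0]


registry = PerThreadSessions(object)
main_session = registry()
assert registry() is main_session  # same thread -> same session
assert session_from_other_thread(registry) is not main_session
```

With this design, objects loaded through one thread's session should stay in that thread rather than being handed to another.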
Description of the bug
When executing labeler.apply or featurizer.apply, sqlalchemy.exc.InvalidRequestError occurs.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
labeler.apply and featurizer.apply run without an error.
Error Logs/Screenshots
The last part of the error message could be
Environment (please complete the following information)
Additional context
I think this is a regression caused by #439, as the error happens during self.last_docs.add(doc.name), which was added by #439.
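One plausible reason a plain `set.add` ends up touching the session: `doc.name` is an attribute of an ORM-mapped object, and a SQLAlchemy session by default expires loaded instances on commit (`expire_on_commit=True`), so the next attribute read reloads it through the session. This mechanism is an assumption, not something confirmed above. The toy sketch below (hypothetical `ExpiringSession` and `ToyDocument`, not SQLAlchemy classes) mimics that behaviour and shows that reading the plain string before the commit avoids the reload.

```python
class ExpiringSession:
    """Hypothetical session that expires its instances on commit."""

    def __init__(self):
        self.reloads = 0
        self.instances = []

    def commit(self):
        for obj in self.instances:
            obj.expired = True  # mimics expire_on_commit=True

    def refresh(self, obj):
        self.reloads += 1  # stands in for a SELECT issued via the session
        obj.expired = False


class ToyDocument:
    """Hypothetical ORM-like object whose attribute reads may hit the session."""

    def __init__(self, session, name):
        self._session = session
        self._name = name
        self.expired = False
        session.instances.append(self)

    @property
    def name(self):
        if self.expired:
            self._session.refresh(self)  # attribute read touches the session
        return self._name


session = ExpiringSession()
doc = ToyDocument(session, "doc1")
cached_name = doc.name       # read before commit: no reload
session.commit()
last_docs = set()
last_docs.add(doc.name)      # read after commit: triggers a reload
last_docs.add(cached_name)   # plain str: no session access
```

Under this assumption, the main thread's `doc.name` read would emit SQL on the shared session while the other thread is committing it.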