Open Kh-Oleg opened 1 year ago
That's strange indeed.
Would it be possible for you to share the full script?
Meanwhile, a few things to try, would be glad to see the effect on your issue:
WEB_CONCURRENCY
to 1
TELEMETRY_ENABLED
to False
TELEMETRY_ENABLED
was set to 0 before. I tried also with WEB_CONCURRENCY=1
- nothing has changed.
My real script does more, but I could reproduce the bug with a smaller one:
import requests
import json
import time
fief_url = 'http://localhost:8001/admin/api'
# Assuming Admin API Key exists in Fief's database
API_KEY='*************************************'
def post(route, data):
# time.sleep(1) # Required, since Fief 0.24.2 has a race condition somewhere
headers = {"Content-Type": "application/json",
"accept": "application/json",
"Authorization": f"Bearer {API_KEY}"}
response = requests.post(f"{fief_url}/{route}", headers=headers, data=json.dumps(data), timeout=0.5)
if response.status_code >= 400:
print(f"Error! data: {json.dumps(data)}")
raise Exception(response.content)
return response.json()
def create_permission(data):
response = post('permissions/', data)
return response['id']
# Create permissions
for i in range(1, 10):
permission_id = create_permission({'name': f"Create Castle{i}", 'codename': f"createCastle{i}"})
print(f"permission created, id: {permission_id}")
Fief log output:
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
Main Fief workspace already exists
Main Fief user already exists
2023-05-02 19:46:15.955 | INFO | fief.lifespan:lifespan:31 - Fief Server started - {"version": "0.24.2"}
xxx@xxx:/opt/src# INFO: 127.0.0.1:40426 - "POST /permissions/ HTTP/1.1" 201 Created
INFO: 127.0.0.1:40436 - "POST /permissions/ HTTP/1.1" 201 Created
INFO: 127.0.0.1:40450 - "POST /permissions/ HTTP/1.1" 201 Created
INFO: 127.0.0.1:40462 - "POST /permissions/ HTTP/1.1" 201 Created
INFO: 127.0.0.1:40476 - "POST /permissions/ HTTP/1.1" 500 Internal Server Error
2023-05-02 19:46:43.756 | INFO | fief.tasks.base:__call__:117 - Start task - {"task": "trigger_webhooks"}
2023-05-02 19:46:43.825 | INFO | fief.tasks.base:__call__:119 - Done task - {"task": "trigger_webhooks"}
2023-05-02 19:46:43.831 | INFO | fief.tasks.base:__call__:117 - Start task - {"task": "trigger_webhooks"}
2023-05-02 19:46:43.844 | INFO | fief.tasks.base:__call__:119 - Done task - {"task": "trigger_webhooks"}
2023-05-02 19:46:43.878 | INFO | fief.tasks.base:__call__:117 - Start task - {"task": "trigger_webhooks"}
2023-05-02 19:46:43.903 | INFO | fief.tasks.base:__call__:119 - Done task - {"task": "trigger_webhooks"}
2023-05-02 19:46:43.909 | INFO | fief.tasks.base:__call__:117 - Start task - {"task": "trigger_webhooks"}
2023-05-02 19:46:43.937 | INFO | fief.tasks.base:__call__:119 - Done task - {"task": "trigger_webhooks"}
I see, that each time, when the problem doesn't occur, both 'Start task' and 'Done task' are executed before the next request arrives. The opposite is also true: when the 500 occurs, web hooks appear in the log after one or more following requests. This makes me thinking that there is, probably, some kind of deferred web hook trigger, which doesn't handle properly the case, when many requests arrive one after another.
500 most likely means some exception inside Fief, but log output doesn't show it. I would capture the Exception stack trace as well, if you'll tell me how to enable it.
I was able to reproduce it (especially when creating users, not when creating permissions). Here is the error I got:
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) database is locked
It seems there is a lock on the SQLite database, probably because several processes try to access it concurrently. I'll try to see what can be done about it.
I was able to track down the problem: it comes from the AuditLogger
sink, which makes requests to the DB in a background thread.
My intuition is that, sometimes, when a subsequent request comes to create a user/permission, the AuditLog from the previous request is currently being written to the DB, which raises the database lock error.
I'll see how to mitigate this. Maybe by deferring the AuditLogger completely in the background worker.
The bug still exists, but is only less frequent now.
I'd suggest reopening the bug and investigating it further, GitHub doesn't allow me to reopen it.
I am getting a similar error I think just trying to login to admin now -- I was recently able to login though, still trying to debug it at the moment.
Describe the bug
Fief responds with "500 Internal Server Error" while using different Admin REST API from a script.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
No "500 Internal server error" responses from Fief.
Configuration
self-hosted, Fief version: 0.24.2
Additional context
The error is non-deterministic, it's not related to any particular request, any request can return 500. When I added 1 second delay between requests, everything went OK. You can see the successful log below.
Without delays, however, I could not execute my script fully, not getting 500 at some of the requests. The log in this case looks like this:
It looks like, there is a race related to webhooks somewhere.