Open zzr93 opened 1 year ago
A 404 means the provided token doesn't exist.
Could you please connect to the db with PGPASSWORD=${PG_PASSWORD} psql -h ${DB_IP} -p 5432 -U root mars_db?
DB_IP is the hai-platform container/service IP, or your customized db IP if configured;
PG_PASSWORD is "root" by default.
Then check the output of select * from "users".
I suppose the token doesn't exist in the table; in that case, please run hai-cli init again with the correct token from the db.
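For anyone repeating this check, here is a minimal Python sketch of the same lookup. It only wraps the psql invocation quoted above. The table and column names ("user", "user_access_token") are taken from the psql output posted further down in this thread, and the helper names build_psql_command and run_query are hypothetical, not part of hai-platform.

```python
import os
import subprocess

# Table and column names come from the psql output later in this thread;
# adjust them if your schema differs.
TOKEN_QUERIES = {
    "user": 'SELECT user_name, token, active FROM "user";',
    "user_access_token": (
        "SELECT access_user_name, access_token, expire_at, active "
        'FROM "user_access_token";'
    ),
}


def build_psql_command(db_ip, query, port=5432, user="root", database="mars_db"):
    """Return the psql argv list for running one query (hypothetical helper)."""
    return [
        "psql", "-h", db_ip, "-p", str(port),
        "-U", user, "-d", database, "-c", query,
    ]


def run_query(db_ip, query, password="root"):
    """Run the query through psql; the password is passed via PGPASSWORD."""
    env = dict(os.environ, PGPASSWORD=password)
    result = subprocess.run(
        build_psql_command(db_ip, query),
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling run_query(DB_IP, TOKEN_QUERIES["user"]) and run_query(DB_IP, TOKEN_QUERIES["user_access_token"]) should list the tokens the server knows about, which can then be compared with the token in /root/.hfai/conf.yml.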
I couldn't find a table named "users", but I found "user" and "user_access_token", which may be related to this situation. I tried both tokens as shown below, and hai-cli accepts only the access_token (which I used last week). So it seems I have already run init with the correct token. Are there any other possible reasons?
mars_db=# select * from "user";
user_id | user_name | nick_name | token | role | active | last_activity | shared_group
---------+-----------+-----------+----------------------+----------+--------+----------------------------+--------------
10020 | haiadmin | haiadmin | haiadmin | internal | t | 2023-07-13 19:06:12.531952 | hfai
10000 | bff_admin | bff_admin | a69a81ca18b2712fc631 | internal | t | 2023-07-13 19:06:12.531952 | hfai
(2 rows)
mars_db=# select * from "user_access_token";
from_user_name | access_user_name | access_token | access_scope | expire_at | created_at | updated_at | created_by | deleted_by | active
----------------+------------------+--------------------------------------------------------------------------------+--------------+---------------------+----------------------------+----------------------------+------------+------------+--------
bff_admin | bff_admin | ACCESS-6255665f61646d696e236266665f61646d696e-ej5ZEZpxLNQzUiD3TBa1R26qknIwhi-F | all | 3000-01-01 00:00:00 | 2023-07-13 19:10:46.015107 | 2023-07-13 20:48:55.507183 | bff_admin | | t
haiadmin | haiadmin | ACCESS-68516961646d696e2368616961646d696e-E0lGXwIswnn0HpbXAW_tVRjga1wRjD0u | all | 3000-01-01 00:00:00 | 2023-07-13 19:10:41.729162 | 2023-07-19 09:59:28.134549 | haiadmin | | t
(2 rows)
mars_db=# \q
root@hai-platform-0:/# exit
root@xxx-node1:~# hai-cli init haiadmin --url http://xxx.com
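Found an original token; requesting access token registration from the server
Failed to register an access token with the server; saving the original token
Initialization succeeded. Target config: /root/.hfai/conf.yml, with the following contents: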
token: haiadmin
root@xxx-node1:~# hai-cli init ACCESS-68516961646d696e2368616961646d696e-E0lGXwIswnn0HpbXAW_tVRjga1wRjD0u --url http://xxx.com
Initialization succeeded. Target config: /root/.hfai/conf.yml, with the following contents:
token: ACCESS-68516961646d696e2368616961646d696e-E0lGXwIswnn0HpbXAW_tVRjga1wRjD0u
The requests are sent to haproxy, with the operating server as the backend.
Could you please also check the log file at {HAI_PLATFORM_PATH}/log/operating_0.log
to see if there is anything abnormal, for example requests not hitting the backend or the server reporting an exception?
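A small Python sketch for pulling the error lines out of that log. It assumes the pipe-separated layout (timestamp | level | ... | message) seen in the log excerpt posted later in this thread. The names LogEntry and parse_error_lines are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class LogEntry(NamedTuple):
    timestamp: str
    level: str
    message: str


def parse_error_lines(lines: Iterable[str], levels=("ERROR", "CRITICAL")) -> List[LogEntry]:
    """Collect log lines whose second pipe-separated field is an error level."""
    entries = []
    for line in lines:
        parts = [part.strip() for part in line.split("|")]
        if len(parts) < 3:
            continue  # traceback lines and other continuation lines
        timestamp, level = parts[0], parts[1]
        if level in levels:
            # Join the remaining non-empty fields back into one message.
            message = " | ".join(part for part in parts[2:] if part)
            entries.append(LogEntry(timestamp, level, message))
    return entries
```

To use it, open the log with open(path, encoding="utf-8", errors="replace") and pass the file object to parse_error_lines. Database connection failures like the one reported below would then show up at the top of the result.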
Hello, I encountered the same issue when submitting a task. Have you solved it? @wenjun93 @zzr93
Log from {HAI_PLATFORM_PATH}/log/operating_0.log:
2024-04-10 15:02:36.579 | ERROR | | SpawnProcess-1 | [UserData] Failed to subscribe to table [user_with_all_groups]
2024-04-10 15:02:36.579 | ERROR | | SpawnProcess-1 | [UserData] Reload table user_all_groups failed!
2024-04-10 15:02:36.579 | ERROR | | (psycopg2.OperationalError) connection to server at "127.0.0.1", port 15432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
(Background on this error at: https://sqlalche.me/e/14/e3q8)
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 3280, in _wrap_pool_connect
return fn()
└ <bound method Pool.connect of <sqlalchemy.pool.impl.QueuePool object at 0x7f615aa558b0>>
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 310, in connect
return _ConnectionFairy._checkout(self)
│ │ └ <sqlalchemy.pool.impl.QueuePool object at 0x7f615aa558b0>
│ └ <classmethod object at 0x7f615c825bb0>
└ <class 'sqlalchemy.pool.base._ConnectionFairy'>
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 868, in _checkout
fairy = _ConnectionRecord.checkout(pool)
│ │ └ <sqlalchemy.pool.impl.QueuePool object at 0x7f615aa558b0>
│ └ <classmethod object at 0x7f615c825b50>
└ <class 'sqlalchemy.pool.base._ConnectionRecord'>
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 476, in checkout
rec = pool._do_get()
│ └ <function QueuePool._do_get at 0x7f615c8418b0>
└ <sqlalchemy.pool.impl.QueuePool object at 0x7f615aa558b0>
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/impl.py", line 146, in _do_get
self._dec_overflow()
│ └ <function QueuePool._dec_overflow at 0x7f615c8419d0>
└ <sqlalchemy.pool.impl.QueuePool object at 0x7f615aa558b0>
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
compat.raise_(
│ └ <function raise_ at 0x7f615d066d30>
└ <module 'sqlalchemy.util.compat' from '/usr/local/lib/python3.8/dist-packages/sqlalchemy/util/compat.py'>
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/util/compat.py", line 208, in raise_
raise exception
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/impl.py", line 143, in _do_get
return self._create_connection()
│ └ <function Pool._create_connection at 0x7f615c81dd30>
└ <sqlalchemy.pool.impl.QueuePool object at 0x7f615aa558b0>
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 256, in _create_connection
return _ConnectionRecord(self)
│ └ <sqlalchemy.pool.impl.QueuePool object at 0x7f615aa558b0>
└ <class 'sqlalchemy.pool.base._ConnectionRecord'>
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 371, in __init__
self.__connect()
└ <sqlalchemy.pool.base._ConnectionRecord object at 0x7f6154a86e50>
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 666, in __connect
pool.logger.debug("Error on connect(): %s", e)
│ │ └ <function Logger.debug at 0x7f615f315160>
│ └ <Logger sqlalchemy.pool.impl.QueuePool (WARNING)>
└ <sqlalchemy.pool.impl.QueuePool object at 0x7f615aa558b0>
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
compat.raise_(
│ └ <function raise_ at 0x7f615d066d30>
└ <module 'sqlalchemy.util.compat' from '/usr/local/lib/python3.8/dist-packages/sqlalchemy/util/compat.py'>
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/util/compat.py", line 208, in raise_
raise exception
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 661, in __connect
self.dbapi_connection = connection = pool._invoke_creator(self)
│ │ │ │ └ <sqlalchemy.pool.base._ConnectionRecord object at 0x7f6154a86e50>
│ │ │ └ <function create_engine.<locals>.connect at 0x7f615c2ba3a0>
│ │ └ <sqlalchemy.pool.impl.QueuePool object at 0x7f615aa558b0>
│ └ None
└ <sqlalchemy.pool.base._ConnectionRecord object at 0x7f6154a86e50>
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/create.py", line 590, in connect
return dialect.connect(*cargs, **cparams)
│ │ │ └ {'host': '127.0.0.1', 'database': 'mars_db', 'user': 'root', 'password': 'root', 'port': 15432, 'application_name': 'multi-se...
│ │ └ []
│ └ <function DefaultDialect.connect at 0x7f615c5b4ee0>
└ <sqlalchemy.dialects.postgresql.psycopg2.PGDialect_psycopg2 object at 0x7f615c245a30>
File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/default.py", line 597, in connect
return self.dbapi.connect(*cargs, **cparams)
│ │ │ │ └ {'host': '127.0.0.1', 'database': 'mars_db', 'user': 'root', 'password': 'root', 'port': 15432, 'application_name': 'multi-se...
│ │ │ └ ()
│ │ └ <function connect at 0x7f615c1d1310>
│ └ <module 'psycopg2' from '/usr/local/lib/python3.8/dist-packages/psycopg2/__init__.py'>
└ <sqlalchemy.dialects.postgresql.psycopg2.PGDialect_psycopg2 object at 0x7f615c245a30>
File "/usr/local/lib/python3.8/dist-packages/psycopg2/__init__.py", line 122, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
│ │ │ └ {}
│ │ └ None
│ └ 'host=127.0.0.1 user=root password=root port=15432 application_name=multi-server-server dbname=mars_db'
└ <built-in function _connect>
psycopg2.OperationalError: connection to server at "127.0.0.1", port 15432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
The above exception was the direct cause of the following exception:
...
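The traceback ends in "Connection refused" for 127.0.0.1 port 15432, so the operating server could not reach its database endpoint at all. A minimal sketch for checking whether anything accepts TCP connections on that address follows. The host and port are the ones from the log above, the thread does not say which process is expected to listen there, and the function name port_accepts_connections is hypothetical.

```python
import socket


def port_accepts_connections(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Values taken from the error message in the log above.
    print(port_accepts_connections("127.0.0.1", 15432))
```

Run it inside the hai-platform container. A False result means the problem is the database connection rather than the token itself.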
Problem solved; see https://github.com/HFAiLab/hai-platform/issues/12#issuecomment-2049179483
Following README.md, I deployed hai-platform and installed hai-cli successfully. Running "hai-cli init" with my token and url also succeeded. However, when I ran "hai-cli python /haidata/hai-platform/workspace/haiadmin/test.py -- -n 1", an error occurred unexpectedly. Here is the message:
It seems that the server returns code 404 to the client on the task create url -> "{mars_url()}/operating/task/create?token={token})". I have no idea why this would happen.
Further information can be provided if needed. I am sure the token and url are correct, since init succeeds. I am also sure that test.py exists on the shared_filesystem; otherwise hai-cli would report a different error.