FederatedAI / FATE

An Industrial Grade Federated Learning Framework
Apache License 2.0

fate_flow service start failed #1037

Closed Mantj closed 4 months ago

Mantj commented 4 years ago

Describe the bug: I installed a FATE cluster by following this guide: https://github.com/FederatedAI/FATE/blob/master/cluster-deploy/doc/Fate-cluster_deployment_guide_install_zh.md

When I reached step 6.1, I ran sh services.sh all start. Every module started successfully except fate_flow.

I then checked logs/error.log and console.log as the hint suggested, but both files are empty. I also looked at /data/projects/fate/python/fate_flow/settings.py, and it looks fine to me.

Then I executed python fate_flow_server.py directly. It fails with an error that I cannot resolve:

`Traceback (most recent call last):
  File "/home/data/projects/fate/common/python/venv/lib/python3.6/site-packages/peewee.py", line 2875, in execute_sql
    cursor.execute(sql, params or ())
  File "/home/data/projects/fate/common/python/venv/lib/python3.6/site-packages/pymysql/cursors.py", line 170, in execute
    result = self._query(query)
  File "/home/data/projects/fate/common/python/venv/lib/python3.6/site-packages/pymysql/cursors.py", line 328, in _query
    conn.query(q)
  File "/home/data/projects/fate/common/python/venv/lib/python3.6/site-packages/pymysql/connections.py", line 517, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/home/data/projects/fate/common/python/venv/lib/python3.6/site-packages/pymysql/connections.py", line 732, in _read_query_result
    result.read()
  File "/home/data/projects/fate/common/python/venv/lib/python3.6/site-packages/pymysql/connections.py", line 1075, in read
    first_packet = self.connection._read_packet()
  File "/home/data/projects/fate/common/python/venv/lib/python3.6/site-packages/pymysql/connections.py", line 684, in _read_packet
    packet.check_error()
  File "/home/data/projects/fate/common/python/venv/lib/python3.6/site-packages/pymysql/protocol.py", line 220, in check_error
    err.raise_mysql_exception(self._data)
  File "/home/data/projects/fate/common/python/venv/lib/python3.6/site-packages/pymysql/err.py", line 109, in raise_mysql_exception
    raise errorclass(errno, errval)
pymysql.err.InternalError: (1030, "Got error 168 - 'Unknown (generic) error from engine' from storage engine")

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "fate_flow_server.py", line 83, in <module>
    init_database_tables()
  File "/home/data/projects/fate/python/fate_flow/db/db_models.py", line 106, in init_database_tables
    DB.create_tables(table_objs)
  File "/home/data/projects/fate/common/python/venv/lib/python3.6/site-packages/peewee.py", line 3036, in create_tables
    model.create_table(**options)
  File "/home/data/projects/fate/common/python/venv/lib/python3.6/site-packages/peewee.py", line 6060, in create_table
    cls._schema.create_all(safe, **options)
  File "/home/data/projects/fate/common/python/venv/lib/python3.6/site-packages/peewee.py", line 5269, in create_all
    self.create_table(safe, **table_options)
  File "/home/data/projects/fate/common/python/venv/lib/python3.6/site-packages/peewee.py", line 5155, in create_table
    self.database.execute(self._create_table(safe=safe, **options))
  File "/home/data/projects/fate/common/python/venv/lib/python3.6/site-packages/peewee.py", line 2888, in execute
    return self.execute_sql(sql, params, commit=commit)
  File "/home/data/projects/fate/common/python/venv/lib/python3.6/site-packages/peewee.py", line 2882, in execute_sql
    self.commit()
  File "/home/data/projects/fate/common/python/venv/lib/python3.6/site-packages/peewee.py", line 2666, in __exit__
    reraise(new_type, new_type(*exc_args), traceback)
  File "/home/data/projects/fate/common/python/venv/lib/python3.6/site-packages/peewee.py", line 179, in reraise
    raise value.with_traceback(tb)
  File "/home/data/projects/fate/common/python/venv/lib/python3.6/site-packages/peewee.py", line 2875, in execute_sql
    cursor.execute(sql, params or ())
  File "/home/data/projects/fate/common/python/venv/lib/python3.6/site-packages/pymysql/cursors.py", line 170, in execute
    result = self._query(query)
  File "/home/data/projects/fate/common/python/venv/lib/python3.6/site-packages/pymysql/cursors.py", line 328, in _query
    conn.query(q)
  File "/home/data/projects/fate/common/python/venv/lib/python3.6/site-packages/pymysql/connections.py", line 517, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/home/data/projects/fate/common/python/venv/lib/python3.6/site-packages/pymysql/connections.py", line 732, in _read_query_result
    result.read()
  File "/home/data/projects/fate/common/python/venv/lib/python3.6/site-packages/pymysql/connections.py", line 1075, in read
    first_packet = self.connection._read_packet()
  File "/home/data/projects/fate/common/python/venv/lib/python3.6/site-packages/pymysql/connections.py", line 684, in _read_packet
    packet.check_error()
  File "/home/data/projects/fate/common/python/venv/lib/python3.6/site-packages/pymysql/protocol.py", line 220, in check_error
    err.raise_mysql_exception(self._data)
  File "/home/data/projects/fate/common/python/venv/lib/python3.6/site-packages/pymysql/err.py", line 109, in raise_mysql_exception
    raise errorclass(errno, errval)
peewee.InternalError: (1030, "Got error 168 - 'Unknown (generic) error from engine' from storage engine")`

Please help~

jarviszeng-zjc commented 4 years ago

Hi, please check the DB configuration in fate_flow/settings.py, and then check that your MySQL instance is working.
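One way to act on this suggestion is to separate "cannot connect" from "connected, but table creation fails". The traceback above fails inside create_tables, which suggests the connection itself succeeded. The sketch below is only an illustration, not part of FATE: the helper names (to_connect_kwargs, probe_create_table, main) and the probe table name are hypothetical, and it assumes the DATABASE dict uses the same keys as fate_flow/settings.py and that pymysql is installed in the FATE virtualenv.

```python
def to_connect_kwargs(database_settings):
    """Map FATE-style DATABASE keys onto pymysql.connect keyword names.

    Pool-only keys such as max_connections and stale_timeout are dropped.
    """
    return {
        "host": database_settings["host"],
        "port": int(database_settings["port"]),
        "user": database_settings["user"],
        "password": database_settings["passwd"],
        "database": database_settings["name"],
    }


def probe_create_table(connect, database_settings, table="fate_probe_tmp"):
    """Connect, then create and drop a throwaway table.

    `connect` is any callable with pymysql.connect-style keyword arguments.
    Returns (status, detail), where status is "ok", "connect_failed",
    or "ddl_failed".
    """
    try:
        conn = connect(**to_connect_kwargs(database_settings))
    except Exception as exc:  # wrong host/port/credentials, server down
        return "connect_failed", repr(exc)
    try:
        with conn.cursor() as cur:
            # The table name is a hypothetical placeholder chosen for this probe.
            cur.execute(
                "CREATE TABLE IF NOT EXISTS `%s` (id INT PRIMARY KEY)" % table
            )
            cur.execute("DROP TABLE `%s`" % table)
        return "ok", "create/drop succeeded"
    except Exception as exc:  # e.g. error 1030 from the storage engine
        return "ddl_failed", repr(exc)
    finally:
        conn.close()


def main():
    """Run the probe against the values posted in this thread (assumed)."""
    import pymysql  # imported here so the helpers above stay dependency-free

    database = {
        "name": "fate_flow", "user": "root", "passwd": "fate_dev",
        "host": "10.12.26.119", "port": 3306,
        "max_connections": 100, "stale_timeout": 30,
    }
    print(probe_create_table(pymysql.connect, database))
```

Calling main() on the fate_flow host prints the status. A "ddl_failed" result with the same 1030 error would point at the MySQL server or its storage engine rather than at settings.py.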

Mantj commented 4 years ago

@zengjice Thanks for your response~ I've checked fate_flow/settings.py. Its contents are below, and 10.12.26.119 is my IP:

IP = '0.0.0.0'
GRPC_PORT = 9360
HTTP_PORT = 9380
ZOOKEEPER_HOSTS = ['127.0.0.1:2181']
CLUSTER_STANDALONE_JOB_SERVER_PORT = 9381
WORK_MODE = 1
USE_LOCAL_DATABASE = True
USE_AUTHENTICATION = False
USE_CONFIGURATION_CENTER = False
PRIVILEGE_COMMAND_WHITELIST = ['save_pipeline', 'clean']
DATABASE = {
    'name': 'fate_flow',
    'user': 'root',
    'passwd': 'fate_dev',
    'host': '10.12.26.119',
    'port': 3306,
    'max_connections': 100,
    'stale_timeout': 30,
}
REDIS = {
    'host': '10.12.26.119',
    'port': 6379,
    'password': 'fate_dev',
    'max_connections': 500
}

Mantj commented 4 years ago

@zengjice Hi, I reinstalled the MySQL module, and the MySQL service now starts successfully. However, fate_flow still cannot start, and logs/error.log shows:

Traceback (most recent call last):
  File "/home/data/projects/fate/python/fate_flow/fate_flow_server.py", line 93, in <module>
    session.init(mode=RuntimeConfig.WORK_MODE, backend=Backend.EGGROLL)
  File "/home/data/projects/fate/python/arch/api/session.py", line 52, in init
    session = build_session(job_id=job_id, work_mode=mode, backend=backend)
  File "/home/data/projects/fate/python/arch/api/table/session.py", line 38, in build_session
    session = session_impl.FateSessionImpl(eggroll_session, work_mode, persistent_engine)
  File "/home/data/projects/fate/python/arch/api/table/eggroll/session_impl.py", line 33, in __init__
    self._eggroll = eggroll_util.build_eggroll_runtime(work_mode=work_mode, eggroll_session=eggroll_session)
  File "/home/data/projects/fate/python/arch/api/table/eggroll_util.py", line 44, in build_eggroll_runtime
    return eggroll_init(eggroll_session)
  File "/home/data/projects/fate/eggroll/python/eggroll/api/cluster/eggroll.py", line 79, in eggroll_init
    eggroll_runtime = _EggRoll(eggroll_session=eggroll_session)
  File "/home/data/projects/fate/eggroll/python/eggroll/api/cluster/eggroll.py", line 364, in __init__
    self.session_stub.getOrCreateSession(self.eggroll_session.to_protobuf())
  File "/home/data/projects/fate/common/python/venv/lib/python3.6/site-packages/grpc/_channel.py", line 533, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/home/data/projects/fate/common/python/venv/lib/python3.6/site-packages/grpc/_channel.py", line 467, in _end_unary_response_blocking
    raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
    status = StatusCode.UNAVAILABLE
    details = "OS Error"
    debug_error_string = "{"created":"@1583482449.752061606","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1017,"grpc_message":"OS Error","grpc_status":14}"

I thought the problem was in the EGGROLL module, so I reinstalled it, but the same error still occurs. What can I do? Am I missing some configuration?
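A gRPC StatusCode.UNAVAILABLE usually means the client could not reach the server at all, so a plain TCP check of each service port can narrow things down before reinstalling anything. The sketch below uses only the standard library. The function names are hypothetical, the MySQL, Redis and ZooKeeper endpoints are copied from the settings posted earlier in this thread, and the EGGROLL endpoint is deliberately left out because its host and port are not shown here and would have to come from your own deployment configuration.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def report(endpoints, timeout=3.0):
    """Map each (label, host, port) entry to its reachability."""
    return {
        label: port_is_open(host, port, timeout)
        for label, host, port in endpoints
    }


# Endpoints taken from the settings.py posted in this thread.
# Add the EGGROLL service address from your own configuration (not shown here).
ENDPOINTS = [
    ("mysql", "10.12.26.119", 3306),
    ("redis", "10.12.26.119", 6379),
    ("zookeeper", "127.0.0.1", 2181),
]
```

Running print(report(ENDPOINTS)) on the fate_flow host shows which services accept connections. A False entry suggests that the service is not listening on that address or that a firewall is blocking it, which is worth ruling out before another reinstall.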

silvanabc commented 4 years ago

@Mantj Have you solved the problem? I'm facing something similar.

Yuchen-Li commented 4 years ago

@Mantj I have the same issue. Have you found a solution?

tdye24 commented 2 years ago

Have you solved the problem? I'm facing something similar.