Open wood-j opened 1 year ago
By the way, if fum only does 2 things, as the doc said:
The rows in the MySQL table eggroll_meta.server_node are incorrect after the upgrade finishes and the pods come up:
This can lead to the following error when running the toy test:
[ERROR] [2023-04-25 09:09:31,735] [202304250903258499330] [1940:139857564890944] - [task_executor._run_] [line:265]: processor in session meta is not valid: <ErSessionMeta(id=202304250903258499330_secure_add_example_0_0_host_10000, name=, status=NEW, tag=, processors=[***, len=2], options=[{'python.venv': '/data/projects/python/venv', 'eggroll.session.processors.per.node': '1', 'eggroll.session.deploy.mode': 'cluster', 'python.path': '/data/projects/fate/fate/python:$PYTHONPATH:/data/projects/fate/fate/python:/data/projects/fate/eggroll/python:/data/projects/fate/fateflow/python:/data/projects/fate/fate/python/fate_client', 'eggroll.rollpair.inmemory_output': 'True'}]) at 0x7f32fb65de50>
Traceback (most recent call last):
File "/data/projects/fate/fateflow/python/fate_flow/worker/task_executor.py", line 148, in _run_
sess.init_computing(computing_session_id=args.session_id, options=session_options)
File "/data/projects/fate/fate/python/fate_arch/session/_session.py", line 118, in init_computing
self._computing_session = CSession(
File "/data/projects/fate/fate/python/fate_arch/computing/eggroll/_csession.py", line 38, in __init__
self._rp_session = session_init(session_id=session_id, options=options)
File "/data/projects/fate/eggroll/python/eggroll/core/session.py", line 42, in session_init
er_session = ErSession(session_id=session_id, options=options)
File "/data/projects/fate/eggroll/python/eggroll/core/session.py", line 199, in __init__
self.__session_meta = self._cluster_manager_client.get_or_create_session(session_meta)
File "/data/projects/fate/eggroll/python/eggroll/core/client.py", line 185, in get_or_create_session
return self.__check_processors(
File "/data/projects/fate/eggroll/python/eggroll/core/client.py", line 243, in __check_processors
raise ValueError(f"processor in session meta is not valid: {session_meta}")
ValueError: processor in session meta is not valid: <ErSessionMeta(id=202304250903258499330_secure_add_example_0_0_host_10000, name=, status=NEW, tag=, processors=[***, len=2], options=[{'python.venv': '/data/projects/python/venv', 'eggroll.session.processors.per.node': '1', 'eggroll.session.deploy.mode': 'cluster', 'python.path': '/data/projects/fate/fate/python:$PYTHONPATH:/data/projects/fate/fate/python:/data/projects/fate/eggroll/python:/data/projects/fate/fateflow/python:/data/projects/fate/fate/python/fate_client', 'eggroll.rollpair.inmemory_output': 'True'}]) at 0x7f32fb65de50>
The correct rows should be:
For anyone trying to upgrade from 1.8.0 to 1.9.2 (1.9.0), the following SQL lines should be executed against eggroll_meta.server_node:
truncate table server_node;
INSERT INTO server_node (host, port, node_type, status) values ('clustermanager', '4670', 'CLUSTER_MANAGER', 'HEALTHY');
INSERT INTO server_node (host, port, node_type, status) values ('nodemanager-0.nodemanager', '4671', 'NODE_MANAGER', 'HEALTHY');
INSERT INTO server_node (host, port, node_type, status) values ('nodemanager-1.nodemanager', '4671', 'NODE_MANAGER', 'HEALTHY');
Idea from:
Maybe we should add the lines above to the upgrade SQL file shipped with the FATE release:
1.8.0->1.9.x The upgrade host field does not seem to be updated correctly.
Thanks a lot for your suggestion, but this method is not easy to implement. It's hard to know how many nodemanagers there are during an upgrade. When there are multiple nodemanagers, the corresponding SQL scripts also need to be changed. In this case, I recommend running the SQL manually. The upgrade documentation will be updated later.
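Since the number of nodemanagers differs per cluster, one way to avoid hand-editing the script is to generate the statements from a replica count. The following is only a sketch, not part of FATE or KubeFATE: the function name `build_server_node_sql` and its parameters are hypothetical, and it assumes the nodemanager hosts follow the `nodemanager-<index>.nodemanager` pattern and the ports (4670, 4671) shown in the SQL earlier in this thread. Check the host names against your own pods before running the output.

```python
def build_server_node_sql(
    nodemanager_count,
    cm_host="clustermanager",
    cm_port=4670,
    nm_host_template="nodemanager-{index}.nodemanager",  # assumed naming pattern
    nm_port=4671,
):
    """Return the SQL statements that rebuild eggroll_meta.server_node.

    Hypothetical helper: defaults mirror the values posted in this issue.
    """
    if nodemanager_count < 1:
        raise ValueError("nodemanager_count must be at least 1")

    insert = (
        "INSERT INTO server_node (host, port, node_type, status) "
        "values ('{host}', '{port}', '{node_type}', 'HEALTHY');"
    )

    # Start from an empty table, then register the cluster manager.
    statements = ["truncate table server_node;"]
    statements.append(
        insert.format(host=cm_host, port=cm_port, node_type="CLUSTER_MANAGER")
    )

    # One row per nodemanager replica, indexed from 0.
    for index in range(nodemanager_count):
        statements.append(
            insert.format(
                host=nm_host_template.format(index=index),
                port=nm_port,
                node_type="NODE_MANAGER",
            )
        )
    return statements


if __name__ == "__main__":
    # Print a script for a cluster with two nodemanagers.
    print("\n".join(build_server_node_sql(2)))
```

With a count of 2 this reproduces the four statements posted above; the printed script still has to be run manually against the eggroll_meta database.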
Is your feature request related to a problem? Please describe.
No. The version upgrade doc is out of date and may not work properly.
Describe the solution you'd like
I am trying to use the doc to upgrade my cluster (on k8s) from 1.8.0 to 1.9.2 with all persistent data. But the cluster.yaml of 1.9.2 has introduced some new config lines, like:
The basic guide to update is:
The upgrade task won't post, as those new config lines are missing.
Describe alternatives you've considered
Maybe we need to make clear what to update in cluster.yaml for each version.
Additional context
Also, 1.9.2 changed some pods from Deployment to StatefulSet, so the PVCs created by the chart have changed, and the persistent data (path) needs to be manually migrated to the new path too.