PaddlePaddle / PARL

A high-performance distributed training framework for Reinforcement Learning
https://parl.readthedocs.io/
Apache License 2.0

Version mismatch: the 'master' is of version 'pyarrow=True', but 'pyarrow=4.0.1' is provided in my current environment. How can I satisfy 'pyarrow=True'? #641

Open Rinstein opened 3 years ago

Rinstein commented 3 years ago

    /home/lrw/anaconda3/envs/parl/bin/python /home/lrw/Downloads/pycharm-community-2021.1.1/plugins/python-ce/helpers/pydev/pydevd.py --multiproc --qt-support=auto --client 127.0.0.1 --port 36145 --file /home/lrw/pythonProjects/PARL/examples/A2C/train.py
    Connected to pydev debugger (build 211.7142.13)
    [06-03 11:00:57 MainThread @logger.py:242] Argv: /home/lrw/pythonProjects/PARL/examples/A2C/train.py
    /home/lrw/pythonProjects/PARL/parl/remote/communication.py:38: FutureWarning: 'pyarrow.default_serialization_context' is deprecated as of 2.0.0 and will be removed in a future version. Use pickle or the pyarrow IPC functionality instead.
      context = pyarrow.default_serialization_context()
    [06-03 11:02:02 MainThread @machine_info.py:88] nvidia-smi -L found gpu count: 2
    [06-03 11:02:02 MainThread @machine_info.py:109] WRN Found non-empty CUDA_VISIBLE_DEVICES. But PARL found that Paddle was not complied with CUDA, which may cause issues. Thus PARL will not use GPU.
    /home/lrw/anaconda3/envs/parl/lib/python3.6/site-packages/paddle/fluid/clip.py:779: UserWarning: Caution! 'set_gradient_clip' is not recommended and may be deprecated in future! We recommend a new strategy: set 'grad_clip' when initializing the 'optimizer'. This method can reduce the mistakes, please refer to documention of 'optimizer'.
      warnings.warn("Caution! 'set_gradient_clip' is not recommended "
    [06-03 11:02:03 MainThread @machine_info.py:88] nvidia-smi -L found gpu count: 2
    [06-03 11:02:03 MainThread @machine_info.py:109] WRN Found non-empty CUDA_VISIBLE_DEVICES. But PARL found that Paddle was not complied with CUDA, which may cause issues. Thus PARL will not use GPU.
    [06-03 11:02:04 MainThread @machine_info.py:88] nvidia-smi -L found gpu count: 2
    [06-03 11:02:04 MainThread @machine_info.py:109] WRN Found non-empty CUDA_VISIBLE_DEVICES. But PARL found that Paddle was not complied with CUDA, which may cause issues. Thus PARL will not use GPU.
    E0603 11:02:14.516618275 25706 socket_utils_common_posix.cc:223] check for SO_REUSEPORT: {"created":"@1622689334.516609247","description":"SO_REUSEPORT unavailable on compiling system","file":"src/core/lib/iomgr/socket_utils_common_posix.cc","file_line":192}
    Traceback (most recent call last):
      File "/home/lrw/Downloads/pycharm-community-2021.1.1/plugins/python-ce/helpers/pydev/pydevd.py", line 1483, in _exec
        pydev_imports.execfile(file, globals, locals)  # execute the script
      File "/home/lrw/Downloads/pycharm-community-2021.1.1/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
        exec(compile(contents+"\n", file, 'exec'), glob, loc)
      File "/home/lrw/pythonProjects/PARL/examples/A2C/train.py", line 222, in <module>
        learner = Learner(config)
      File "/home/lrw/pythonProjects/PARL/examples/A2C/train.py", line 81, in __init__
        self.create_actors()
      File "/home/lrw/pythonProjects/PARL/examples/A2C/train.py", line 86, in create_actors
        parl.connect(self.config['master_address'])
      File "/home/lrw/pythonProjects/PARL/parl/remote/client.py", line 434, in connect
        distributed_files)
      File "/home/lrw/pythonProjects/PARL/parl/remote/client.py", line 74, in __init__
        self.check_env_consistency()
      File "/home/lrw/pythonProjects/PARL/parl/remote/client.py", line 243, in check_env_consistency
        raise Exception(error_message)
    Exception: Version mismatch: the 'master' is of version 'pyarrow=True'. However, 'pyarrow=4.0.1'is provided in your current environment.
    python-BaseException

    Process finished with exit code 1

TomorrowIsAnOtherDay commented 3 years ago

It seems to be a bug in the pyarrow version check. Are you running xparl on multiple machines or on a single machine?
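
For illustration, here is a minimal, hypothetical sketch of an environment-consistency check. It is not PARL's actual `check_env_consistency` code, which this thread does not show. It assumes each side reports a dict mapping package names to version strings, with `None` for packages that are not installed. If one side records a boolean such as `True` instead of a version string, the comparison can never succeed, which would produce a message like the `'pyarrow=True'` vs `'pyarrow=4.0.1'` error above.

```python
def compare_envs(master_env, local_env):
    """Return mismatch messages between two {package: version} maps.

    Hypothetical helper: a version is a string, or None when the package
    is not installed on that side.
    """
    messages = []
    for name in sorted(set(master_env) | set(local_env)):
        master_version = master_env.get(name)
        local_version = local_env.get(name)
        if master_version == local_version:
            continue  # identical entries are consistent
        if local_version is None:
            messages.append("'{}={}' is on the master but not installed locally".format(
                name, master_version))
        elif master_version is None:
            messages.append("'{}={}' is installed locally but not on the master".format(
                name, local_version))
        else:
            messages.append("Version mismatch: master has '{0}={1}', local has '{0}={2}'".format(
                name, master_version, local_version))
    return messages


print(compare_envs({"pyarrow": "4.0.1"}, {"pyarrow": "4.0.1"}))  # []
# A boolean recorded on one side never equals a version string on the other.
print(compare_envs({"pyarrow": True}, {"pyarrow": "4.0.1"}))
```

With `{"pyarrow": False}` on the master side and pyarrow absent locally, the same logic yields a message mentioning `pyarrow=False`, similar in shape to the second error reported later in this thread.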

Rinstein commented 3 years ago

A single machine. I have no idea how to install the version 'pyarrow=True'.

TomorrowIsAnOtherDay commented 3 years ago

I guess that PARL fails to detect the exact pyarrow version in your environment. Please provide your environment information (OS, parl version, paddle version) so that we can reproduce the problem.
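
A small script along these lines can collect the requested information. It is a sketch with hypothetical helper names; it reports what the interpreter that actually runs it sees, so it also shows which Python is in use. It assumes the packages expose `__version__` and falls back to `"unknown"` otherwise.

```python
import importlib
import platform
import sys


def package_version(name):
    """Return the package's __version__, "unknown" if it has none, or None if not installed."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


def environment_report(packages=("parl", "paddle", "pyarrow")):
    """Collect OS, interpreter, and package versions as seen by *this* interpreter."""
    report = {
        "os": platform.platform(),
        "python": platform.python_version(),
        "executable": sys.executable,
    }
    for name in packages:
        report[name] = package_version(name)
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print("{}: {}".format(key, value))
```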

TomorrowIsAnOtherDay commented 3 years ago

If you would like to bypass the issue and leave the problem to us, just remove pyarrow from the environment:

pip uninstall pyarrow
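
Note that `pip uninstall` only affects the interpreter that this particular `pip` belongs to. A quick way to confirm the result from the interpreter you actually run is sketched below; `is_importable` is a hypothetical helper built on `importlib.util.find_spec`, which locates a module without importing it.

```python
import importlib.util
import sys


def is_importable(name):
    """Return True if the running interpreter can locate module `name`."""
    return importlib.util.find_spec(name) is not None


print("interpreter:", sys.executable)
print("pyarrow importable:", is_importable("pyarrow"))
```
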
Rinstein commented 3 years ago

My environment is Ubuntu 18.04, parl 1.4.3, paddlepaddle 1.8.5. After I removed pyarrow, I got this error: Exception: "pyarrow" is provided in "master"'s enviroment, however, it is not found in your current environment. To use "pyarrow" for serialization, please install "pyarrow=False" in your current environment!

TomorrowIsAnOtherDay commented 3 years ago

Have you restarted the cluster? The cluster has to be restarted after the environment is updated.

xparl stop
xparl start ...
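
Because the master node runs in whatever Python environment `xparl start` was launched from, one way to make the restart unambiguous is to call the `xparl` script that sits next to the current interpreter. A minimal sketch, assuming `xparl` was installed into the same `bin/` directory as `sys.executable` (typical for a conda env on Linux); `restart_cluster` is a hypothetical helper, and the `--port`/`--cpu_num` flags are the ones used elsewhere in this thread.

```python
import os
import subprocess
import sys


def restart_cluster(port, cpu_num, runner=subprocess.run):
    """Run `xparl stop` then `xparl start` using the xparl next to this interpreter."""
    # Assumption: pip placed the xparl console script in the same bin/ directory
    # as sys.executable (true for a typical conda env or virtualenv on Linux).
    xparl = os.path.join(os.path.dirname(sys.executable), "xparl")
    commands = [
        ([xparl, "stop"], False),  # tolerate failure: no cluster may be running
        ([xparl, "start", "--port", str(port), "--cpu_num", str(cpu_num)], True),
    ]
    for cmd, must_succeed in commands:
        runner(cmd, check=must_succeed)
    return [cmd for cmd, _ in commands]
```

Usage: `restart_cluster(port=8010, cpu_num=1)`. The `runner` parameter exists only so the commands can be inspected without launching anything.
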
TomorrowIsAnOtherDay commented 3 years ago

> my env is ubuntu 18.04, parl 1.4.3, paddlepaddle 1.8.5, after I remove pyarrow, it happen to this error: Exception: "pyarrow" is provided in "master"'s enviroment, however, it is not found in your current environment. To use "pyarrow" for serialization, please install "pyarrow=False" in your current environment!

Are you using Anaconda, or installing all the packages into the system Python provided by the operating system?

Rinstein commented 3 years ago

I use Anaconda to manage my environment, and I restarted xparl before running my program.

TomorrowIsAnOtherDay commented 3 years ago

Could you provide the output of the following commands?

which xparl
which pip
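
The same check can be done from Python. This sketch (with a hypothetical helper `resolve_tools`) reports where each command resolves on `PATH` and whether it lives in the same directory as the running interpreter; a tool found in a different directory suggests it belongs to another environment.

```python
import os
import shutil
import sys


def resolve_tools(tools=("python", "pip", "xparl")):
    """Map each tool to (path found on PATH or None, same directory as this interpreter?)."""
    interpreter_dir = os.path.dirname(os.path.abspath(sys.executable))
    result = {}
    for tool in tools:
        path = shutil.which(tool)
        same_dir = path is not None and os.path.dirname(os.path.abspath(path)) == interpreter_dir
        result[tool] = (path, same_dir)
    return result


for tool, (path, same_dir) in resolve_tools().items():
    if path is None:
        print("{}: not found on PATH".format(tool))
    else:
        print("{}: {} ({})".format(tool, path, "same env" if same_dir else "DIFFERENT env"))
```
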
Rinstein commented 3 years ago

    (parl) lrw@mars-2080tix2:~/pythonProjects/PARL$ which xparl
    /home/lrw/anaconda3/envs/parl/bin/xparl
    (parl) lrw@mars-2080tix2:~/pythonProjects/PARL$ which pip
    /home/lrw/anaconda3/envs/parl/bin/pip
    (parl) lrw@mars-2080tix2:~/pythonProjects/PARL$

TomorrowIsAnOtherDay commented 3 years ago

Thanks a lot. I'm afraid that a different Python interpreter is used to launch the master node. Please provide the output of the following code:

import sys
print(sys.executable)
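
`sys.executable` only reports the interpreter of the process that runs it. To see which binary an already-running process uses (for example, the master node), Linux exposes it as the `/proc/<pid>/exe` symlink. A Linux-only sketch follows; matching on the substring `parl` is an assumption, since the exact command line of the master process is not shown in this thread. It will also match any process started from an environment whose path contains `parl`, which is harmless for this purpose.

```python
import os
import sys


def process_executable(pid):
    """Return the path of the binary that process `pid` is running (Linux only)."""
    try:
        return os.readlink("/proc/{}/exe".format(pid))
    except OSError:  # process exited, or it belongs to another user
        return None


def find_processes(keyword):
    """Yield (pid, executable, cmdline) for every process whose cmdline contains `keyword`."""
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open("/proc/{}/cmdline".format(entry), "rb") as f:
                raw = f.read()
        except OSError:
            continue
        # Arguments in /proc/<pid>/cmdline are separated by NUL bytes.
        cmdline = raw.replace(b"\0", b" ").decode("utf-8", "replace").strip()
        if keyword in cmdline:
            yield int(entry), process_executable(entry), cmdline


if __name__ == "__main__":
    print("this interpreter:", sys.executable)
    for pid, exe, cmdline in find_processes("parl"):
        print(pid, exe, cmdline)
```
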
zenghsh3 commented 3 years ago

Hi, could you carry out the following steps and paste the whole log?

1. Create `test.py`:

        import parl
        import sys
        print("sys.executable: ", sys.executable)

        @parl.remote_class
        class Agent(object):

            def say_hello(self):
                print("Hello World!")

        parl.connect('localhost:8010')
        agent = Agent()
        agent.say_hello()
        print("done")

2. Create `test.sh`:

```bash
echo `which python`
echo `which xparl`
echo `which pip`

python -m pip uninstall -y parl
python -m pip uninstall -y pyarrow
python -m pip install parl

echo `which python`
echo `which xparl`
echo `which pip`

xparl stop
xparl start --port 8010 --cpu_num 1
python test.py
```

3. Run `sh test.sh` and paste the whole log.
Rinstein commented 3 years ago

    (parl) lrw@mars-2080tix2:~/pythonProjects/redesign_macsrl/macsrl_code_only_grpc/VividTestAlgorithm$ sh test.sh
    /home/lrw/anaconda3/envs/parl/bin/python
    /home/lrw/anaconda3/envs/parl/bin/xparl
    /home/lrw/anaconda3/envs/parl/bin/pip
    Found existing installation: parl 1.4.3
    Uninstalling parl-1.4.3:
      Successfully uninstalled parl-1.4.3
    WARNING: Skipping pyarrow as it is not installed.
    Collecting parl
      Using cached parl-1.4.3-py2.py3-none-any.whl (574 kB)
    Requirement already satisfied: termcolor>=1.1.0 in /home/lrw/anaconda3/envs/parl/lib/python3.6/site-packages (from parl) (1.1.0)
    Requirement already satisfied: click in /home/lrw/anaconda3/envs/parl/lib/python3.6/site-packages (from parl) (7.1.2)
    Requirement already satisfied: psutil>=5.6.2 in /home/lrw/anaconda3/envs/parl/lib/python3.6/site-packages (from parl) (5.8.0)
    Requirement already satisfied: flask>=1.0.4 in /home/lrw/anaconda3/envs/parl/lib/python3.6/site-packages (from parl) (1.1.2)
    Requirement already satisfied: pyzmq==18.1.1 in /home/lrw/anaconda3/envs/parl/lib/python3.6/site-packages (from parl) (18.1.1)
    Requirement already satisfied: flask-cors in /home/lrw/anaconda3/envs/parl/lib/python3.6/site-packages (from parl) (3.0.10)
    Requirement already satisfied: tb-nightly==1.15.0a20190801 in /home/lrw/anaconda3/envs/parl/lib/python3.6/site-packages (from parl) (1.15.0a20190801)
    Requirement already satisfied: requests in /home/lrw/anaconda3/envs/parl/lib/python3.6/site-packages (from parl) (2.25.1)
    Requirement already satisfied: tensorboardX==1.8 in /home/lrw/anaconda3/envs/parl/lib/python3.6/site-packages (from parl) (1.8)
    Requirement already satisfied: scipy>=1.0.0 in /home/lrw/anaconda3/envs/parl/lib/python3.6/site-packages (from parl) (1.5.2)
    Requirement already satisfied: grpcio>=1.27.2 in /home/lrw/anaconda3/envs/parl/lib/python3.6/site-packages (from parl) (1.35.0)
    Requirement already satisfied: protobuf>=3.14.0 in /home/lrw/anaconda3/envs/parl/lib/python3.6/site-packages (from parl) (3.14.0)
    Requirement already satisfied: cloudpickle==1.6.0 in /home/lrw/anaconda3/envs/parl/lib/python3.6/site-packages (from parl) (1.6.0)
    Requirement already satisfied: numpy>=1.12.0 in /home/lrw/anaconda3/envs/parl/lib/python3.6/site-packages (from tb-nightly==1.15.0a20190801->parl) (1.19.2)
    Requirement already satisfied: werkzeug>=0.11.15 in /home/lrw/anaconda3/envs/parl/lib/python3.6/site-packages (from tb-nightly==1.15.0a20190801->parl) (1.0.1)
    Requirement already satisfied: setuptools>=41.0.0 in /home/lrw/anaconda3/envs/parl/lib/python3.6/site-packages (from tb-nightly==1.15.0a20190801->parl) (52.0.0.post20210125)
    Requirement already satisfied: six>=1.10.0 in /home/lrw/anaconda3/envs/parl/lib/python3.6/site-packages (from tb-nightly==1.15.0a20190801->parl) (1.15.0)
    Requirement already satisfied: markdown>=2.6.8 in /home/lrw/anaconda3/envs/parl/lib/python3.6/site-packages (from tb-nightly==1.15.0a20190801->parl) (3.3.4)
    Requirement already satisfied: absl-py>=0.4 in /home/lrw/anaconda3/envs/parl/lib/python3.6/site-packages (from tb-nightly==1.15.0a20190801->parl) (0.12.0)
    Requirement already satisfied: wheel>=0.26 in /home/lrw/anaconda3/envs/parl/lib/python3.6/site-packages (from tb-nightly==1.15.0a20190801->parl) (0.36.2)
    Requirement already satisfied: Jinja2>=2.10.1 in /home/lrw/anaconda3/envs/parl/lib/python3.6/site-packages (from flask>=1.0.4->parl) (2.11.3)
    Requirement already satisfied: itsdangerous>=0.24 in /home/lrw/anaconda3/envs/parl/lib/python3.6/site-packages (from flask>=1.0.4->parl) (1.1.0)
    Requirement already satisfied: MarkupSafe>=0.23 in /home/lrw/anaconda3/envs/parl/lib/python3.6/site-packages (from Jinja2>=2.10.1->flask>=1.0.4->parl) (1.1.1)
    Requirement already satisfied: importlib-metadata in /home/lrw/anaconda3/envs/parl/lib/python3.6/site-packages (from markdown>=2.6.8->tb-nightly==1.15.0a20190801->parl) (4.0.1)
    Requirement already satisfied: zipp>=0.5 in /home/lrw/anaconda3/envs/parl/lib/python3.6/site-packages (from importlib-metadata->markdown>=2.6.8->tb-nightly==1.15.0a20190801->parl) (3.4.1)
    Requirement already satisfied: typing-extensions>=3.6.4 in /home/lrw/anaconda3/envs/parl/lib/python3.6/site-packages (from importlib-metadata->markdown>=2.6.8->tb-nightly==1.15.0a20190801->parl) (3.7.4.3)
    Requirement already satisfied: chardet<5,>=3.0.2 in /home/lrw/anaconda3/envs/parl/lib/python3.6/site-packages (from requests->parl) (4.0.0)
    Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/lrw/anaconda3/envs/parl/lib/python3.6/site-packages (from requests->parl) (1.26.4)
    Requirement already satisfied: certifi>=2017.4.17 in /home/lrw/anaconda3/envs/parl/lib/python3.6/site-packages (from requests->parl) (2021.5.30)
    Requirement already satisfied: idna<3,>=2.5 in /home/lrw/anaconda3/envs/parl/lib/python3.6/site-packages (from requests->parl) (2.10)
    Installing collected packages: parl
    Successfully installed parl-1.4.3
    /home/lrw/anaconda3/envs/parl/bin/python
    /home/lrw/anaconda3/envs/parl/bin/xparl
    /home/lrw/anaconda3/envs/parl/bin/pip
    [06-03 16:02:01 MainThread @logger.py:242] Argv: /home/lrw/anaconda3/envs/parl/bin/xparl stop
    [06-03 16:02:02 MainThread @utils.py:79] WRN paddlepaddle version: 2.1.0. The dynamic graph version of PARL is under development, not fully tested and supported
    kill: (22935): No such process
    kill: (22941): No such process
    kill: (22947): No such process
    kill: (22953): No such process
    [06-03 16:02:02 MainThread @logger.py:242] Argv: /home/lrw/anaconda3/envs/parl/bin/xparl start --port 8010 --cpu_num 1
    [06-03 16:02:03 MainThread @utils.py:79] WRN paddlepaddle version: 2.1.0. The dynamic graph version of PARL is under development, not fully tested and supported

        # The Parl cluster is started at localhost:8010.

        # A local worker with 1 CPUs is connected to the cluster.    

        # Starting the cluster monitor...

    ## If you want to check cluster status, please view:

        http://192.xxx..xxx.xxx:55325

    or call:

        xparl status

    ## If you want to add more CPU resources, please call:

        xparl connect --address 192.xxx..xxx.xxx:8010

    ## If you want to shutdown the cluster, please call:

        xparl stop        

    E0603 16:02:07.813417680 23105 socket_utils_common_posix.cc:223] check for SO_REUSEPORT: {"created":"@1622707327.813408175","description":"SO_REUSEPORT unavailable on compiling system","file":"src/core/lib/iomgr/socket_utils_common_posix.cc","file_line":192}
    Checking status of log_server...

    Start the log server sucessfully.

    [06-03 16:02:08 MainThread @logger.py:242] Argv: test.py
    [06-03 16:02:09 MainThread @utils.py:79] WRN paddlepaddle version: 2.1.0. The dynamic graph version of PARL is under development, not fully tested and supported
    E0603 16:02:09.160348951 23162 socket_utils_common_posix.cc:223] check for SO_REUSEPORT: {"created":"@1622707329.160340365","description":"SO_REUSEPORT unavailable on compiling system","file":"src/core/lib/iomgr/socket_utils_common_posix.cc","file_line":192}
    E0603 16:02:09.164652929 22982 socket_utils_common_posix.cc:223] check for SO_REUSEPORT: {"created":"@1622707329.164628521","description":"SO_REUSEPORT unavailable on compiling system","file":"src/core/lib/iomgr/socket_utils_common_posix.cc","file_line":192}
    sys.executable: /home/lrw/anaconda3/envs/parl/bin/python
    E0603 16:02:09.653611595 23147 socket_utils_common_posix.cc:223] check for SO_REUSEPORT: {"created":"@1622707329.653597274","description":"SO_REUSEPORT unavailable on compiling system","file":"src/core/lib/iomgr/socket_utils_common_posix.cc","file_line":192}
    [06-03 16:02:09 MainThread @client.py:435] Remote actors log url: http://192.xxx..xxx.xxx:55325/logs?client_id=192.xxx..xxx.xxx_44497_1622707329
    done
    (parl) lrw@mars-2080tix2:~/pythonProjects/redesign_macsrl/macsrl_code_only_grpc/VividTestAlgorithm$ E0603 16:02:10.525752941 23312 socket_utils_common_posix.cc:223] check for SO_REUSEPORT: {"created":"@1622707330.525740926","description":"SO_REUSEPORT unavailable on compiling system","file":"src/core/lib/iomgr/socket_utils_common_posix.cc","file_line":192}
    E0603 16:02:11.869578383 23437 socket_utils_common_posix.cc:223] check for SO_REUSEPORT: {"created":"@1622707331.869570318","description":"SO_REUSEPORT unavailable on compiling system","file":"src/core/lib/iomgr/socket_utils_common_posix.cc","file_line":192}
    (parl) lrw@mars-2080tix2:~/pythonProjects/redesign_macsrl/macsrl_code_only_grpc/VividTestAlgorithm$

TomorrowIsAnOtherDay commented 3 years ago

It seems that you can now run distributed computation with xparl. Could you try running the A2C example again? It should work as expected now.

Rinstein commented 3 years ago

Sorry, but it still does not work. After I start xparl on port 8010 and then run examples/A2C/train.py, I get the following error (the same as before):

    /home/lrw/anaconda3/envs/parl/lib/python3.6/site-packages/paddle/fluid/clip.py:779: UserWarning: Caution! 'set_gradient_clip' is not recommended and may be deprecated in future! We recommend a new strategy: set 'grad_clip' when initializing the 'optimizer'. This method can reduce the mistakes, please refer to documention of 'optimizer'.
      warnings.warn("Caution! 'set_gradient_clip' is not recommended "
    [06-03 16:20:42 MainThread @machine_info.py:88] nvidia-smi -L found gpu count: 2
    [06-03 16:20:42 MainThread @machine_info.py:109] WRN Found non-empty CUDA_VISIBLE_DEVICES. But PARL found that Paddle was not complied with CUDA, which may cause issues. Thus PARL will not use GPU.
    [06-03 16:20:42 MainThread @machine_info.py:88] nvidia-smi -L found gpu count: 2
    [06-03 16:20:42 MainThread @machine_info.py:109] WRN Found non-empty CUDA_VISIBLE_DEVICES. But PARL found that Paddle was not complied with CUDA, which may cause issues. Thus PARL will not use GPU.
    E0603 16:20:42.459132368 2635 socket_utils_common_posix.cc:223] check for SO_REUSEPORT: {"created":"@1622708442.459123176","description":"SO_REUSEPORT unavailable on compiling system","file":"src/core/lib/iomgr/socket_utils_common_posix.cc","file_line":192}
    Traceback (most recent call last):
      File "/home/lrw/pythonProjects/PARL/examples/A2C/train.py", line 222, in <module>
        learner = Learner(config)
      File "/home/lrw/pythonProjects/PARL/examples/A2C/train.py", line 81, in __init__
        self.create_actors()
      File "/home/lrw/pythonProjects/PARL/examples/A2C/train.py", line 86, in create_actors
        parl.connect(self.config['master_address'])
      File "/home/lrw/pythonProjects/PARL/parl/remote/client.py", line 434, in connect
        distributed_files)
      File "/home/lrw/pythonProjects/PARL/parl/remote/client.py", line 74, in __init__
        self.check_env_consistency()
      File "/home/lrw/pythonProjects/PARL/parl/remote/client.py", line 243, in check_env_consistency
        raise Exception(error_message)
    Exception: "pyarrow" is provided in "master"'s enviroment, however, it is not found in your current environment. To use "pyarrow" for serialization, please install "pyarrow=False" in your current environment!

TomorrowIsAnOtherDay commented 3 years ago

Thanks for your kind and patient replies. I have discussed this with @zenghsh3, and we suspect it results from an incorrect environment configuration. May I add your WeChat account for further discussion? (I guess you are a developer based in China?)