An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
Apache License 2.0
yangzhipeng1108 commented 3 weeks ago

At least one of the input arguments for this task could not be computed: ray.exceptions.OwnerDiedError: Failed to retrieve object 004553850c97129b58c533c101cb5c1bc4de6d930200000002e1f505. To see information about where this ObjectRef was created in Python, set the environment variable RAY_record_ref_creation_sites=1 during ray start and ray.init().

The object's owner has exited. This is the Python worker that first created the ObjectRef via .remote() or ray.put(). Check cluster logs (/tmp/ray/session_latest/logs/*4fe82a45e0c8ef9803c3c57b6583ae52de04fd6c5da6abc6f49a8bd9* at IP address for more information about the Python worker failure.
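The error message gives two concrete debugging steps: set `RAY_record_ref_creation_sites=1` for both `ray start` and the process that calls `ray.init()`, then look up the dead worker's logs by its ID. The sketch below shows both steps. It is a minimal illustration, not part of this repo. The helper names (`env_with_ref_tracking`, `enable_tracking_in_driver`, `start_head_node`, `find_worker_logs`) are hypothetical, and it assumes the default log directory named in the error.

```python
import glob
import os
import subprocess
from typing import List, Optional

# Environment variable named in the Ray error message; setting it to "1"
# asks Ray to record where each ObjectRef was created.
REF_SITE_VAR = "RAY_record_ref_creation_sites"


def env_with_ref_tracking(base: Optional[dict] = None) -> dict:
    """Return a copy of an environment (default: os.environ) with tracking on."""
    env = dict(os.environ if base is None else base)
    env[REF_SITE_VAR] = "1"
    return env


def enable_tracking_in_driver() -> None:
    """Set the variable in this process; call before importing ray / ray.init()."""
    os.environ[REF_SITE_VAR] = "1"


def start_head_node(dry_run: bool = True) -> List[str]:
    """Build, and optionally run, `ray start --head` with tracking enabled."""
    cmd = ["ray", "start", "--head"]
    if not dry_run:
        # Only executed when explicitly requested; requires the ray CLI.
        subprocess.run(cmd, env=env_with_ref_tracking(), check=True)
    return cmd


def find_worker_logs(worker_id: str,
                     log_dir: str = "/tmp/ray/session_latest/logs") -> List[str]:
    """List log files whose names contain the worker ID from the error."""
    return sorted(glob.glob(os.path.join(log_dir, f"*{worker_id}*")))
```

For example, run `start_head_node(dry_run=False)` on the head node, call `enable_tracking_in_driver()` before `ray.init()` in the training script, reproduce the failure, and then pass the worker ID from the new error message to `find_worker_logs` on the machine where that worker ran.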


hijkzzz commented 3 weeks ago

Do you use the container?

yangzhipeng1108 commented 3 weeks ago

Do you use the container?

I use this Dockerfile.