Open natolambert opened 1 month ago
Inference works distributed, but I couldn't get the results gather working correctly with things like:

```python
state.wait_for_everyone()
# flatten results list of lists if is list of lists
if state.is_main_process:
    logger.info("Gathering results")
    results = gather_object(results)  # gather() is for tensors
Currently `run_rm.py` only uses one RM, because RMs are generally not well supported for inference. The current implementation is a separate `run_rm_mpgu.py` script; we can delete it and improve the base script if more use cases emerge.

Closes #95