Windows support modifications in pretrain script

Modified pretrain.py : Made a couple of changes for Windows compatibility. They do not affect the integrity or functionality of the original code on Linux/Unix-based operating systems.

Regarding the nccl backend: there is no official nccl support for Windows, and the third-party nccl build for Windows supports only CUDA 11, which leaves very limited version compatibility. The backend is therefore set to gloo when the operating system is Windows.

When torch.utils.data.DataLoader loads data in multiprocess mode on Windows, "OSError: [Errno 22] Invalid argument" and "_pickle.UnpicklingError: pickle data was truncated" can occur. Windows does not support fork-based multiprocessing, so worker processes are spawned instead and the objects they need are serialized and deserialized with pickle, and that step may fail. The error message implies that the pickle data was truncated inside the multiprocessing module. There is no known solution for this issue, so the current workaround is to disable multiprocessing and load data in the main process by setting num_workers=0, only when the OS is Windows. This should be tracked as an open issue for Windows platform support, and we may come back to it later.

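The two Windows-specific switches described above can be sketched roughly as follows. This is a minimal illustration, not the exact code in pretrain.py: the helper names `is_windows`, `pick_backend`, and `pick_num_workers` are hypothetical, and the sketch assumes the platform check is done with `platform.system()`.

```python
import platform


def is_windows(system=None):
    """Return True on Windows. `system` can be passed explicitly for testing."""
    return (system or platform.system()) == "Windows"


def pick_backend(system=None):
    """Choose the torch.distributed backend: gloo on Windows, nccl elsewhere."""
    return "gloo" if is_windows(system) else "nccl"


def pick_num_workers(requested, system=None):
    """Fall back to main-process data loading (num_workers=0) on Windows."""
    return 0 if is_windows(system) else requested


# Assumed usage inside a training script (names other than the torch APIs
# are placeholders):
#   torch.distributed.init_process_group(backend=pick_backend())
#   loader = torch.utils.data.DataLoader(dataset, num_workers=pick_num_workers(4))
```

Keeping the platform check in one place means the Linux/Unix code path is left untouched, which matches the intent stated above.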
Modified requirement.txt : Added scikit-learn as a dependency.

Modified README.md : Added instructions on how to run PyTorch with distributed training and set the environment variables automatically.

DLLXW / baby-llama2-chinese

Windows support modifications in pretrain script #32