Closed skepsun closed 1 year ago
Did you include the `ulimit -n 64000` call in your launch command? There's an example usage in the README.
Can you also share the versions of Python and torch you're using? I'll investigate more later today.
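For anyone who wants to check the limit from inside the training process rather than the shell, here is a minimal sketch using Python's standard `resource` module (Unix only). The helper name `raise_open_file_limit` is hypothetical and not part of this repo. It only mirrors what `ulimit -n 64000` does for the soft limit, and it cannot go above the hard limit.

```python
import resource


def raise_open_file_limit(target: int = 64000):
    """Try to raise the soft open-file limit toward `target`; return (soft, hard)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # Nothing to do if the soft limit is already unlimited.
    if soft == resource.RLIM_INFINITY:
        return soft, hard

    # An unprivileged process can raise the soft limit only up to the hard limit.
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)

    if new_soft > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        except (ValueError, OSError):
            # Some systems reject the request (e.g. a kernel-level cap);
            # in that case keep the current limit.
            pass

    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print(raise_open_file_limit())
```

Printing the returned pair at startup is a quick way to confirm whether the `ulimit` call in the launch command actually took effect for the Python process.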
Just tried adding `ulimit -n 64000` but still got the same error message. I am using Python 3.10.11 with torch 2.0.1. Here is a list of installed packages:
Package Version Editable project location
------------------------ ------------ --------------------------
absl-py 1.4.0
accelerate 0.20.3
aiofiles 23.1.0
aiohttp 3.8.4
aiosignal 1.3.1
aliyun-python-sdk-core 2.13.36
aliyun-python-sdk-kms 2.16.1
altair 5.0.1
antlr4-python3-runtime 4.9.3
anyio 3.7.0
appdirs 1.4.4
arrow 1.2.3
asttokens 2.0.5
async-timeout 4.0.2
attrs 23.1.0
backcall 0.2.0
bcrypt 4.0.1
beautifulsoup4 4.12.2
bitsandbytes 0.39.0
blessed 1.20.0
brotlipy 0.7.0
cachetools 5.3.1
cattrs 23.1.2
certifi 2021.5.30
cffi 1.15.1
cfgv 3.3.1
chardet 3.0.4
charset-normalizer 2.0.4
cheroot 10.0.0
click 8.1.3
cmake 3.26.3
coati 1.0.0
colossalai 0.2.8
contexttimer 0.3.3
contourpy 1.0.7
cpm-kernels 1.0.11
crcmod 1.7
croniter 1.3.15
cryptography 39.0.1
cycler 0.11.0
data-serialize 0.2.1
dataclasses-json 0.5.7
datasets 2.12.0
dateutils 0.6.12
debugpy 1.5.1
decorator 5.1.1
deep-training 0.1.10.post1
deepdiff 6.3.0
deepspeed 0.9.5
delta-center-client 0.0.4
dill 0.3.6
distlib 0.3.6
docker-pycreds 0.4.0
einops 0.6.1
exceptiongroup 1.1.1
executing 0.8.3
fabric 3.1.0
faiss-gpu 1.7.2
fastapi 0.95.2
fastdatasets 0.9.7.post0
ffmpy 0.3.0
filelock 3.12.0
fire 0.5.0
flash-attn 1.0.3.post0
fonttools 4.39.4
frozenlist 1.3.3
fschat 0.2.15 /d1/data/chuxiong/FastChat
fsspec 2023.5.0
functorch 1.13.1
gensim 4.3.1
gitdb 4.0.10
GitPython 3.1.31
gmpy2 2.1.2
google-auth 2.20.0
google-auth-oauthlib 1.0.0
google-trans-new 1.1.9
gpustat 1.1
gradio 3.35.2
gradio_client 0.2.7
greenlet 2.0.2
grpcio 1.51.3
h11 0.9.0
h2 3.2.0
hjson 3.1.0
hpack 3.0.0
hstspreload 2023.1.1
httpcore 0.9.1
httpx 0.13.3
huggingface-hub 0.14.1
hydra-core 1.3.2
hyperframe 5.2.0
identify 2.5.24
idna 3.2
inquirer 3.1.3
invoke 2.1.2
ipykernel 6.15.0
ipython 8.12.0
itsdangerous 2.1.2
jaraco.functools 3.7.0
jedi 0.18.1
jieba 0.42.1
Jinja2 3.1.2
jmespath 0.10.0
joblib 1.2.0
jsonlines 3.1.0
jsonschema 4.17.3
jupyter_client 8.1.0
jupyter_core 5.3.0
kiwisolver 1.4.4
langchain 0.0.189
latex2mathml 3.76.0
lightning 2.0.4
lightning-cloud 0.5.37
lightning-utilities 0.8.0
linkify-it-py 2.0.2
lit 16.0.5.post0
loguru 0.7.0
loralib 0.1.1
Markdown 3.4.3
markdown-it-py 2.2.0
markdown2 2.4.8
MarkupSafe 2.1.1
marshmallow 3.19.0
marshmallow-enum 1.5.1
matplotlib 3.7.1
matplotlib-inline 0.1.6
mdit-py-plugins 0.3.3
mdtex2html 1.2.0
mdurl 0.1.2
mkl-fft 1.3.6
mkl-random 1.2.2
mkl-service 2.4.0
more-itertools 9.1.0
mpmath 1.2.1
msgpack 1.0.5
multidict 6.0.4
multiprocess 0.70.14
mypy-extensions 1.0.0
nest-asyncio 1.5.6
networkx 2.8.4
nh3 0.2.13
ninja 1.11.1
nltk 3.8.1
nodeenv 1.8.0
numexpr 2.8.4
numpy 1.25.0
numpy-io 0.0.3
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-ml-py 11.525.112
oauthlib 3.2.2
omegaconf 2.3.0
openai 0.27.8
openapi-schema-pydantic 1.2.4
opendelta 0.3.2
optree 0.9.1
ordered-set 4.1.0
orjson 3.9.0
oss2 2.15.0
packaging 23.0
pandas 1.5.2
paramiko 3.2.0
parso 0.8.3
pathtools 0.1.2
peft 0.4.0.dev0
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.4.0
pip 23.1.2
platformdirs 2.5.2
pre-commit 3.3.2
prompt-toolkit 3.0.36
protobuf 3.20.3
psutil 5.9.0
ptyprocess 0.7.0
pure-eval 0.2.2
py-cpuinfo 9.0.0
pyarrow 12.0.0
pyasn1 0.5.0
pyasn1-modules 0.3.0
pycparser 2.21
pycryptodome 3.18.0
pydantic 1.10.8
pydub 0.25.1
Pygments 2.15.1
PyJWT 2.7.0
PyNaCl 1.5.0
pyOpenSSL 23.0.0
pyparsing 3.0.9
pyre-extensions 0.0.23
pyrsistent 0.19.3
PySocks 1.7.1
python-dateutil 2.8.2
python-dotenv 0.19.0
python-editor 1.0.4
python-multipart 0.0.6
python-rapidjson 1.10
pytorch-lightning 2.0.4
pytz 2023.3
PyYAML 6.0
pyzmq 25.1.0
ray 2.4.0
readchar 4.0.5
regex 2023.5.5
requests 2.26.0
requests-oauthlib 1.3.1
responses 0.18.0
rfc3986 1.5.0
rich 13.4.1
rouge-chinese 1.0.3
rsa 4.9
safetensors 0.3.1
scikit-learn 1.2.2
scipy 1.10.1
seaborn 0.12.2
semantic-version 2.10.0
sentencepiece 0.1.99
sentry-sdk 1.24.0
seqmetric 0.1.2
setproctitle 1.3.2
setuptools 67.8.0
shortuuid 1.0.11
six 1.16.0
sklearn 0.0.post5
smart-open 6.3.0
smmap 5.0.0
sniffio 1.3.0
soupsieve 2.4.1
SQLAlchemy 2.0.15
sse-starlette 1.6.1
stack-data 0.2.0
starlette 0.27.0
starsessions 1.3.0
svgwrite 1.4.3
sympy 1.11.1
tabulate 0.9.0
tenacity 8.2.2
tensor-parallel 1.2.8
tensorboard 2.13.0
tensorboard-data-server 0.7.1
tensorboardX 2.6
termcolor 2.3.0
text2vec 1.2.1
tfrecords 0.2.6
threadpoolctl 3.1.0
tiktoken 0.4.0
tokenizers 0.13.3
toolz 0.12.0
torch 2.0.1
torchaudio 2.0.2
torchinfo 1.8.0
torchmetrics 0.11.4
torchtyping 0.1.4
torchvision 0.15.2
tornado 6.2
tqdm 4.65.0
traitlets 5.7.1
transformers 4.30.2
translate-json 0.0.2
triton 2.0.0
tritonclient 2.34.0
trl 0.4.4
trlx 0.6.0
typeguard 4.0.0
typing_extensions 4.6.3
typing-inspect 0.9.0
tzdata 2023.3
uc-micro-py 1.0.2
urllib3 1.25
uvicorn 0.22.0
virtualenv 20.21.0
wandb 0.15.3
wavedrom 2.0.3.post3
wcwidth 0.2.5
web.py 0.62
websocket-client 1.6.1
websockets 11.0.3
Werkzeug 2.3.6
wheel 0.38.4
xformers 0.0.16
xxhash 3.2.0
yacs 0.1.8
yarl 1.9.2
zstandard 0.21.0
I changed the kwarg `low_cpu_mem_usage` to `False` when loading `reference_model` and this error just disappeared!
Sorry for the slow update on this! Glad to hear your issue was resolved. That's pretty mysterious. Have you made any other changes to the codebase? I haven't seen this issue on our end.
@skepsun just wanted to follow up: is everything working as expected for you? Feel free to re-open if you have any other questions, but I'll close this issue for now.
Met the same problem. Solved by setting `low_cpu_mem_usage = False` when loading both the policy and the `reference_model`.
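A rough sketch of what that workaround might look like, assuming the models are loaded through Hugging Face `transformers` with `AutoModelForCausalLM.from_pretrained`. The repo's actual loading code is not shown in this thread, so the model class and the helper names `build_load_kwargs` and `load_policy_and_reference` are assumptions for illustration only.

```python
from typing import Any, Dict


def build_load_kwargs(low_cpu_mem_usage: bool = False) -> Dict[str, Any]:
    """Keyword arguments shared by both model loads (hypothetical helper)."""
    return {"low_cpu_mem_usage": low_cpu_mem_usage}


def load_policy_and_reference(model_name: str):
    """Load the policy and the reference model with low_cpu_mem_usage disabled."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForCausalLM

    kwargs = build_load_kwargs(low_cpu_mem_usage=False)
    # Assumption: both models come from the same checkpoint name.
    policy = AutoModelForCausalLM.from_pretrained(model_name, **kwargs)
    reference_model = AutoModelForCausalLM.from_pretrained(model_name, **kwargs)
    return policy, reference_model
```

Why this makes the error go away was not established in the thread, so treat it as a workaround rather than a root-cause fix.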
I got the error `bad value(s) in fds_to_keep`. Full logs: