Status: Closed (athewsey closed this 3 months ago)
It seems like copying the contents of src/fmbench/tokenizer into the target tmp folder might be sufficient? But I'm not sure whether the copy_s3_content.sh script is expected to be used in any context other than a git clone of this whole repo. For example, maybe we could add something like the following to the script:
cp -R src/fmbench/tokenizer ${FMBENCH_READ_DIR}/tokenizer
...Or maybe these files are already hosted on public S3 and should be added to the manifest.txt?
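A slightly more defensive version of that copy step could look like the sketch below. It is only an illustration, not the fix that was actually shipped: the function name copy_local_tokenizer is hypothetical, and it assumes the script runs from the root of a repo clone and that FMBENCH_READ_DIR falls back to /tmp/fmbench-read when unset.

```shell
#!/usr/bin/env bash
# Sketch only: copy the bundled tokenizer files into the local read directory.
# Assumptions (not confirmed by the repo): run from the root of a git clone;
# FMBENCH_READ_DIR defaults to /tmp/fmbench-read when it is not set.
# copy_local_tokenizer is a hypothetical helper name.
copy_local_tokenizer() {
    local src_dir="${1:-src/fmbench/tokenizer}"
    local read_dir="${2:-${FMBENCH_READ_DIR:-/tmp/fmbench-read}}"

    # Fail with a clear message when not running from a repo clone.
    if [ ! -d "$src_dir" ]; then
        echo "Tokenizer source folder not found: $src_dir" >&2
        return 1
    fi

    mkdir -p "$read_dir/tokenizer"
    # The trailing "/." copies the folder contents, so running the script
    # twice does not create a nested tokenizer/tokenizer folder.
    cp -R "$src_dir/." "$read_dir/tokenizer/"
}
```

Calling copy_local_tokenizer with no arguments from the repo root would use the defaults. Copying "$src_dir/." rather than the folder itself matters because a plain cp -R into an existing target folder nests the source inside it on a second run.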
Fixed as of v1.0.45.
I've been trying to run FMBench with the self-contained setup described in the README, to test whether Python 3.10 can be supported as per #94.
(One caveat: I'm running on a SageMaker Notebook Instance rather than plain EC2.)
After setting up conda & Poetry, then running ./copy_s3_content.sh and ./debug.sh, my debug script fails in notebook 0 with an error. Sure enough, if I ls /tmp/fmbench-read, I see folders for configs, prompt_template, scripts, and source_data, but no tokenizers... What's the expected way to set up the local tokenizers folder, and could we get it automated within ./copy_s3_content.sh?
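For anyone hitting the same thing, a quick check like the sketch below can report which folders are missing before running the debug script. The helper name check_read_dir is hypothetical. The folder list is taken from the ls output described above, plus a tokenizer folder whose exact name is an assumption based on the cp command suggested in this thread.

```shell
#!/usr/bin/env bash
# Sketch only: report which expected subfolders are missing from the read dir.
# check_read_dir is a hypothetical helper. The folder names come from the ls
# output in this issue; "tokenizer" is an assumed name for the tokenizer folder.
check_read_dir() {
    local read_dir="${1:-/tmp/fmbench-read}"
    local missing=0
    local name

    for name in configs prompt_template scripts source_data tokenizer; do
        if [ ! -d "$read_dir/$name" ]; then
            echo "missing: $read_dir/$name"
            missing=1
        fi
    done

    # Non-zero return status signals that at least one folder is absent.
    return "$missing"
}
```

Running check_read_dir with no arguments checks /tmp/fmbench-read. A non-zero return status means at least one folder was not found.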