No such file or directory: '/tmp/fmbench-read/tokenizer'

athewsey commented 4 months ago

I've been trying to run FMBench with the self-contained setup described in the README, to test whether Python 3.10 can be supported as per #94.

(One caveat: I'm running on a SageMaker Notebook Instance rather than plain EC2.)

After setting up conda and Poetry, then running ./copy_s3_content.sh and ./debug.sh, the debug script fails in notebook 0 with this error:

FileNotFoundError: [Errno 2] No such file or directory: '/tmp/fmbench-read/tokenizer'

Sure enough, if I ls /tmp/fmbench-read I see folders for configs, prompt_template, scripts, and source_data, but no tokenizer folder. What's the expected way to set up the local tokenizer folder, and could we get it automated within ./copy_s3_content.sh?
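A quick way to confirm which folders are missing is a small shell check like the one below. It is only a sketch: missing_fmbench_dirs is a hypothetical helper name, and the folder list is taken from this issue (the four folders listed above plus tokenizer), not from the FMBench source.

```shell
# Print the name of each expected subfolder that is missing from the read directory.
# ASSUMPTION: the folder list comes from this issue, not from FMBench itself.
# The function body runs in a subshell "( ... )" so its variables stay local.
missing_fmbench_dirs() (
    read_dir="${1:-/tmp/fmbench-read}"
    for name in configs prompt_template scripts source_data tokenizer; do
        if [ ! -d "${read_dir}/${name}" ]; then
            echo "${name}"
        fi
    done
)
```

Calling missing_fmbench_dirs /tmp/fmbench-read prints one missing folder name per line and prints nothing when all five folders exist.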

athewsey commented 4 months ago

It seems like copying the contents of src/fmbench/tokenizer into the target tmp folder might be sufficient? But I'm not sure whether the copy_s3_content.sh script is expected to be used in any context other than a git clone of this whole repo.

For example, maybe we could add something like the following to the script:

cp -R src/fmbench/tokenizer "${FMBENCH_READ_DIR}/tokenizer"

...Or maybe these files are already hosted on public S3 and should be added to the manifest.txt?
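A slightly more defensive sketch of the local-copy option follows. It assumes the script runs from the root of a git clone and that FMBENCH_READ_DIR is set by the script. I have not checked either against copy_s3_content.sh. copy_tokenizer is a hypothetical helper name, and the /tmp/fmbench-read fallback is just the path from the error message.

```shell
# Copy the bundled tokenizer files into the FMBench read directory.
# ASSUMPTIONS: copy_tokenizer is a hypothetical helper; the default source path
# and FMBENCH_READ_DIR come from the suggestion in this issue.
# The function body runs in a subshell, so "exit 1" only ends the function.
copy_tokenizer() (
    src_dir="${1:-src/fmbench/tokenizer}"
    read_dir="${2:-${FMBENCH_READ_DIR:-/tmp/fmbench-read}}"
    if [ ! -d "${src_dir}" ]; then
        echo "tokenizer source not found: ${src_dir}" >&2
        exit 1
    fi
    mkdir -p "${read_dir}/tokenizer"
    # The trailing "/." copies the directory contents, so a second run
    # does not create a nested tokenizer/tokenizer folder.
    cp -R "${src_dir}/." "${read_dir}/tokenizer/"
)
```

Copying the contents rather than the folder itself keeps the step safe to re-run, which a plain cp -R into an existing target would not be.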

aarora79 commented 3 months ago

Fixed as of v1.0.45.

aws-samples/foundation-model-benchmarking-tool, issue #102