VE-FORBRYDERNE / mtj-softtuner

Create soft prompts for fairseq 13B dense, GPT-J-6B and GPT-Neo-2.7B for free in a Google Colab TPU instance
https://henk.tech/softtuner/
Apache License 2.0

Issue with trainer.tokenize_dataset #14

Open Psmallwood217 opened 2 years ago

Psmallwood217 commented 2 years ago

When I run this notebook in Google Colab, everything works up to the step that creates the .npy file. When I run that cell, I get the following error. Any thoughts on why, or what I'm doing wrong?


TypeError                                 Traceback (most recent call last)
in
      3 batch_size = 2048  # @param {type:"integer"}
      4 epochs = 1  # @param {type:"integer"}
----> 5 trainer.tokenize_dataset(dataset_path, output_file, batch_size, epochs)
      6 trainer.save_data()
      7 print("OK.")

4 frames
/usr/lib/python3.7/posixpath.py in join(a, *p)
     78     will be discarded.  An empty last part will result in a path that
     79     ends with a separator."""
---> 80     a = os.fspath(a)
     81     sep = _get_sep(a)
     82     path = a

TypeError: expected str, bytes or os.PathLike object, not NoneType
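
The last frame shows os.path.join receiving None as its first argument. The traceback does not say which value inside the trainer was left unset, so the snippet below is only a minimal sketch that reproduces the same TypeError outside of mtj-softtuner. The file name "data.npy" and the helper join_checked are hypothetical and used for illustration only.

```python
import os


def join_checked(base, name):
    """Join two path parts, failing early with a clearer message if base is unset.

    Hypothetical helper for illustration; it is not part of mtj-softtuner.
    """
    if base is None:
        raise ValueError("base path is None; set it before building file paths")
    return os.path.join(base, name)


# Reproduce the error from the traceback: the first argument to join is None.
try:
    os.path.join(None, "data.npy")  # "data.npy" is a placeholder file name
except TypeError as exc:
    reproduced = str(exc)
    print("TypeError:", reproduced)
```

If a check like this fails in the notebook, a path or setting from an earlier cell was probably never assigned, so re-running the earlier cells in order may be worth trying first.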

vfbd commented 1 year ago

There's another notebook at https://henk.tech/softtuner/. Let me know if that one works for you.