bazingagin / npc_gzip

Code for Paper: “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors
MIT License

How to load a custom dataset #15

Closed: StudyingLover closed this issue 1 year ago

StudyingLover commented 1 year ago

I made my own dataset, but I found that your code uses torchtext to load data. How can I load my custom dataset?

bazingagin commented 1 year ago

I've added the function and its documentation to the README: https://github.com/bazingagin/npc_gzip#use-custom-dataset. Hope that helps.
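For readers following along, here is a minimal sketch of what a plain-file loader for a custom dataset might look like. It assumes a hypothetical layout (a directory containing `train.txt` and `test.txt`, one example per line in the form `label<TAB>text`). The file names, the delimiter, and the function names `load_split` and `load_custom_dataset_sketch` are illustrative assumptions, not the repository's code; the README section linked above describes the format the repository actually expects.

```python
import os
import tempfile
from typing import List, Tuple


def load_split(path: str, delimiter: str = "\t") -> List[Tuple[int, str]]:
    """Read one split; each non-empty line is 'label<delimiter>text' (assumed format)."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            # Split only on the first delimiter so the text may contain tabs
            label, text = line.split(delimiter, 1)
            pairs.append((int(label), text))
    return pairs


def load_custom_dataset_sketch(directory: str, delimiter: str = "\t"):
    """Return (train_pairs, test_pairs) from hypothetical train.txt / test.txt files."""
    train = load_split(os.path.join(directory, "train.txt"), delimiter)
    test = load_split(os.path.join(directory, "test.txt"), delimiter)
    return train, test


if __name__ == "__main__":
    # Build a tiny throwaway dataset in a temporary directory and load it back
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "train.txt"), "w", encoding="utf-8") as f:
            f.write("0\tthe match ended in a draw\n1\tshares rose sharply today\n")
        with open(os.path.join(d, "test.txt"), "w", encoding="utf-8") as f:
            f.write("1\tmarkets fell after the report\n")
        train, test = load_custom_dataset_sketch(d)
        print(len(train), len(test), test[0])  # 2 1 (1, 'markets fell after the report')
```

Keeping labels as integers here is a choice for the sketch; if your labels are strings, you would map them to integer IDs before classification.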

StudyingLover commented 1 year ago

I think I need some help :tired_face:

[Screenshot: directory tree]

This is my directory tree; data/custom holds my custom data. Then I ran the command:

python main_text.py --data_dir data --dataset custom

and got this error:

Traceback (most recent call last):
  File "/home/npc_gzip/main_text.py", line 162, in <module>
    dataset_pair = eval(args.dataset)(root=args.data_dir)
  File "<string>", line 1, in <module>
NameError: name 'custom' is not defined

How am I supposed to use it?
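The traceback points at line 162 of main_text.py, `eval(args.dataset)(root=args.data_dir)`. `eval` treats the string passed to `--dataset` as a Python expression, so `custom` is looked up as a variable name; because no such name exists in that scope, Python raises `NameError`. The minimal sketch below reproduces that behaviour and shows an explicit dictionary dispatch as one alternative. The names `AG_NEWS_stub`, `load_custom_stub`, `via_eval`, `via_dict`, and `LOADERS` are hypothetical stand-ins, not code from the repository.

```python
from typing import Callable, Dict


def AG_NEWS_stub(root: str) -> str:
    # Hypothetical stand-in for a torchtext-style dataset constructor taking root=...
    return f"ag_news from {root}"


def load_custom_stub(root: str) -> str:
    # Hypothetical stand-in for a custom-dataset loader
    return f"custom data from {root}"


def via_eval(name: str, root: str) -> str:
    # Mirrors the pattern at main_text.py line 162: the string is evaluated as code,
    # so any name not defined in scope raises NameError
    return eval(name)(root=root)


# Explicit mapping from --dataset values to loaders; unknown names get a clear message
LOADERS: Dict[str, Callable[..., str]] = {
    "AG_NEWS": AG_NEWS_stub,
    "custom": load_custom_stub,
}


def via_dict(name: str, root: str) -> str:
    if name not in LOADERS:
        raise ValueError(f"unknown dataset {name!r}; choose from {sorted(LOADERS)}")
    return LOADERS[name](root=root)


if __name__ == "__main__":
    try:
        via_eval("custom", "data")
    except NameError as err:
        print("eval failed:", err)
    print(via_dict("custom", "data/custom"))  # custom data from data/custom
```

A lookup table also avoids evaluating arbitrary command-line input as code, which `eval` would otherwise do.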

StudyingLover commented 1 year ago

Besides, I noticed some mistakes caused by annotations; I have fixed them in https://github.com/bazingagin/npc_gzip/pull/22.

bazingagin commented 1 year ago

It should be fixed now. You can pass "custom" to "--dataset", together with "--class_num" to specify the number of classes. Also, given your directory structure, I think you should pass "data/custom" to "--data_dir".
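Put together, the suggested invocation would be `python main_text.py --data_dir data/custom --dataset custom --class_num N`, where N is the number of classes in your data. The flag names come from this thread; the parser below is only a hypothetical sketch of how such flags might be wired. The defaults, the `AG_NEWS` default value, the `build_parser`/`parse` helpers, and the rule that `--class_num` is required for `custom` are assumptions, not main_text.py's actual code. The value 4 in the example is arbitrary.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser exposing the three flags discussed in this thread
    parser = argparse.ArgumentParser(description="Sketch of dataset-related flags")
    parser.add_argument("--data_dir", default="data", help="directory holding the dataset")
    parser.add_argument("--dataset", default="AG_NEWS", help="dataset name, or 'custom'")
    parser.add_argument("--class_num", type=int, default=None,
                        help="number of classes (assumed required for 'custom')")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: a custom dataset's label count is not known in advance, so require it.
    # parser.error prints usage to stderr and exits with status 2.
    if args.dataset == "custom" and args.class_num is None:
        parser.error("--class_num is required when --dataset is 'custom'")
    return args


if __name__ == "__main__":
    args = parse(["--data_dir", "data/custom", "--dataset", "custom", "--class_num", "4"])
    print(args.data_dir, args.dataset, args.class_num)  # data/custom custom 4
```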

StudyingLover commented 1 year ago

Oooooo! Thank you! I think this is amazing work, and I have run my custom data successfully!