ThilinaRajapakse / pytorch-transformers-classification

Based on the Pytorch-Transformers library by HuggingFace. To be used as a starting point for employing Transformer models in text classification tasks. Contains code to easily train BERT, XLNet, RoBERTa, and XLM models for text classification.
Apache License 2.0
304 stars, 97 forks

Regarding number of train test samples #33

Open: Sreelakshmi-k opened this issue 4 years ago

Sreelakshmi-k commented 4 years ago

INFO:main:Creating features from dataset file at data/
8000 817
100%|██████████| 817/817 [00:01<00:00, 537.09it/s]
INFO:main:Saving features into cached file data/cached_train_bert-base-multilingual-cased_128_binary
INFO:main: Running training
INFO:main: Num examples = 817
INFO:main: Num Epochs = 35
INFO:main: Total train batch size = 8
INFO:main: Gradient Accumulation steps = 1
INFO:main: Total optimization steps = 3605
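The "Total optimization steps" value in this log is consistent with only 817 training examples being loaded. A minimal sketch of the arithmetic, assuming steps are computed as (batches per epoch // gradient accumulation steps) × epochs, which reproduces the logged 3605; the function name is hypothetical and not part of the repository.

```python
import math


def total_optimization_steps(num_examples: int, batch_size: int,
                             grad_accum_steps: int, num_epochs: int) -> int:
    """Hypothetical helper: optimizer steps for a full training run."""
    # Batches per epoch: a final partial batch still counts as one batch.
    batches_per_epoch = math.ceil(num_examples / batch_size)
    # One optimizer step per `grad_accum_steps` batches, repeated every epoch.
    return batches_per_epoch // grad_accum_steps * num_epochs


# Values from the training log above: ceil(817 / 8) = 103 batches, 103 * 35 epochs.
print(total_optimization_steps(817, 8, 1, 35))   # 3605
# The same settings if all 8000 training rows had been loaded.
print(total_optimization_steps(8000, 8, 1, 35))  # 35000
```

The same rule explains the "3/3" evaluation progress bars below: 18 examples at batch size 8 give ceil(18 / 8) = 3 batches.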

INFO:main:Evaluate the following checkpoints: ['outputs/checkpoint-2000', 'outputs']
INFO:main:Creating features from dataset file at data/
2000 18
100%|██████████| 18/18 [00:00<00:00, 148.27it/s]
INFO:main:Saving features into cached file data/cached_dev_bert-base-multilingual-cased_128_binary
INFO:main: Running evaluation 2000
INFO:main: Num examples = 18
INFO:main: Batch size = 8
Evaluating 100% 3/3 [00:00<00:00, 7.02it/s]
INFO:main: Eval results 2000
INFO:main: fn = 4
INFO:main: fp = 3
INFO:main: mcc = 0.20385887657505022
INFO:main: tn = 7
INFO:main: tp = 4

INFO:main:Loading features from cached file data/cached_dev_bert-base-multilingual-cased_128_binary
INFO:main: Running evaluation outputs
INFO:main: Num examples = 18
INFO:main: Batch size = 8
Evaluating 100% 3/3 [00:00<00:00, 7.51it/s]
INFO:main: Eval results outputs
INFO:main: fn = 4
INFO:main: fp = 2
INFO:main: mcc = 0.31622776601683794
INFO:main: tn = 8
INFO:main: tp = 4
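As a sanity check on these evaluation results, the reported mcc values can be recomputed from the tp/tn/fp/fn counts with the standard Matthews correlation coefficient formula, and the four counts sum to 18, the logged number of evaluation examples. A minimal sketch; the function name is illustrative and this is not necessarily how the script computes the metric.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when any marginal sum is zero.
    return numerator / denominator if denominator else 0.0


# Counts from the two evaluations above; each set sums to 18 examples.
print(mcc(tp=4, tn=7, fp=3, fn=4))  # checkpoint-2000, about 0.2039
print(mcc(tp=4, tn=8, fp=2, fn=4))  # outputs, about 0.3162
```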

ThilinaRajapakse commented 4 years ago

The data might be loaded from the cache directory. Try deleting any cached files.

Do you have the same issue when using the yelp data?
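A minimal sketch of the cache cleanup suggested above, assuming the cached feature files sit in `data/` and share the `cached_` prefix seen in the log file names (for example `data/cached_train_bert-base-multilingual-cased_128_binary`). The helper `delete_cached_features` is hypothetical, not part of the repository; removing these files means features are rebuilt from the dataset files on the next run.

```python
from pathlib import Path
from typing import List

# Assumption: data directory and "cached_" prefix as in the logs above.
DATA_DIR = Path("data")


def delete_cached_features(data_dir: Path) -> List[Path]:
    """Hypothetical helper: delete cached feature files and return their paths."""
    if not data_dir.is_dir():
        return []
    removed = []
    for cache_file in sorted(data_dir.glob("cached_*")):
        if cache_file.is_file():
            cache_file.unlink()
            removed.append(cache_file)
    return removed


for path in delete_cached_features(DATA_DIR):
    print(f"Removed {path}")
```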

Sreelakshmi-k commented 4 years ago

I haven't tried any other data.

(Sent by email in reply to Thilina Rajapakse's comment of Wed, 13 Nov 2019, 4:19 PM.)

Sreelakshmi-k commented 4 years ago

I have provided the correct location from which my data should be read. I have kept both the test and train files in a folder called data, which is in my drive. Could you please tell me what I should do to train on all 8000 training samples and test on all 2000 test samples?

Regards,
Sreelakshmi
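One way to narrow this down, offered as a hedged sketch rather than a known fix: compare the number of physical lines in each data file with the number of rows a tab-delimited CSV reader produces. With Python's default `csv` quoting, a field that starts with `"` can run across several lines, so a parsed-row count well below the line count would point at quote characters in the text column. The tab delimiter, the file names `data/train.tsv` and `data/dev.tsv`, and the helper `count_rows` are all assumptions for illustration.

```python
import csv
from pathlib import Path
from typing import Tuple


def count_rows(path: Path) -> Tuple[int, int]:
    """Hypothetical helper: (physical lines, rows parsed by a tab-delimited reader)."""
    with open(path, encoding="utf-8") as f:
        n_lines = sum(1 for _ in f)
    # newline="" lets the csv module handle line endings inside quoted fields.
    with open(path, encoding="utf-8", newline="") as f:
        n_rows = sum(1 for _ in csv.reader(f, delimiter="\t"))
    return n_lines, n_rows


for name in ("train.tsv", "dev.tsv"):  # assumed file names
    path = Path("data") / name
    if path.exists():
        lines, rows = count_rows(path)
        print(f"{path}: {lines} lines, {rows} parsed rows")
```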

(Sent by email as a follow-up to Sreelakshmi's own earlier message of Wed, 13 Nov 2019, 4:41 PM.)

ThilinaRajapakse commented 4 years ago

Try using the Yelp dataset as given in the guide. It's impossible to say what the issue is without seeing your data.

Or, consider using Simple Transformers (https://github.com/ThilinaRajapakse/simpletransformers), as it is up to date and much easier to use.

Sreelakshmi-k commented 4 years ago

I saw that your Yelp data has a column of texts and a column of labels. My data has the same format. I will try Simple Transformers.

(Sent by email in reply to Thilina Rajapakse's comment of Wed, 13 Nov 2019, 6:39 PM.)