NVIDIA / sentiment-discovery

Unsupervised Language Modeling at scale for robust sentiment classification

text classification using pretrained models usage? #45

Closed harsham05 closed 5 years ago

harsham05 commented 5 years ago

I tried classifying text using both the Binary SST and IMDB pretrained models, but all 10,000 sentences/examples from my corpus were labeled -1.0, i.e. negative sentiment. Why?

python classifier.py --load_model ~/imdb_clf.pt --test ~/sample10k.csv

My corpus looks like this:

$ head -n 4 ~/sample10k.csv
sentence
It was for Infinity cars driving with a family nice snooth ride the XQ 60
"I like the ad, but would like to see more interior shots. Seems to me you are describing interior roominess."
I love the car
The poem was really sweet.
I really liked the car
I love this ad because it seems to talk the real life things that can happen in a car with a family.

Output:

$ head sample10k.sentence.label.csv
label,sentence
-1.0," It was for Infinity cars driving with a family nice snooth ride the XQ 60 "
-1.0," I like the ad, but would like to see more interior shots. Seems to me you are describing interior roominess. "
-1.0," I love the car "
-1.0," The poem was really sweet. "
-1.0," I really liked the car "
-1.0," I love this ad because it seems to talk the real life things that can happen in a car with a family. "

harsham05 commented 5 years ago
Python 3.6.6 |Anaconda, Inc.| (default, Jun 28 2018, 17:14:51) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 
>>> 
>>> import torch
>>> torch.__version__
'0.4.1'
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:07:04_CDT_2018
Cuda compilation tools, release 9.2, V9.2.148
raulpuric commented 5 years ago

sample10k.sentence.label.csv is just the saved preprocessed dataset, not the predictions. We impute -1 into the label column when the label is missing.
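The imputation behavior described above can be sketched roughly like this (a guess at the logic, not the repo's actual code; the -1.0 sentinel is the only detail confirmed in this thread):

```python
# Hypothetical sketch: rows without a "label" field get the sentinel
# value -1.0 when the preprocessed dataset is saved.
rows = [
    {"sentence": "I love the car"},                 # unlabeled input row
    {"sentence": "nice ride", "label": 1.0},        # already labeled
]
for row in rows:
    row.setdefault("label", -1.0)  # -1.0 marks "no label supplied"
print([r["label"] for r in rows])
```

So an all-unlabeled input csv comes back with -1.0 in every row of the saved dataset, regardless of what the classifier predicted.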

Check your home directory for ~/clf_results.npy as specified in the script.

Alternatively, you can supply --write_results output_csv.csv to save the results to a csv.
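Once you have the saved results, turning the probabilities into sentiment labels is a one-liner. A minimal sketch (the 0.5 decision threshold and the one-probability-per-sentence layout are assumptions, not confirmed by the script):

```python
# Hypothetical post-processing sketch. In practice the probabilities
# would come from the saved results file, e.g.:
#   probs = np.load("clf_results.npy")
# A small fake list stands in here so the snippet is self-contained.
probs = [0.93, 0.12, 0.88]  # assumed: one probability per input sentence
labels = ["positive" if p > 0.5 else "negative" for p in probs]  # assumed 0.5 threshold
print(labels)
```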

harsham05 commented 5 years ago

Thank you @raulpuric That worked!

harsham05 commented 5 years ago

python classifier.py --load_model ~/imdb_clf.pt --data ~/sample10k.csv --write_results output_csv.csv

The predicted probabilities in output_csv.csv make sense, but I'm guessing the imputed missing labels are copied as-is into this file? Hence all the -1.0 values.
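In other words, the -1.0 column is the (imputed) *input* label, and the prediction lives in its own column. A sketch of reading such a file, where the exact column names are assumptions about the csv written by --write_results:

```python
import csv
import io

# Hypothetical sketch: ignore the imputed "label" column and derive the
# sentiment from the predicted probability column (here called "prob",
# an assumed name). An in-memory csv stands in for output_csv.csv.
sample = io.StringIO(
    "label,sentence,prob\n"
    '-1.0," I love the car ",0.91\n'
    '-1.0," The poem was really sweet. ",0.87\n'
)
preds = []
for row in csv.DictReader(sample):
    pred = "positive" if float(row["prob"]) > 0.5 else "negative"
    preds.append(pred)
    print(row["sentence"].strip(), "->", pred)
```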

raulpuric commented 5 years ago

Yeah. It's helpful when you have a labeled test set and want to easily check how the classifier did on your data.