PetrochukM / PyTorch-NLP

Basic Utilities for PyTorch Natural Language Processing (NLP)
https://pytorchnlp.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Wrong number of classes is derived from `label_encoder.vocab_size` #116

Open guanqun-yang opened 3 years ago

guanqun-yang commented 3 years ago

Behaviors

The following code snippet is taken directly from the README.md of this library. I expect n_class to equal 2 (there are only two classes, [1, 2]), but 3 is returned.

import itertools

import numpy as np

from torchnlp.datasets import imdb_dataset
from torchnlp.encoders.text import WhitespaceEncoder
from torchnlp.encoders import LabelEncoder

from collections import Counter

# Assumption: the report does not show how `train` and `test` were loaded;
# imdb_dataset is one way to obtain them.
train, test = imdb_dataset(train=True, test=True)

sentence_corpus = [record["text"] for record in itertools.chain(train, test)]
label_corpus = [record["sentiment"] for record in itertools.chain(train, test)]

sentence_encoder = WhitespaceEncoder(sentence_corpus)
label_encoder = LabelEncoder(label_corpus)

for record in itertools.chain(train, test):
    record["text"] = sentence_encoder.encode(record["text"])
    record["sentiment"] = label_encoder.encode(record["sentiment"])

print(np.unique([record["sentiment"].item() for record in itertools.chain(train, test)]))
# [1 2]

vocab_size = sentence_encoder.vocab_size
n_class = label_encoder.vocab_size

print(vocab_size, n_class)
# 11402 3

Steps to Reproduce the Problem

Run the code snippet directly after `pip install pytorch-nlp`.

qqaatw commented 3 years ago

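The extra label you pointed out is <unk>, the unknown-label token, which LabelEncoder reserves by default when you instantiate it. The two real classes are therefore encoded as 1 and 2, and vocab_size counts all three entries.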

https://github.com/PetrochukM/PyTorch-NLP/blob/d7814a297811c9b0dfb285fe0475098b86f3d348/torchnlp/encoders/label_encoder.py#L7
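
To make the counting concrete, here is a minimal sketch of an encoder that reserves <unk> before it indexes the real labels. `TinyLabelEncoder` and its arguments are hypothetical names written for illustration; this is a simplified stand-in and not the torchnlp implementation.

```python
UNKNOWN_TOKEN = "<unk>"  # the reserved label named in this thread


class TinyLabelEncoder:
    """Simplified stand-in for a label encoder; NOT the torchnlp code."""

    def __init__(self, sample, reserved_labels=(UNKNOWN_TOKEN,)):
        # Reserved labels take the first indices, so "<unk>" lands on index 0.
        self.reserved_labels = list(reserved_labels)
        self.index_to_token = list(reserved_labels)
        # Append each distinct label from the sample in first-seen order.
        for label in dict.fromkeys(sample):
            if label not in self.index_to_token:
                self.index_to_token.append(label)
        self.token_to_index = {t: i for i, t in enumerate(self.index_to_token)}
        self.unknown_index = self.token_to_index.get(UNKNOWN_TOKEN)

    @property
    def vocab_size(self):
        # Counts reserved labels as well as the labels seen in the sample.
        return len(self.index_to_token)

    def encode(self, label):
        if label in self.token_to_index:
            return self.token_to_index[label]
        if self.unknown_index is None:
            raise KeyError(label)
        # Unseen labels fall back to the reserved unknown index.
        return self.unknown_index


labels = ["pos", "neg", "pos", "neg"]
encoder = TinyLabelEncoder(labels)

encoded = sorted({encoder.encode(label) for label in labels})
n_real_classes = encoder.vocab_size - len(encoder.reserved_labels)

print(encoded)             # indices used by the real labels
print(encoder.vocab_size)  # includes the reserved "<unk>" entry
print(n_real_classes)      # real classes only
```

In this sketch the real labels start at index 1 because index 0 is taken, so a classifier whose output layer is sized by vocab_size still covers every encoded label. With the library itself, the `reserved_labels` and `unknown_index` constructor arguments appear to control this behavior, but check the linked source before relying on that.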

nitkannen commented 3 years ago

Does this issue need handling?

qqaatw commented 3 years ago

Does this issue need handling?

Hmm... it is indeed somewhat confusing. Maybe we can add a note or comment to the documentation to make it clear.

nitkannen commented 3 years ago

Cool, are there any other issues that need handling? I am looking to contribute.

qqaatw commented 3 years ago

Please browse the open issues and find something you are interested in contributing to :-)