More than two labels.

johnp2266 commented 6 years ago

Hi,

I'm new to this. I have 7 different types of data for my dataset. I'm not sure what to change in eval.py to load my dataset?

jannenev commented 6 years ago

Take a look at the file data_helpers.py. There is a method, load_data_and_labels(positive_data_file, negative_data_file), which returns the data and labels.

You can add a new method that reads in your own data, turns your labels into one-hot encoded form, and returns both. In eval.py, change the code to call your new method.

Something like:

def load_multilabel_data_and_labels():
    # Needs `import numpy as np` at the top of the file.
    # Add code here to read in your own data. It should fill in:
    #   new_texts: list of strings, one per example
    #   labels:    list of raw label values, one per example

    # Turn your labels into one-hot encoded form.
    # Here there are 5 classes, so each label becomes an integer vector
    # of length 5, e.g. [1,0,0,0,0], where only one entry is 1 and the rest are 0.
    # Meaning [class1=1, class2=0, class3=0...]

    nr_classes = 5  # set this to your own number of classes
    nr_lines = len(labels)
    new_labels = np.zeros((nr_lines, nr_classes))  # shape is passed as a tuple

    for labelnr, value in enumerate(labels):
        if value[0]==1:
            new_labels[labelnr][0]=1  # set the one-hot entry for this class

        elif value[0]==0.7:
            new_labels[labelnr][1]=1

        elif value[0]==0.5:
            new_labels[labelnr][2]=1

        elif value[1]==0.7:
            new_labels[labelnr][3]=1

        elif value[0]==0:
            new_labels[labelnr][4]=1

    x_text = new_texts
    y = new_labels
    return [x_text, y]

In eval.py, find this line and replace the call with your new method:

x_text, y = data_helpers.load_data_and_labels(FLAGS.positive_data_file, FLAGS.negative_data_file)
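
For the 7-class case from the question, the same idea can be written without the if/elif chain. This is only a minimal sketch, assuming one text file per class with one example per line. The function names `one_hot` and `load_multiclass_data_and_labels` and the file layout are hypothetical and not part of data_helpers.py:

```python
import numpy as np


def one_hot(class_ids, nr_classes):
    """Turn integer class ids (0..nr_classes-1) into one-hot rows."""
    ids = np.asarray(class_ids, dtype=int)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(nr_classes, dtype=int)[ids]


def load_multiclass_data_and_labels(class_files):
    """Read one file per class and return [texts, one_hot_labels].

    class_files: list of paths (assumed layout); file i holds the
    examples of class i, one example per line. Blank lines are skipped.
    """
    texts = []
    class_ids = []
    for class_id, path in enumerate(class_files):
        with open(path, "r", encoding="utf-8") as f:
            lines = [line.strip() for line in f if line.strip()]
        texts.extend(lines)
        class_ids.extend([class_id] * len(lines))

    y = one_hot(class_ids, len(class_files))
    return [texts, y]
```

With seven class files, the call shown above would then become something like `x_text, y = data_helpers.load_multiclass_data_and_labels(list_of_seven_paths)`, where `list_of_seven_paths` is your own list of file paths, and `y` would have one row per example and seven columns.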

applepieiris commented 3 years ago

Is this enough? Do I need to change the definition of the model?

dennybritz / cnn-text-classification-tf

More than two labels. #157