apple / turicreate

Turi Create simplifies the development of custom machine learning models.
BSD 3-Clause "New" or "Revised" License

Error calling nearest_neighbor_model.query within call to apply #3202

Open ryanmcfall opened 4 years ago

ryanmcfall commented 4 years ago

I am trying to use the apply method to add a new column to an SFrame, containing the result of the prediction created by a nearest_neighbor classifier.

Here's the code

import turicreate

# Sample data: https://github.com/apple/turicreate/files/4652759/sample-reviews.txt
papers = turicreate.SFrame.read_csv('sample-reviews.txt', delimiter='\t')
papers['wordcount'] = turicreate.text_analytics.count_words(papers['Review'])
papers['tf_idf'] = turicreate.text_analytics.tf_idf(papers['Review'])

classifier = turicreate.nearest_neighbors.create(papers, features=['tf_idf'], distance='cosine', label='Filename')

test_set = papers[papers['Filename'] == 'TW1.pdf']

def classify(review):
    frame = turicreate.SFrame([review]).unpack()    
    prediction = classifier.query(frame, verbose=False)
    return prediction[1]['reference_label']

#  Test out the classify function
classify(test_set[0])

#  Use classify to add a new column to the test_set SFrame
test_set['new'] = test_set.apply(classify)

The direct call to classify gives the expected output: a string containing the file name of the predicted most similar paper. However, calling the apply method at the bottom results in an error message which boils down to:

PicklingError: Cannot pickle objects of type <class 'turicreate._cython.cy_model.UnityModel'>

Strangely, this error appears to be a side-effect of calling classifier.query. If I change the classify function to be:

def classify(review):
    frame = turicreate.SFrame([review]).unpack()    
    #prediction = classifier.query(frame, verbose=False)
    return 'test'

With this version I don't get the error. Uncommenting the call to classifier.query brings the error back, even though the result of that call is never used.
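A possible explanation, which nothing in this thread confirms: if apply has to serialize the function together with the globals it refers to, then merely mentioning classifier in the function body would be enough to pull the model into the pickling step, whether or not the result is used. The sketch below uses only the standard library to show that the two versions of the function differ in exactly that respect. The names classify_with_query, classify_without_query and referenced_names are hypothetical and exist only for this illustration.

```python
# Illustration only: classifier is deliberately never defined, because the
# functions are inspected here, not called.

def classify_with_query(review):
    prediction = classifier.query(review)  # result unused, as in the issue
    return 'test'


def classify_without_query(review):
    # prediction = classifier.query(review)
    return 'test'


def referenced_names(func):
    """Return the non-local names (globals and attributes) used in func's body."""
    return set(func.__code__.co_names)


print(referenced_names(classify_with_query))
print(referenced_names(classify_without_query))
```

Only the first function refers to classifier, which matches the observation that commenting out the call makes the error disappear.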

I tried uploading a sample data file, but nothing seemed to happen when I did. I also tried a small tab-delimited sample file with string columns Filename, Book, and Review; the issue occurred with both that file and the actual data file I used.

TobyRoseman commented 4 years ago

This is related to #2776. It seems most of our models cannot be used from inside an SFrame.apply call.

@ryanmcfall - Instead of doing:

test_set['new'] = test_set.apply(classify)

I would call:

temp = classifier.query(test_set, verbose=False)

Then do a filter_by or groupby on temp to get what you want. I'm not sure what you're trying to do here; if you let me know, I can try to help you.
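A minimal sketch of the batched approach suggested above, under some assumptions. It assumes that query accepts a k argument and returns one row per (query row, neighbor) pair with query_label, reference_label and rank columns, and that keeping rank 2 corresponds to prediction[1] in the original classify function. The helper name second_nearest_labels is hypothetical.

```python
def second_nearest_labels(classifier, test_set):
    """Return the rank-2 neighbor label for every row of test_set.

    Assumes classifier is a Turi Create nearest neighbors model and
    test_set is an SFrame containing the model's feature columns.
    """
    # One batched query for the whole SFrame instead of one query per row.
    # k=2 keeps the two closest reference rows for each query row.
    neighbors = classifier.query(test_set, k=2, verbose=False)

    # Keep only the second-closest match for each query row.
    second = neighbors.filter_by([2], 'rank')

    # query_label identifies the query row; with no label argument passed
    # to query, it is assumed to be the row index within test_set.
    return second[['query_label', 'reference_label']]
```

The result could then be joined back onto test_set by row number, for example after adding an index column with add_row_number.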