apple / turicreate

Turi Create simplifies the development of custom machine learning models.
BSD 3-Clause "New" or "Revised" License

Text classifier predictions on Xcode #174

Closed emannuelOC closed 6 years ago

emannuelOC commented 6 years ago

After importing the model created with the sentence_classifier into Xcode, I got the following error:

Error Domain=com.apple.CoreML Code=1 "Predicted feature named 'rating' was not output by pipeline"

rating was the name of the target variable in my dataset.

Here's the code I used to create the model:

    import turicreate as tc
    data = tc.SFrame('sentiment-train.tsv')
    model = tc.sentence_classifier.create(data, 'rating', features=['text'])
    model.export_coreml('MyModel.mlmodel')

P.S. I couldn't build turicreate locally on my machine, so I updated the code in _sentence_classifier.py to match the one in #58. I know that's not ideal and it may be causing the problem, but it was the only way I could try exporting the model.

srikris commented 6 years ago

I'd recommend building on your machine so it's easier for us to triage and identify the issue. Otherwise, the error could be a symptom of your local setup.

What trouble did you have building? Were you able to follow the instructions in BUILD.md, or was something missing there?

emannuelOC commented 6 years ago

I got the following errors when running make -j 8:

Command /bin/sh failed with exit code 127

** BUILD FAILED **

The following build commands failed:
    PhaseScriptExecution Run\ Script /Users/emannuelcarvalho/Developer/turicreate/debug/src/visualization/Turi\ Create\ Visualization.build/Debug/Turi\ Create\ Visualization.build/Script-FC1A878A1FB14D2900A67DAD.sh
(1 failure)
make[2]: *** [src/visualization/CMakeFiles/visualization_client] Error 65
make[1]: *** [src/visualization/CMakeFiles/visualization_client.dir/all] Error 2

[UPDATE]

After taking a look at this issue I installed node on my Mac and the build worked 🎉

emannuelOC commented 6 years ago

After building, I was able to export the model to the .mlmodel format and use it in Xcode but the problem persists.

Here's the code I used to create and export the model:

>>> import turicreate as tc
>>> data = tc.SFrame('sent_train.csv')
Finished parsing file /Users/emannuelcarvalho/Developer/sent_train.csv
Parsing completed. Parsed 100 lines in 0.053013 secs.
------------------------------------------------------
Inferred types from first 100 line(s) of file as 
column_type_hints=[int,str]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------
Finished parsing file /Users/emannuelcarvalho/Developer/sent_train.csv
Parsing completed. Parsed 7086 lines in 0.021544 secs.
>>> model = tc.sentence_classifier.create(data, 'sentiment', features=['text'])
PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
          You can set ``validation_set=None`` to disable validation tracking.

Logistic regression:
--------------------------------------------------------
Number of examples          : 6745
Number of classes           : 2
Number of feature columns   : 1
Number of unpacked features : 2952
Number of coefficients    : 2953
Starting L-BFGS
--------------------------------------------------------
+-----------+----------+-----------+--------------+-------------------+---------------------+
| Iteration | Passes   | Step size | Elapsed Time | Training-accuracy | Validation-accuracy |
+-----------+----------+-----------+--------------+-------------------+---------------------+
| 1         | 3        | 0.000148  | 1.281354     | 0.987250          | 0.950147            |
| 2         | 5        | 1.000000  | 1.536147     | 0.996145          | 0.961877            |
| 3         | 6        | 1.000000  | 1.681825     | 0.997924          | 0.961877            |
| 4         | 7        | 1.000000  | 1.856880     | 0.999259          | 0.976540            |
| 5         | 8        | 1.000000  | 2.030388     | 0.999703          | 0.976540            |
| 6         | 9        | 1.000000  | 2.191586     | 0.999852          | 0.976540            |
| 10        | 13       | 1.000000  | 2.872957     | 1.000000          | 0.976540            |
+-----------+----------+-----------+--------------+-------------------+---------------------+
TERMINATED: Iteration limit reached.
This model may not be optimal. To improve it, consider increasing `max_iterations`.
>>> model.export_coreml('ClassifierModel.mlmodel')
Saving valid model to path ClassifierModel.mlmodel

And here's the Swift code:

    let bagOfWords = bow(text: "I think that was awesome!")
    do {
        let prediction = try ClassifierModel().prediction(text: bagOfWords)
        print(prediction)
    } catch {
        print(error)
    }

I got the bagOfWords code from the text_classifier guide:

    func bow(text: String) -> [String: Double] {
        var bagOfWords = [String: Double]()

        let tagger = NSLinguisticTagger(tagSchemes: [.tokenType], options: 0)
        let range = NSRange(location: 0, length: text.utf16.count)
        let options: NSLinguisticTagger.Options = [.omitPunctuation, .omitWhitespace]
        tagger.string = text

        tagger.enumerateTags(in: range, unit: .word, scheme: .tokenType, options: options) { _, tokenRange, _ in
            let word = (text as NSString).substring(with: tokenRange)
            bagOfWords[word, default: 0] += 1
        }

        return bagOfWords
    }
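
For comparing inputs on the Python side, the same bag-of-words counting can be sketched roughly as below. This is only an approximation: the regex tokenizer is an assumption standing in for NSLinguisticTagger, and the two may split some inputs differently (contractions, for example). The name `bow` simply mirrors the Swift function.

```python
import re
from collections import Counter


def bow(text):
    """Return a word -> count mapping for the given text."""
    # Assumption: a simple regex tokenizer stands in for NSLinguisticTagger.
    # It drops punctuation and whitespace and keeps the original casing,
    # like the Swift version, but may tokenize some inputs differently.
    tokens = re.findall(r"\w+(?:'\w+)?", text)
    # Counts are floats to mirror the [String: Double] type in the Swift code.
    return {word: float(count) for word, count in Counter(tokens).items()}


if __name__ == "__main__":
    print(bow("I think that was awesome!"))
```

Feeding the same sentence to both versions makes it easier to see whether the dictionary passed to the model differs between Python and Swift.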

It output the error below:

Error Domain=com.apple.CoreML Code=1 "Predicted feature named 'sentiment' was not output by pipeline" UserInfo={NSLocalizedDescription=Predicted feature named 'sentiment' was not output by pipeline}

Maybe there's a problem with my dataset?

I uploaded the CSV file here in case you want to reproduce exactly the same situation.

znation commented 6 years ago

Thanks @emannuelOC - I'll investigate from here.

srikris commented 6 years ago

@emannuelOC Thanks for catching this issue. The issue is related to #98 and we will continue to investigate. It seems that this happens when you have a new word during prediction time. We have narrowed down this issue and hope to get a fix out soon.
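
To check whether a given input actually contains a word that was not seen in training, a small diagnostic along these lines might help. It is a sketch only, not the fix tracked in #98: `split_by_vocabulary` and `training_vocabulary` are hypothetical names, and the vocabulary would have to be built from your own training data.

```python
def split_by_vocabulary(bag_of_words, vocabulary):
    """Split a bag of words into (known, unseen) parts."""
    known = {w: c for w, c in bag_of_words.items() if w in vocabulary}
    unseen = {w: c for w, c in bag_of_words.items() if w not in vocabulary}
    return known, unseen


if __name__ == "__main__":
    # Hypothetical vocabulary; the real one would come from the training data.
    training_vocabulary = {"I", "think", "that", "was", "good"}
    bag = {"I": 1.0, "think": 1.0, "that": 1.0, "was": 1.0, "awesome": 1.0}
    known, unseen = split_by_vocabulary(bag, training_vocabulary)
    print("known:", known)
    print("unseen:", unseen)
```

If the error only shows up when `unseen` is non-empty, that would be consistent with the new-word explanation above.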

znation commented 6 years ago

Dupe of #98 (it's the same issue). @srikris, please verify both when closing #98.