Refefer / fastxml

FastXML / PFastXML / PFastreXML - Implementation of Extreme Multi-label Classification

Error while loading model #8

Closed: talbaumel closed this issue 6 years ago

talbaumel commented 6 years ago

Hi, I ran the following code:

path = 'fastxml_model'
trainer.save(path)
clf = Inferencer(path)
pred = clf.predict(test_set[0])

And got an exception:

FileNotFoundError                         Traceback (most recent call last)
FileNotFoundError: [Errno 2] No such file or directory: 'fastxml_model/tree.0.weights'

Exception ignored in: 'fastxml.inferencer.load_sparse'
FileNotFoundError: [Errno 2] No such file or directory: 'fastxml_model/tree.0.weights'
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-43-0624c4270f20> in <module>()
      1 path = 'fastxml_model'
      2 trainer.save(path)
----> 3 clf = Inferencer(path)
      4 #pred = clf.predict(test_set[0])

/usr/local/lib/python3.5/dist-packages/fastxml-2.0.0-py3.5-linux-x86_64.egg/fastxml/fastxml.py in __init__(self, dname, gamma, blend, leaf_probs)
     21         self.leaf_probs = leaf_probs
     22 
---> 23         forest = IForest(dname, self.n_trees, self.n_labels)
     24         if self.leaf_classifiers:
     25             lc = LeafComputer(dname)

fastxml/inferencer.pyx in fastxml.inferencer.IForest.__init__()

fastxml/inferencer.pyx in fastxml.inferencer.ITree.__init__()

fastxml/inferencer.pyx in fastxml.inferencer.load_dense_f32()

FileNotFoundError: [Errno 2] No such file or directory: 'fastxml_model/tree.0.bias'

Any idea how to solve it or how to create a classifier without saving to a file?

Thanks, Tal

Refefer commented 6 years ago

Hi Tal,

Thanks for the report.

Does fastxml run correctly when using the fxml.py script? Also, can you paste the 'settings' file within your fastxml_model directory here so I can confirm nothing is wonky with the setting combination you used?

talbaumel commented 6 years ago

This is the settings file: {"max_labels_per_leaf": 20, "auto_weight": 32, "bias": true, "gamma": 30, "n_labels": 2, "re_split": 0, "leaf_eps": 1e-05, "sparse_multiple": 25, "optimization": "fastxml", "n_trees": 32, "leaf_probs": false, "verbose": false, "engine": "auto", "n_epochs": 2, "loss": "log", "alpha": 0.0001, "max_leaf_size": 10, "eps": 1e-06, "C": 1, "blend": 0.8, "n_jobs": 48, "n_updates": 100.0, "seed": 2016, "leaf_classifiers": false, "subsample": 1}

It's the only file in the model folder.
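If `settings` is the only file, the loader has nothing to read for the trees. A quick way to see exactly which per-tree files are absent is a small check like the one below. This is a hypothetical helper, not part of fastxml: it assumes each tree i is saved as `tree.{i}.weights` and `tree.{i}.bias` (the two names that appear in the traceback) and that `settings` is JSON with an `n_trees` entry.

```python
import json
import os


def missing_tree_files(model_dir):
    """Hypothetical diagnostic: list per-tree files absent from a saved model.

    Assumption: tree i is stored as tree.{i}.weights and tree.{i}.bias
    (names taken from the traceback), and model_dir/settings is a JSON
    file containing "n_trees".
    """
    with open(os.path.join(model_dir, "settings")) as f:
        settings = json.load(f)

    missing = []
    for i in range(settings["n_trees"]):
        for suffix in ("weights", "bias"):
            name = "tree.{}.{}".format(i, suffix)
            if not os.path.exists(os.path.join(model_dir, name)):
                missing.append(name)
    return missing
```

Calling `missing_tree_files("fastxml_model")` on the folder described above would, under these assumptions, report all 64 files (`tree.0.weights` through `tree.31.bias`) as missing.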

Refefer commented 6 years ago

Do you have a sample of the training data you can share? Similarly, did fxml.py work as a baseline?


talbaumel commented 6 years ago

It's private healthcare stuff, so I can't share the actual data :/

Refefer commented 6 years ago

No worries, I understand completely. According to the settings, it looks like binary classification; is that right? Also, how many examples are in your dataset?

talbaumel commented 6 years ago

Oh! That's wrong; it should be multi-label classification. The dataset contains 20,533 examples.

Refefer commented 6 years ago

Progress! n_labels is inferred when you run fit. It assumes each y is a list of label indexes, for example:


y1 = [1024, 3555]
y2 = [0, 1, 7, 5100]

y = [y1, y2]
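As a rough illustration of why the settings showed `"n_labels": 2`, here is a sketch that assumes the label count is taken as the largest label index plus one. That rule is a guess consistent with the thread, not confirmed fastxml behavior, and `infer_n_labels` is a hypothetical name.

```python
def infer_n_labels(y):
    """Hypothetical sketch: label count = largest label index + 1.

    Assumes y is a list of label-index lists, the format fit expects.
    """
    return max(max(labels) for labels in y if labels) + 1


# Index lists, as in the example above: indexes run up to 5100.
print(infer_n_labels([[1024, 3555], [0, 1, 7, 5100]]))  # 5101

# A one-hot row read as "indexes" only ever contains 0 and 1.
print(infer_n_labels([[0, 1, 0, 0, 1, 1]]))  # 2
```

Under this assumption, one-hot rows collapse the problem to two "labels", which would explain a model that looks like binary classification.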

How did you encode your Y data when passing it into the trainer?

We can also validate this a couple of different ways; I'm happy to send you some example formats and end-to-end tests if we can't make forward progress.

talbaumel commented 6 years ago

Thanks! It was sort of one-hot: y=[0, 1, 0, 0, 1, 1]. I'll rerun it and let you know if everything is OK 👍🏻
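Converting one-hot rows like the one above into the list-of-indexes format is a one-liner: keep the positions whose value is nonzero. A minimal sketch (the helper name is illustrative):

```python
def onehot_to_indices(rows):
    """Convert one-hot (0/1) label rows into lists of active label indexes."""
    return [[i for i, v in enumerate(row) if v] for row in rows]


y_onehot = [[0, 1, 0, 0, 1, 1],
            [1, 0, 0, 0, 0, 0]]
print(onehot_to_indices(y_onehot))  # [[1, 4, 5], [0]]
```

This also works row by row on a dense NumPy array. For a SciPy sparse label matrix `Y`, `[list(r) for r in Y.tolil().rows]` should produce the same structure without densifying.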

Refefer commented 6 years ago

Did that solve your problem?

talbaumel commented 6 years ago

I was testing on the full dataset, so it takes a while... I started training on a small sample to get an answer sooner.
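For a quick smoke test like this, a reproducible subsample keeps each example paired with its labels. A minimal sketch, assuming `X` and `y` are plain indexable Python sequences of equal length (for a sparse feature matrix you would index its rows with the same index list instead); `subsample` is a hypothetical helper, and reusing seed 2016 from the settings file is just a convenient fixed value.

```python
import random


def subsample(X, y, k, seed=2016):
    """Draw k (X, y) pairs at random, keeping each example with its labels.

    The fixed seed makes the sample reproducible across runs.
    """
    idx = sorted(random.Random(seed).sample(range(len(X)), k))
    return [X[i] for i in idx], [y[i] for i in idx]


X = [[float(i)] for i in range(100)]
y = [[i % 7] for i in range(100)]
X_small, y_small = subsample(X, y, 10)
```

Sorting the sampled indexes keeps the original order, which makes it easier to compare results against the full run.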

talbaumel commented 6 years ago

Works fine