Closed talbaumel closed 6 years ago
Hi Tal,
Thanks for the report.
Does fastxml run correctly when using the fxml.py script? Also, can you paste the 'settings' file within your fastxml_model directory here so I can confirm nothing is wonky with the setting combination you used?
This is the settings file:
{"max_labels_per_leaf": 20, "auto_weight": 32, "bias": true, "gamma": 30, "n_labels": 2, "re_split": 0, "leaf_eps": 1e-05, "sparse_multiple": 25, "optimization": "fastxml", "n_trees": 32, "leaf_probs": false, "verbose": false, "engine": "auto", "n_epochs": 2, "loss": "log", "alpha": 0.0001, "max_leaf_size": 10, "eps": 1e-06, "C": 1, "blend": 0.8, "n_jobs": 48, "n_updates": 100.0, "seed": 2016, "leaf_classifiers": false, "subsample": 1}
It's the only file in the model folder
Do you have a sample of the data you used to train it you can share? Similarly, did fxml.py work as a baseline for success?
It's private healthcare stuff, so I can't share the actual data :/
No worries, understand completely. According to the settings, it looks like a binary classification - is that right? Also, how many examples are in your dataset?
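For anyone following along, here is a minimal sketch of checking that value yourself. It only assumes what the thread shows: the model directory contains a JSON file named 'settings' with an "n_labels" key. The helper name read_n_labels and the throwaway directory are made up for illustration and are not part of fastxml.

```python
import json
import os
import tempfile


def read_n_labels(model_dir):
    """Load the JSON 'settings' file from a model directory and return n_labels."""
    # Hypothetical helper: the file name 'settings' comes from the thread.
    with open(os.path.join(model_dir, "settings")) as fh:
        return json.load(fh)["n_labels"]


# Demo against a throwaway directory holding a trimmed copy of the settings.
with tempfile.TemporaryDirectory() as model_dir:
    with open(os.path.join(model_dir, "settings"), "w") as fh:
        json.dump({"n_labels": 2, "n_trees": 32}, fh)
    print(read_n_labels(model_dir))  # 2
```

If the number printed is far smaller than the number of distinct labels in the data, the label encoding is the first thing to check.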
Oh! That's wrong, it should be multi-label classification. The dataset contains 20,533 examples.
Progress! n_labels is sniffed out when you run fit. It assumes each y is a list of label indexes, for example:
y1 = [1024, 3555]
y2 = [0, 1, 7, 5100]
y = [y1, y2]
How did you encode your Y data when passing it into the trainer?
We can also test this a couple of different ways to validate it - happy to send you some example formats and end-to-end testing if we can't make forward progress.
Thanks! It was sort of one-hot: y = [0, 1, 0, 0, 1, 1]
I'll rerun it and let you know if everything is ok 👍🏻
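For reference, a minimal sketch of converting that kind of 0/1 indicator row into the list-of-indexes form described earlier in the thread. The helper name indicator_to_indexes and the sample rows are made up for illustration. It is plain Python and does not call fastxml. Reading a 0/1 row as a list of indexes would show only labels 0 and 1, which would be consistent with the n_labels of 2 in the settings file, though that is an inference rather than something confirmed here.

```python
def indicator_to_indexes(row):
    """Return the positions of the non-zero entries in one indicator row."""
    return [i for i, flag in enumerate(row) if flag]


# One row per example; a 1 marks a label that applies to that example.
Y_indicator = [
    [0, 1, 0, 0, 1, 1],
    [1, 0, 0, 0, 0, 0],
]

# Each y is now a list of label indexes, e.g. [1, 4, 5].
y = [indicator_to_indexes(row) for row in Y_indicator]
print(y)  # [[1, 4, 5], [0]]
```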
Did that solve your problem?
I tested on the full dataset, so it takes a while... I've started training on a small sample to have an answer sooner.
Works fine
Hi, I ran the following code
And got an exception:
Any idea how to solve it or how to create a classifier without saving to a file?
Thanks, Tal