VowpalWabbit / vowpal_wabbit

Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.
https://vowpalwabbit.org

sequenceLabeler.learn producing run error #1861

Closed PramodParida closed 5 years ago

PramodParida commented 5 years ago

Hi,

I am getting this error while running the below code:

Code:

class SequenceLabeler(pyvw.SearchTask):
    def __init__(self, vw, sch, num_actions):
        # you must must must initialize the parent class
        # this will automatically store self.sch <- sch, self.vw <- vw
        pyvw.SearchTask.__init__(self, vw, sch, num_actions)

        # set whatever options you want
        sch.set_options( sch.AUTO_HAMMING_LOSS | sch.AUTO_CONDITION_FEATURES )

    def _run(self, sentence):   # it's called _run to remind you that you shouldn't call it directly!
        output = []
        for n in range(len(sentence)):
            pos,word = sentence[n]
            # use "with...as..." to guarantee that the example is finished properly
            with self.vw.example({'w': [word]}) as ex:
                pred = self.sch.predict(examples=ex, my_tag=n+1, oracle=pos, condition=[(n,'p'), (n-1, 'q')])
                output.append(pred)
        return output

vw = pyvw.vw("--search 4 --search_task hook --ring_size 1024") # 3 is the number of labels
sequenceLabeler = vw.init_search_task(SequenceLabeler)

for i in range(3):
    sequenceLabeler.learn(my_dataset)

Error:

  File "/home/pramod/anaconda3/lib/python3.6/site-packages/vowpalwabbit-8.6.1-py3.6-linux-x86_64.egg/vowpalwabbit/pyvw.py", line 24, in run
    def run(): self._output = self._run(my_example)

  File "<ipython-input-26-b725e17b2470>", line 15, in _run
    with self.vw.example({'w': [word]}) as ex:

AttributeError: __exit__

Traceback (most recent call last):

  File "<ipython-input-26-b725e17b2470>", line 26, in <module>
    sequenceLabeler.learn(my_dataset)

  File "/home/pramod/anaconda3/lib/python3.6/site-packages/vowpalwabbit-8.6.1-py3.6-linux-x86_64.egg/vowpalwabbit/pyvw.py", line 36, in learn
    self._call_vw(my_example, isTest=False);

  File "/home/pramod/anaconda3/lib/python3.6/site-packages/vowpalwabbit-8.6.1-py3.6-linux-x86_64.egg/vowpalwabbit/pyvw.py", line 31, in _call_vw
    self.vw.learn(self.bogus_example) # this will cause our ._run hook to get called

  File "/home/pramod/anaconda3/lib/python3.6/site-packages/vowpalwabbit-8.6.1-py3.6-linux-x86_64.egg/vowpalwabbit/pyvw.py", line 169, in learn
    pylibvw.vw.learn_multi(self,ec)

RuntimeError: std::exception

Please provide hints to solve this.

Thanks

jackgerrits commented 5 years ago

Are you able to provide the example from your dataset so I can repro the issue?

PramodParida commented 5 years ago

I am using the ATIS dataset and have converted it to the VW format. The pyvw library breaks as posted above. I am facing this issue on both Windows and Linux.

Thanks

lokitoth commented 5 years ago

@PramodParida: Are you trying to learn audio => transcription for ATIS, or something else?

Also, did you build the VW library you are using, or did you get a pre-built one?

PramodParida commented 5 years ago

@lokitoth: No, it's tagged text data. I have vowpalwabbit running in command-line mode, but I am facing this issue when using the Python wrapper.

arielf commented 5 years ago

@PramodParida

@lokitoth's question is important.

"did you build the VW library you are using, or did you get a pre-built one?"

Generally, any precompiled library is unlikely to be compatible with Anaconda, which has its own incompatible toolchain (including compilers).

PramodParida commented 5 years ago

The command line mode works fine.

Yes, I built the library myself, but the Python interface is not working.

Please resolve the python issue.

Thanks

lokitoth commented 5 years ago

The issue here is due to changes to how example disposal works in the Python bindings. We mistakenly kept __enter__ when removing __exit__ in #1837.

PR #2176 removes the __enter__ call and updates the example scripts to properly dispose of the generated examples (without "with", as that is no longer supported). Unfortunately, it seems that there was no good way to implement finish_example without the breaking change.

andy-soft commented 5 years ago

I have the same problem and cannot continue. I want to test the POS tagger and the NER labeler, and both fail at exactly the same point! I read the code but found no way to solve it. Please help! The C# wrapper (.NET 4.6+) doesn't work either: POS labeling keeps throwing an error on IntPtr. Also, on the source the POS tagger has been removed as well as the NER splitter. Why?

lokitoth commented 5 years ago

Hi @andy-soft: Could you elaborate what you mean by "on the source the POS tagger has been removed as well as the NER splitter"?

Take a look at the PR referenced above to see how the examples change to deal with the issue mentioned here.

With that said, there are additional issues in more complex LDF-based tasks that I do not yet have a handle on.

I have been focusing on Python right now, but will look at the C# bindings next.

In particular, the issue is here:

with self.vw.example({'w': [word]}) as ex:
   # ... code here

This needs to be replaced with:


ex = self.vw.example({'w': [word]})
# ... code here #make sure to remove the indent
self.vw.finish_example([ex]) # In search, need to pass examples into finish_example as a list.

lokitoth commented 5 years ago

The samples have been updated with the correct code to use in this case with #2176. LDF issues with Covington DEP Parser, and Word Alignment are tracked by #2175.
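
For readers who miss the old "with" form, one possible workaround is a small wrapper that calls finish_example in a finally clause, so the example is disposed of even if predict raises. This is only a sketch, not part of the VW bindings: finished_example and FakeVW are hypothetical names introduced for illustration (FakeVW exists only so the snippet runs without VW installed), and the only VW calls assumed are vw.example(...) and vw.finish_example([ex]) as shown in the replacement code above.

```python
from contextlib import contextmanager


@contextmanager
def finished_example(vw, features):
    """Yield a VW example and always hand it back via finish_example."""
    ex = vw.example(features)
    try:
        yield ex
    finally:
        # In search, finish_example expects a list of examples.
        vw.finish_example([ex])


class FakeVW:
    """Hypothetical stand-in for pyvw.vw that records finished examples."""

    def __init__(self):
        self.finished = []

    def example(self, features):
        return dict(features)

    def finish_example(self, examples):
        self.finished.extend(examples)


vw = FakeVW()
with finished_example(vw, {'w': ['flights']}) as ex:
    # Inside _run, this is where self.sch.predict(examples=ex, ...) would go.
    seen = ex

print(vw.finished)
```

Inside _run, the same wrapper would be called with self.vw in place of the stand-in. Whether this is preferable to an explicit finish_example call is a matter of taste; the explicit call is what the updated samples use.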

andy-soft commented 5 years ago

Hi, thanks for the reply

As I understand it, the "oracle" for a sequence detector/classifier needs to be built externally. Since the Python interface failed (because of everything described above), I tried to test the system with a complicated Spanish POS tagging and NER detection task, but never found documentation on how to link the code with external C# code. I tried to deduce the usage by reading the unit tests in the distributed code, but they never compiled on my computer (Windows 10, VS2019, with the C++ SDK and packages installed correctly). The only version I got working was 8.60, retrieved through the NuGet package manager. The Python interface only runs under the Ubuntu Windows subsystem; I never managed to build it on Windows, and each time I tried to install it (batch, etc.) it threw strange, unrecognizable errors. Even under Python 3.6, pyvw still did not work well (I have yet to test the upgrade).

Under C# I found no documentation about the interface or API, so I had to test everything by myself, and trial and error is just too hard a way to get a result. I wanted to use VW for many internal NLP tasks, as I deduced it is able to handle the high dimensionality of a highly inflected language like Spanish.

Many of the testing apps and procedures simply do not compile or run with the NuGet 8.6 package. I know that this is all part of a work in progress, but most of the tutorials are old and cannot be reproduced in any way. As an example, the POS tagger and NER detector were mysteriously removed and the link no longer exists.

I am a skilled C# programmer (but not in C++) and did not find out how to use the C#-to-C++ interface, maybe because I had no success in understanding the internals. For many operations, like multiclass classification, I got the model trained but couldn't get the predictions out of VW, simply because I found no way to do it. (I even tried the structured parameters.) No luck!

I want to make complex predictions involving several parallel selections, like doing POS tagging + semantic + grammatical parsing all in one step, using "contextual bandit" mode. I currently do this with an HMM tagger that I modified to do everything at once! (I guess VW will do this better.)

I will also build a NER detector + classifier for generic noun phrases on VW, using Spanish and a previously processed corpus. I have been building NLP systems for 15 years and was seduced by the promise of VW outperforming CRF++ and CNN-LSTM models using embeddings. I am working on a sequence labeler to "understand" the structure of discourse in Spanish, with several spelling errors. Not an easy task, indeed!

I succeeded in doing joint predictions, correcting spelling and doing POS tagging at once (on my system)! I guess VW will do this far better!

If you could provide me with a simple working C# sample of the oracle, I can start from there!

Thanks anyway!

andy-soft commented 4 years ago

Hi there, the repair code worked (Hallelujah) for training, but on "prediction" it still gives strange errors!


AssertionError                            Traceback (most recent call last)
~/.local/lib/python3.6/site-packages/vowpalwabbit/pyvw.py in run()
     22         self._output = None
     23         self.bogus_example[0].set_test_only(isTest)
---> 24         def run(): self._output = self._run(my_example)
     25         setup = None
     26         takedown = None

in _run(self, sentence)
     14             # use "with...as..." to guarantee that the example is finished properly
     15             ex = self.vw.example({'w': [word]})
---> 16             pred = self.sch.predict(examples=ex, my_tag=n+1, oracle=pos, condition=[(n,'p'), (n-1, 'q')])
     17             output.append(pred)
     18         return output

~/.local/lib/python3.6/site-packages/vowpalwabbit/pyvw.py in predict(examples, my_tag, oracle, condition, allowed, learner_id)
    310                     P.set_oracles(oracle)
    311                 elif isinstance(oracle, int):
--> 312                     assert oracle > 0, 'multiclass labels are from 1..., please do not use zero or bad things will happen!'
    313                     P.set_oracle(oracle)
    314                 else:

AssertionError: multiclass labels are from 1..., please do not use zero or bad things will happen!
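
The assertion in the traceback fires when an integer oracle label is not positive. One way to locate the offending tokens before calling learn or predict is to scan the dataset first. This is a sketch under assumptions: find_bad_labels is a hypothetical helper, the dataset is assumed to be a list of sentences made of (label, word) pairs with a single integer label per token (as unpacked in the _run method), and the upper bound num_actions is assumed to match the number passed to --search.

```python
def find_bad_labels(dataset, num_actions):
    """Return (sentence_idx, token_idx, label) for labels outside 1..num_actions."""
    bad = []
    for i, sentence in enumerate(dataset):
        for j, (label, _word) in enumerate(sentence):
            # Mirrors the pyvw assertion (label > 0); the upper bound is an assumption.
            if not isinstance(label, int) or not 1 <= label <= num_actions:
                bad.append((i, j, label))
    return bad


# Toy data: the second sentence uses 0 as a label, which would trip the assertion.
toy_dataset = [
    [(1, 'show'), (2, 'flights')],
    [(0, 'to'), (3, 'boston')],
]
problems = find_bad_labels(toy_dataset, num_actions=4)
print(problems)
```

If the labels turn out to be 0-based, shifting every label by one before training (and shifting predictions back afterwards) is one option; whether that applies here depends on how the dataset was converted.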