Casperfrc closed this issue 4 years ago.
I think you got a little mixed up with the files you're passing to VW.
parse_data.py converts the original format into VW text-format cost-sensitive examples:
python3 parse_data.py <input_file> <output_file>
So you probably want to run:
python3 parse_data.py custom_training.txt vw_custom_training.txt
python3 parse_data.py custom_test.txt vw_custom_test.txt
vw --passes 3 -d vw_custom_training.txt -k -c --search_rollin mix_per_roll --search_task dep_parser --search 12 --search_alpha 1e-5 --search_rollout oracle --holdout_off -f model.vw --search_history_length 3 --search_no_caching -b 30 --root_label 8 --num_label 12 --nn 5 --ftrl
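Before training, it can help to confirm that the file handed to vw is the output of parse_data.py and not the raw data. The sketch below is a rough heuristic, not part of the demo. It assumes, based only on the example lines quoted in this thread, that every non-empty converted line starts with two integer fields and contains a `|w` namespace, and that blank lines separate sentences. The function name `looks_like_vw_dep_data` and the "raw" sample line are hypothetical.

```python
def looks_like_vw_dep_data(lines):
    """Return 1-based numbers of lines that do not look like parse_data.py output.

    Heuristic assumption (from the example lines in this thread): each
    non-empty line starts with two integers and contains a '|w' namespace.
    """
    bad = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # assumed: blank lines separate sentences
        fields = line.split()
        has_int_prefix = (
            len(fields) >= 2 and fields[0].isdigit() and fields[1].isdigit()
        )
        if not has_int_prefix or "|w" not in line:
            bad.append(lineno)
    return bad


# Converted lines copied from the example in this thread.
converted = ["0 8 0:root|w have |p vb", "4 3 4:punct|w ? |p ."]
# A made-up line in some other (raw) format, for illustration only.
raw = ["1\tDo\tdo\tAUX"]

print(looks_like_vw_dep_data(converted))  # []
print(looks_like_vw_dep_data(raw))        # [1]
```

To check a real file, pass an open file handle, e.g. `looks_like_vw_dep_data(open("vw_custom_training.txt"))`. A non-empty result suggests that vw is being given the wrong file.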
When I run this, I get the following error:
terminate called after throwing an instance of 'VW::vw_exception'
what(): invalid label 13 which is > num actions=12
fish: 'vw --passes 3 -d train_data.txt…' terminated by signal SIGABRT (Abort)
I am not familiar with the dependency parsing scenario, but the following is the multiline example that causes the error. There is likely an issue with the way the original dataset was built, but I do not know enough about dependency parsing to say for sure without researching further.
4 9 4:aux|w do |p vbp
3 7 3:compound|w museum |p nn
4 10 4:nsubj|w labels |p nns
0 8 0:root|w have |p vb
6 12 6:det|w an |p dt
4 7 4:obj|w impact |p nn
8 4 8:case|w on |p in
6 2 6:nmod|w how |p wrb
10 10 10:nsubj|w people |p nns
8 13 8:acl:relcl|w look |p vbp <--- This is the troublesome example
12 4 12:case|w at |p in
10 11 10:obl|w artworks |p nns
4 3 4:punct|w ? |p .
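To locate lines like this across a whole converted file, a small scan along these lines may help. It is a sketch, not a VW tool. It assumes that the second whitespace-separated field of each line is the integer label, which matches the `8 13 ...` line flagged above and the "invalid label 13 which is > num actions=12" message. The function name `find_invalid_labels` is hypothetical.

```python
def find_invalid_labels(lines, num_label):
    """Return (line_number, label) pairs whose label exceeds num_label.

    Assumes '<head> <label> ...' lines, as in the example above; blank
    lines (assumed sentence separators) are skipped.
    """
    bad = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) < 2:
            continue  # blank separator line
        label = int(fields[1])
        if label > num_label:  # mirrors "invalid label N which is > num actions"
            bad.append((lineno, label))
    return bad


sample = [
    "10 10 10:nsubj|w people |p nns",
    "8 13 8:acl:relcl|w look |p vbp",
    "12 4 12:case|w at |p in",
]
print(find_invalid_labels(sample, num_label=12))  # [(2, 13)]
```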
Hope this helps! Let me know if you have more questions.
Hey Jack,
This really helped; I realised what the issue was after running it exactly as you told me. (I also realised I had somehow broken the Makefile, so I re-downloaded it. Whoops.)
The issue was simply that my data had more labels than the command in the Makefile defined: it set 12, and I had 34.
Thanks a lot for the help! Really appreciate what you guys are working on here.
- Casper
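For anyone adapting the demo to another dataset, the label-related settings can be read off the converted data rather than guessed. This is a minimal sketch under two assumptions taken from the example in this thread. First, the second field of each line is the label id, so `--num_label` must be at least the largest id. Second, a head of 0 marks the root: `0 8 0:root|w have |p vb` uses label 8, which matches `--root_label 8`. The command above passes 12 to both `--search` and `--num_label`. Whether both must be raised together for a new label set is left to check against the demo's Makefile. The function name `infer_label_settings` is hypothetical.

```python
def infer_label_settings(lines):
    """Return (max_label, root_labels) from '<head> <label> ...' lines.

    max_label is a lower bound for --num_label; root_labels collects the
    labels used on lines whose head is 0 (assumed to mark the root).
    """
    max_label = 0
    root_labels = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank separator line
        head, label = int(fields[0]), int(fields[1])
        max_label = max(max_label, label)
        if head == 0:
            root_labels.add(label)
    return max_label, root_labels


sample = [
    "4 9 4:aux|w do |p vbp",
    "0 8 0:root|w have |p vb",
    "8 13 8:acl:relcl|w look |p vbp",
    "",
]
print(infer_label_settings(sample))  # (13, {8})
```

If `root_labels` contains more than one value, the data probably uses inconsistent label ids for the root relation, which is worth fixing before training.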
Glad I could help you out @Casperfrc! Don't hesitate to reach out if you run into any other issues.
Describe the bug
The demo is in the following directory: vowpal_wabbit/demo/dependencyparsing/
I am looking into running the demo on datasets other than the standard wsj_train_subset and wsj_test_subset. I created test files from other dependency-parsing data I found, but even after formatting the data to (seemingly) match the demo's format, the demo won't fully parse it.
I created the following files based on actual data: custom_test.txt and custom_training.txt
I am aware the spacing is not exactly the same as in the demo's data, but I gathered it made no difference. I also created a very small dataset that triggers the same error: small_test.txt
Following the steps in the Makefile, I have isolated the issue. I can parse both the training and the test data with parse_data.py, but when I reach the dep.model target of the Makefile, it halts and prints the following:
final_regressor = model.vw
Enabling FTRL based optimization
Algorithm used: Proximal-FTRL
ftrl_alpha = 0.005
ftrl_beta = 0.1
Num weight bits = 30
learning rate = 0.5
initial_t = 0
power_t = 0.5
decay_learning_rate = 1
creating cache_file = /home/casperfrc/projects/bachelor_contextual_bandit/data/dependency_parsing/custom_training.cache
Reading datafile = /home/casperfrc/projects/bachelor_contextual_bandit/data/dependency_parsing/custom_training
num sources = 1
vw example #5(cost_sensitive.cc:179): invalid cost: specification -- no names on: :
To Reproduce
Steps to reproduce the behavior:
git clone https://github.com/VowpalWabbit/vowpal_wabbit.git
cd vowpal_wabbit/demo/dependencyparsing
python3 parse_data.py custom_training model.vw
python3 parse_data.py custom_test model_tested.vw
vw --passes 3 -d custom_training -k -c --search_rollin mix_per_roll --search_task dep_parser --search 12 --search_alpha 1e-5 --search_rollout oracle --holdout_off -f tested_model.vw --search_history_length 3 --search_no_caching -b 30 --root_label 8 --num_label 12 --nn 5 --ftrl
Expected behavior
I was expecting a model to be created.
Observed Behavior
As mentioned earlier, the error message is:
When running only make dep.perf, the error is instead:
This really made me question my data files, but I just can't find the issue.
Environment
What version of VW did you use? 8.5.0
What OS or language did you use? I'm on Ubuntu 18.04
Additional context