Alibaba-NLP / ACE

[ACL-IJCNLP 2021] Automated Concatenation of Embeddings for Structured Prediction

NER prediction #29

Closed harpap closed 2 years ago

harpap commented 2 years ago

Hello again, I am trying to run a prediction. Say I have a txt file with a paragraph that I want to annotate with entities (B-LOC, I-LOC, etc.); how can I do this? I have already set up your pretrained model and successfully ran the test command: `CUDA_VISIBLE_DEVICES=0 python train.py --config config/conll_03_english.yaml --test`. I am a little lost with the documentation, so I appreciate any help. Thanks in advance!

wangxinyu0922 commented 2 years ago

Hi,

See this part and this issue for instructions on prediction.

harpap commented 2 years ago

Thank you for the reply. I still don't understand. I made a file called train.washington in a directory called 'toAnnotate', which contains some sentences:

If you had to sum up George Washington's life in one word, that word would have to be unforgettable.
George's story is one of travel and adventure, full of risks and, most of all, full of glory.
After all, in 1789, he was elected the first president of the United States, a country that was to become the most powerful in the world.
At the end of his life, in 1799, George was an international hero.

If I run `CUDA_VISIBLE_DEVICES=0 python train.py --config config/conll_03_english.yaml --parse --target_dir 'toAnnotate/' --keep_order` I get an error: `RuntimeError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 7.93 GiB total capacity; 7.33 GiB already allocated; 6.12 MiB free; 88.86 MiB cached)`. I want a program that takes some sentences as input and returns them annotated. Sorry if I am missing something.

wangxinyu0922 commented 2 years ago

> I made a file called train.washington in a directory called 'toAnnotate', which contains some sentences … I want a program that takes some sentences as input and returns them annotated.

You need to tokenize the sentences into CoNLL format first, then use 'O' as a dummy tag for parsing.
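
For illustration, here is a minimal sketch (not part of the ACE codebase) of how the raw sentences in toAnnotate/train.washington could be tokenized and written one token per line with a dummy 'O' tag. The output file name and the two-column layout are assumptions: CoNLL-2003 files normally carry four columns (token, POS, chunk, NER), and ACE may expect that full layout, so adjust accordingly.

```python
# Minimal sketch (assumption, not the ACE API): turn raw sentences into a
# one-token-per-line file with a dummy 'O' NER tag so that --parse can read it.
src = "toAnnotate/train.washington"        # raw text, one sentence per line
dst = "toAnnotate/train.washington.conll"  # hypothetical output file name

with open(src) as fin, open(dst, "w") as fout:
    for line in fin:
        line = line.strip()
        if not line:
            continue
        # Naive whitespace tokenization; a real tokenizer (spaCy, NLTK, ...)
        # would also split punctuation off tokens such as "unforgettable."
        for token in line.split():
            fout.write(f"{token} O\n")     # token + dummy tag
        fout.write("\n")                   # blank line separates sentences
```

With a file like this in place, the `--parse` command above can read each sentence and replace the dummy tags with its predictions.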