XuezheMax / NeuroNLP

Deep neural models for core NLP tasks
MIT License

How should we run NER with sequence_labeling.py #1

Open yuchenlin opened 7 years ago

yuchenlin commented 7 years ago

Hi Max,

Thank you very much for open-sourcing the code for your brilliant paper. I was wondering how we should use sequence_labeling.py to perform the NER task. I simply replaced the data file paths in run_sequence_labeling.sh, but it gave me the following error:

$ bash run_sequence_labeling.sh
2017-06-18 16:56:14,379 - Sequence Labeling - INFO - Creating Alphabets
2017-06-18 16:56:14,379 - Create Alphabets - INFO - Creating Alphabets: data/alphabets/
2017-06-18 16:56:14,379 - Create Alphabets - INFO - Processing data: ../data/split/train.conll.iob
Traceback (most recent call last):
  File "sequence_labeling.py", line 455, in <module>
    main()
  File "sequence_labeling.py", line 262, in main
    40000)
  File "NeuroNLP/neuronlp/io/data_utils.py", line 66, in create_alphabets
    pos = tokens[4]
IndexError: list index out of range

We would be very grateful if you could show an example of NER with a CoNLL-format dataset. Thank you very much!

BTW, looking forward to your PyTorch version, NeuroNLP2!

Thank you very much!

XuezheMax commented 7 years ago

Hi Bill,

The data format used in the code is as follows:

1 EU NNP I-NP S-ORG
2 rejects VBZ I-VP O
3 German JJ I-NP S-MISC
4 call NN I-NP O
5 to TO I-VP O
6 boycott VB I-VP O
7 British JJ I-NP S-MISC
8 lamb NN I-NP O
9 . . O O

So there is an indexing column at the beginning of each row. Thanks.
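For readers whose data lacks the index column, here is a minimal sketch of one way to add it. It assumes the input is whitespace-separated with one token per line and a blank line between sentences; the function name `add_index_column` is hypothetical and not part of NeuroNLP, and special lines such as `-DOCSTART-` are not handled.

```python
def add_index_column(lines):
    """Prefix each token line with its 1-based position in the sentence.

    Assumption: blank lines mark sentence boundaries and reset the counter.
    """
    out = []
    idx = 0
    for line in lines:
        stripped = line.strip()
        if not stripped:
            # Sentence boundary: keep the blank line and restart numbering.
            out.append("")
            idx = 0
            continue
        idx += 1
        out.append(f"{idx} {stripped}")
    return out


if __name__ == "__main__":
    sample = ["EU NNP I-NP S-ORG", "rejects VBZ I-VP O", "", "Peter NNP I-NP S-PER"]
    print("\n".join(add_index_column(sample)))
```

With the index in place, each row has five fields, so the NER label sits at position 4 (zero-based), which is the position the traceback above shows the loader reading with `tokens[4]`.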


yuchenlin commented 7 years ago

Thank you very much!

I have another question. Since the paper doesn't use the POS tag or dependency role of each token, is it okay to assign each token two fake placeholders as its POS tag and dependency role, respectively?

I mean, will the following data work the same as yours?

1 EU A B S-ORG
2 rejects A B O
3 German A B S-MISC
4 call A B O
5 to A B O
6 boycott A B O
7 British A B S-MISC
8 lamb A B O
9 . A B O

(I'm doing a specific NER task without POS tags or dependency roles.)

Thanks again!

XuezheMax commented 7 years ago

Yes, if you only do NER, the POS tags and dependency roles have no impact.
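As an illustration of the placeholder approach discussed above, here is a minimal sketch that turns (word, NER tag) pairs into the five-column rows. The function name `to_five_columns` and its parameters are hypothetical, not part of NeuroNLP; the placeholder values `A` and `B` simply mirror the example in the question.

```python
def to_five_columns(sentence, pos_placeholder="A", chunk_placeholder="B"):
    """Build 'index word placeholder placeholder tag' rows for one sentence.

    `sentence` is a list of (word, ner_tag) pairs; indices start at 1.
    """
    return [
        f"{i} {word} {pos_placeholder} {chunk_placeholder} {tag}"
        for i, (word, tag) in enumerate(sentence, start=1)
    ]


if __name__ == "__main__":
    rows = to_five_columns([("EU", "S-ORG"), ("rejects", "O"), ("German", "S-MISC")])
    print("\n".join(rows))
```

When writing several sentences to a file, a blank line between them is the usual CoNLL convention; whether this loader relies on it is an assumption worth checking against the code.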


yuchenlin commented 7 years ago

Thank you very much!