Using the NeuroNLP2 in a different data format

ayrtondenner commented 6 years ago

Hello, I saw in XuezheMax/NeuroNLP2#9 that you used data with 4 columns for NER. I am trying to run it on a corpus with 2 columns, like in this pic:

So, my text base consists of one column with a word and another column with only a tag. Is there any way to parameterize the script to support this kind of data, or will I have to adapt the code for my specific use? For instance, I will have to change conll03_data to read tokens[0] instead of tokens[1] as the word, and deal with the pos, chunk and ner alphabets. Anything else I should know?

Thanks.

XuezheMax commented 6 years ago

Hi, there are two ways you can do this. Since the NER model uses only the words and the NER labels in the data, one way is to convert your format to match the original format by filling the POS and chunking columns with any symbols you like. Another way is to write a new Reader to handle your format.
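
For the second option, a reader for the two-column layout could look roughly like the sketch below. It is a standalone illustration, not NeuroNLP2's Reader API: the name `read_two_column` is hypothetical, and it assumes one `word tag` pair per line, whitespace-separated, with a blank line between sentences.

```python
from typing import Iterable, Iterator, List, Tuple

# One sentence is a pair of parallel lists: (words, NER tags).
Sentence = Tuple[List[str], List[str]]


def read_two_column(lines: Iterable[str]) -> Iterator[Sentence]:
    """Yield sentences from `word tag` lines (hypothetical helper).

    Assumes whitespace-separated columns and blank lines between sentences.
    """
    words: List[str] = []
    tags: List[str] = []
    for line_no, raw in enumerate(lines, start=1):
        tokens = raw.split()
        if not tokens:
            # A blank line closes the current sentence, if there is one.
            if words:
                yield words, tags
                words, tags = [], []
            continue
        if len(tokens) != 2:
            raise ValueError(f"line {line_no}: expected 2 columns, got {len(tokens)}")
        # The word is tokens[0] here, not tokens[1] as in the original format.
        words.append(tokens[0])
        tags.append(tokens[1])
    # Emit the last sentence when the input does not end with a blank line.
    if words:
        yield words, tags
```

The word and tag lists would still have to be mapped through the word and NER alphabets before they reach the model; that part depends on the repository code and is not shown here.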

ayrtondenner commented 6 years ago

I see. Isn't assigning "None" to the pos, chunk and ner variables in create_alphabets enough? That way there won't be any real assignment to those values. Otherwise, I guess I will insert "_" chars in my dataset to create two more columns that match the current code.

XuezheMax commented 6 years ago

I am not sure whether assigning None to them will raise errors or not. I read the POS and chunk information so that they can be used in the future. I guess inserting '_' is a good idea :)
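
A minimal sketch of the '_' padding idea follows. The function name `pad_to_conll03` is hypothetical, and the target layout `index word pos chunk ner` (space-separated, index restarting at 1 for each sentence) is an assumption inferred from the word being read from tokens[1]; it should be checked against the reader in the repository before use.

```python
from typing import Iterable, List


def pad_to_conll03(lines: Iterable[str], placeholder: str = "_") -> List[str]:
    """Turn `word tag` lines into `index word pos chunk tag` lines.

    The POS and chunk columns are filled with `placeholder`.
    The output column layout is an assumption, not confirmed by the thread.
    """
    out: List[str] = []
    index = 1  # token position inside the current sentence
    for line_no, raw in enumerate(lines, start=1):
        tokens = raw.split()
        if not tokens:
            # Keep the blank sentence separator and restart the numbering.
            out.append("")
            index = 1
            continue
        if len(tokens) != 2:
            raise ValueError(f"line {line_no}: expected 2 columns, got {len(tokens)}")
        word, ner = tokens
        out.append(f"{index} {word} {placeholder} {placeholder} {ner}")
        index += 1
    return out
```

For example, the input lines `John B-PER` and `lives O` become `1 John _ _ B-PER` and `2 lives _ _ O`. Writing the returned lines to a new file, one per line, leaves the original two-column data untouched.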

ayrtondenner commented 6 years ago

Ok, so I will try that. Thanks!

XuezheMax / NeuroNLP2

Using the NeuroNLP2 in a different data format #11
