Closed ayrtondenner closed 6 years ago
Hi, there are two ways you can do this. Since the NER model uses only the words and the NER labels in the data, one way is to convert your format to match the original format by filling the POS and chunking columns with any symbols you like. Another way is to write a new Reader to handle your format.
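The first option can be sketched as a small conversion script. This is only an illustration, not part of NeuroNLP2: the function name pad_two_column, the "_" filler, and the leading token-index column are assumptions (the index column is inferred from the word being read from tokens[1]; pass with_index=False if your target format has no index).

```python
from typing import Iterable, Iterator


def pad_two_column(lines: Iterable[str], filler: str = "_",
                   with_index: bool = True) -> Iterator[str]:
    """Turn 'word tag' lines into '[index] word filler filler tag' lines.

    Hypothetical helper: the output layout is an assumption about the
    target format, so check it against your own training files.
    """
    index = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line marks a sentence boundary; restart the counter.
            index = 0
            yield ""
            continue
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"expected 2 columns, got {len(parts)}: {line!r}")
        word, tag = parts
        index += 1
        # Placeholder POS and chunk columns sit between the word and the tag.
        columns = [word, filler, filler, tag]
        if with_index:
            columns.insert(0, str(index))
        yield " ".join(columns)


sample = ["John B-PER", "lives O", "", "Paris B-LOC"]
converted = list(pad_two_column(sample))
print("\n".join(converted))
```

Because the model uses only the words and the NER labels, the filler symbol itself should not matter, as long as it is used consistently in the train, dev and test files.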
I see. Isn't assigning "None" to the pos, chunk and ner variables in create_alphabets enough? That way there wouldn't be any real assignment to those values. Otherwise, I guess I will insert "_" characters into my dataset, so that I have two more columns to match the current code.
I am not sure whether assigning None to them will raise errors or not. I read the POS and chunk information so that it can be used in the future. I guess inserting '_' is a good idea :)
Best regards, Ma, Xuezhe (Language Technologies Institute, School of Computer Science, Carnegie Mellon University)
Ok, so I will try that. Thanks!
Hello, I saw in XuezheMax/NeuroNLP2#9 that you used data with 4 columns for NER. I am trying to run it on a corpus with 2 columns, as shown in the attached picture.
So, my corpus consists of one column with a word and another column with a tag only. Is there any way to parameterize the script to support this kind of data, or will I have to adapt the code specifically for my use? For instance, I would have to change conll03_data to read tokens[0] instead of tokens[1] as the word, and deal with the pos, chunk and ner alphabets. Anything else I should know? Thanks.
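For reference, reading such a two-column file directly could look roughly like the sketch below. It is a standalone illustration, not NeuroNLP2's reader or its conll03_data module: read_two_column and build_alphabet are hypothetical names, and the only format assumptions are whitespace-separated "word tag" lines with blank lines between sentences.

```python
from typing import Dict, Iterable, List, Tuple

Sentence = List[Tuple[str, str]]


def read_two_column(lines: Iterable[str]) -> List[Sentence]:
    """Group 'word tag' lines into sentences of (word, tag) pairs."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        tokens = raw.split()
        if not tokens:
            # Blank line: close the sentence collected so far, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        if len(tokens) != 2:
            raise ValueError(f"expected 2 columns, got {len(tokens)}: {raw!r}")
        # The word is tokens[0] here, since there is no leading index column.
        current.append((tokens[0], tokens[1]))
    if current:
        sentences.append(current)
    return sentences


def build_alphabet(sentences: List[Sentence], column: int) -> Dict[str, int]:
    """Map each distinct item in the given column (0 = word, 1 = tag) to an id."""
    alphabet: Dict[str, int] = {}
    for sentence in sentences:
        for pair in sentence:
            alphabet.setdefault(pair[column], len(alphabet))
    return alphabet


sentences = read_two_column(["John B-PER", "lives O", "", "Paris B-LOC", ""])
tag_alphabet = build_alphabet(sentences, column=1)
print(sentences)
print(tag_alphabet)
```

With this layout only a word alphabet and a tag alphabet are needed; whether the rest of the training code accepts missing POS and chunk alphabets is something to verify against the actual code.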