VioletPeng / GraphLSTM_release

Implementation of TACL 2017 paper: Cross-Sentence N-ary Relation Extraction with Graph LSTMs. Nanyun Peng, Hoifung Poon, Chris Quirk, Kristina Toutanova and Wen-tau Yih.

Convert data to Peng's graph format #5

Closed (mank-hub closed 3 years ago)

mank-hub commented 3 years ago

@VioletPeng Hi, I want to convert a file of sentences into the data_graph, sentences_2nd and graph_arcs formats. Could you please share the code you used to do that?

VioletPeng commented 3 years ago

Hi, I cannot provide the full preprocessing pipeline, because the original conversion from raw sentences to the data_graph JSON files relied on some Microsoft internal data processing tools. The idea, though, is simply to run the sentences through a strong dependency parser to get the dependency structures. I believe they also filtered out some "unessential arcs", such as links to the words "to", "in", etc. Once you have the data_graph JSON file, you can call this function: https://github.com/VioletPeng/GraphLSTM_release/blob/8df4b0839e387439eb3be6cd697e48e7d74a63ef/theano_src/data_process.py#L410 to generate the sentences_2nd and graph_arcs files that this codebase uses. I hope this helps!
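To make the arc-filtering idea a bit more concrete, here is a minimal sketch. It is not the original pipeline. It assumes the dependency parse is already available as (head index, dependent index, label) triples, that an arc counts as "unessential" when either endpoint is a listed word, and that the list contains only the two words mentioned above. The name `filter_arcs`, the tuple layout, and the hand-written example parse are all hypothetical, and the sketch does not attempt to reproduce the data_graph JSON schema.

```python
from typing import List, Set, Tuple

# Hypothetical arc representation: (head index, dependent index, label).
Arc = Tuple[int, int, str]

# Assumed stop list: only the two words named in the thread.
UNESSENTIAL_WORDS: Set[str] = {"to", "in"}


def filter_arcs(
    tokens: List[str],
    arcs: List[Arc],
    stop_words: Set[str] = UNESSENTIAL_WORDS,
) -> List[Arc]:
    """Keep only arcs whose head and dependent are both outside stop_words."""
    kept = []
    for head, dep, label in arcs:
        # Assumption: an arc is dropped if either endpoint is a stop word.
        if tokens[head].lower() in stop_words or tokens[dep].lower() in stop_words:
            continue
        kept.append((head, dep, label))
    return kept


# Hand-written toy parse standing in for real dependency parser output.
tokens = ["Gefitinib", "binds", "to", "EGFR"]
arcs = [(1, 0, "nsubj"), (1, 3, "obl"), (3, 2, "case")]
filtered = filter_arcs(tokens, arcs)
print(filtered)
```

In practice the triples would come from whichever dependency parser you choose, and the filtered arcs would then need to be written in the data_graph JSON layout expected by the function linked above.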