Closed by mank-hub 3 years ago.
Hi, I cannot provide the full preprocessing pipeline, because the original conversion from raw sentences to the data_graph JSON files relied on some Microsoft-internal data processing tools. The idea, though, is simply to run the sentences through a strong dependency parser to obtain their dependency structures. I believe they also filtered out some "unessential arcs", such as links to the words "to", "in", etc. Once you have the data_graph JSON file, you can call this function: https://github.com/VioletPeng/GraphLSTM_release/blob/8df4b0839e387439eb3be6cd697e48e7d74a63ef/theano_src/data_process.py#L410 to generate the sentences_2nd and graph_arcs files used by this codebase. I hope this helps!
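As a rough illustration of the parse-then-filter idea described above, here is a minimal sketch. It assumes the dependency parser's output has already been converted to CoNLL-U-style `(token, head, label)` triples, where `head` is the 1-based index of the governing token and `0` marks the root. The function name `parse_to_graph`, the `NONESSENTIAL_WORDS` set, and the output keys (`tokens`, `arcs`, `head`, `dep`, `label`) are all hypothetical. They are not the actual data_graph JSON schema, which is not documented in this thread, so the result would still need to be mapped to whatever the linked data_process.py function expects.

```python
import json
from typing import Dict, List, Set, Tuple

# Hypothetical input format: one parsed sentence as (token, head, label)
# triples, CoNLL-U style (head is 1-based, 0 = root).
Parse = List[Tuple[str, int, str]]

# Example "unessential" words; the exact list used for the original data
# is unknown, so this set is only an assumption.
NONESSENTIAL_WORDS: Set[str] = {"to", "in"}


def parse_to_graph(parse: Parse, skip_words: Set[str] = NONESSENTIAL_WORDS) -> Dict:
    """Turn one dependency parse into a token list plus filtered arcs.

    Arcs touching a word in skip_words (as dependent or head) are dropped.
    The output layout is hypothetical, not the real data_graph schema.
    """
    tokens = [tok for tok, _, _ in parse]
    arcs = []
    for dep_idx, (tok, head, label) in enumerate(parse):
        if head == 0:
            continue  # the root token has no incoming arc
        head_idx = head - 1  # convert the 1-based head index to 0-based
        if tok.lower() in skip_words or tokens[head_idx].lower() in skip_words:
            continue  # drop arcs linked to unessential words
        arcs.append({"head": head_idx, "dep": dep_idx, "label": label})
    return {"tokens": tokens, "arcs": arcs}


# Hand-written example parse (not real parser output) for
# "EGFR mutations respond to gefitinib".
EXAMPLE_PARSE: Parse = [
    ("EGFR", 2, "compound"),
    ("mutations", 3, "nsubj"),
    ("respond", 0, "root"),
    ("to", 5, "case"),
    ("gefitinib", 3, "obl"),
]

if __name__ == "__main__":
    # Serialize one graph per sentence; a real pipeline would loop over a file.
    print(json.dumps(parse_to_graph(EXAMPLE_PARSE)))
```

In this example the arc from "to" to "gefitinib" is dropped, while the compound, nsubj, and obl arcs are kept.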
@VioletPeng Hi, I want to convert a file of sentences to the data_graph, sentences_2nd and graph_arcs formats. Can you please share the code you used to do that?