Convert data to Peng's graph format

Hi, I can't provide the full preprocessing pipeline because the original conversion from raw sentences to the data_graph json files used some Microsoft-internal data processing tools. The idea, though, is simply to run the sentences through a strong dependency parser to get the dependency structures. I believe they also filtered out some "unessential arcs", such as links to the words "to", "in", etc. Once you have the data_graph json file, you can call this function: https://github.com/VioletPeng/GraphLSTM_release/blob/8df4b0839e387439eb3be6cd697e48e7d74a63ef/theano_src/data_process.py#L410 to generate the sentences_2nd and graph_arcs files that this codebase uses. I hope this helps!
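For anyone rebuilding the arc-filtering step, here is a minimal sketch of the idea only. It is not the original pipeline and does not produce the data_graph json schema, which is not described in this thread. It assumes you already have dependency arcs from a parser of your choice as `(head_index, dependent_index, label)` tuples. The function name `filter_arcs`, the `UNESSENTIAL_WORDS` set, and the example sentence and labels are all hypothetical. The actual filter list is unknown beyond the two words mentioned above.

```python
# Assumption: only "to" and "in" are named in the thread; the real list is unknown.
UNESSENTIAL_WORDS = {"to", "in"}


def filter_arcs(tokens, arcs, skip_words=UNESSENTIAL_WORDS):
    """Drop dependency arcs that touch a skipped word.

    tokens: list of word strings.
    arcs:   list of (head_index, dependent_index, label) tuples, 0-based.
    An arc is dropped if either its head or its dependent is in skip_words
    (a guess at what "links to the words" means in practice).
    """
    kept = []
    for head, dep, label in arcs:
        if tokens[head].lower() in skip_words or tokens[dep].lower() in skip_words:
            continue
        kept.append((head, dep, label))
    return kept


# Hand-written toy parse; a real run would take these arcs from a dependency parser.
tokens = ["Mutations", "in", "EGFR", "respond", "to", "gefitinib"]
arcs = [
    (3, 0, "nsubj"),  # respond -> Mutations
    (2, 1, "case"),   # EGFR -> in
    (0, 2, "nmod"),   # Mutations -> EGFR
    (5, 4, "case"),   # gefitinib -> to
    (3, 5, "obl"),    # respond -> gefitinib
]
filtered = filter_arcs(tokens, arcs)
print(filtered)
```

The filtered arcs would still need to be written out in whatever layout the data_graph json files use before the linked function can consume them.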
