Open balhoff opened 1 year ago
I'm not sure I understand the issue with nodes; we may want to chat about that. This looks like it's on the right track, though.
I think the fastest/cleanest way would be to add a condition here for rdf: https://github.com/RobokopU24/ORION/blob/81b2988c2f3a1174d461ec908f4e17efa76d81c5/Common/build_manager.py#L108
It could read the jsonl nodes and edges files that the previous merging step produced for the completed graph (graph_output_dir/NODES_FILENAME and EDGES_FILENAME) and write them out as RDF. It might be nice to make a file conversion helper, like the one kgx_file_converter.py has for jsonl to csv.
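To make the conversion-helper idea concrete, here is a rough sketch of what an edges-only jsonl-to-RDF step might look like. It is not ORION code: `EXAMPLE_PREFIXES`, `expand_curie`, and `edges_jsonl_to_ntriples` are hypothetical names, the prefix map is a tiny illustrative stand-in, and it assumes each edge line carries `subject`, `predicate`, and `object` as CURIEs. It emits N-Triples lines (which are also valid Turtle) using only the standard library, and it ignores edge properties and IRI escaping.

```python
import json
from typing import Dict, Iterable, Iterator

# Hypothetical prefix map for illustration only; a real build would need a
# complete, authoritative mapping for every CURIE prefix in the graph.
EXAMPLE_PREFIXES: Dict[str, str] = {
    "biolink": "https://w3id.org/biolink/vocab/",
    "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
}


def expand_curie(curie: str, prefixes: Dict[str, str]) -> str:
    """Expand a CURIE such as 'MONDO:0005148' into a full IRI."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"cannot expand CURIE: {curie!r}")
    return prefixes[prefix] + local_id


def edges_jsonl_to_ntriples(
    lines: Iterable[str], prefixes: Dict[str, str]
) -> Iterator[str]:
    """Yield one N-Triples statement per jsonl edge line.

    Assumes each edge has 'subject', 'predicate', and 'object' keys holding
    CURIEs. Other edge properties are dropped in this sketch.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        edge = json.loads(line)
        s, p, o = (
            expand_curie(edge[key], prefixes)
            for key in ("subject", "predicate", "object")
        )
        yield f"<{s}> <{p}> <{o}> ."
```

Because the function takes any iterable of lines, it could be fed an open edges file directly and its output written line by line, so the whole graph never has to be held in memory.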
Then we could specify rdf as output format for a graph like here: https://github.com/RobokopU24/ORION/blob/81b2988c2f3a1174d461ec908f4e17efa76d81c5/graph_specs/default-graph-spec.yml#L15
This approach has the downside that if RDF is the only output you care about, it will still merge the sources and write them to kgx jsonl files first, for no great reason. We could also incorporate the RDF output further upstream to avoid that, but I haven't had time to think about how we might want to do that.
I took a stab at implementing an RDF file writer (just for edges, not nodes, at the moment; I don't think we want to have duplicate node metadata in different RDF datasets). @EvanDietzMorris I have not actually run this; could you let me know if I'm on the right track, and what else needs to be done to output some Turtle files in the ORION build?