Open eKathleenCarter opened 5 months ago
[`drop_duplicates`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop_duplicates.html) is more efficient for lines 197 and 220.
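As a rough illustration of the point above, here is a minimal sketch comparing a hand-rolled de-duplication loop with `DataFrame.drop_duplicates`. The sample data is made up, and this assumes lines 197 and 220 remove repeated rows, which the comment does not spell out.

```python
import pandas as pd

# Made-up sample data; the real frame in the PR has more columns.
df = pd.DataFrame({
    "source_ids": ["A", "A", "B"],
    "target_ids": ["X", "X", "Y"],
    "predicates": ["treats", "treats", "affects"],
})

# Manual de-duplication: remember every row already seen in a Python set.
seen = set()
manual_rows = []
for row in df.itertuples(index=False):
    if row not in seen:
        seen.add(row)
        manual_rows.append(row)
manual = pd.DataFrame(manual_rows, columns=df.columns)

# Vectorized equivalent: keeps the first occurrence of each distinct row.
deduped = df.drop_duplicates().reset_index(drop=True)
```

Passing `subset=[...]` to `drop_duplicates` restricts the comparison to selected columns if only some of them define a duplicate.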
I would suggest replacing lines 198 through 216 with the following:

    df.rename(columns={
        "dmdb_ids": "drugmechdb_path_id",
        "qualified_predicates": QUALIFIED_PREDICATE,
        "object_direction_qualifiers": OBJECT_DIRECTION_QUALIFIER,
        "object_aspect_qualifiers": OBJECT_ASPECT_QUALIFIER,
    }, inplace=True)
    df[KNOWLEDGE_LEVEL] = KNOWLEDGE_ASSERTION
    df[AGENT_TYPE] = MANUAL_AGENT
    # Select the property columns with a list; a bare tuple key raises KeyError.
    df['edge_props'] = df.apply(
        lambda x: x[[QUALIFIED_PREDICATE, OBJECT_DIRECTION_QUALIFIER,
                     OBJECT_ASPECT_QUALIFIER, KNOWLEDGE_LEVEL,
                     AGENT_TYPE]].dropna().to_dict(),
        axis=1,
    )
    for i, row in df.iterrows():
        output_edge = kgxedge(
            subject_id=row["source_ids"],
            object_id=row["target_ids"],
            predicate=row["predicates"],
            edgeprops=row['edge_props'],
            primary_knowledge_source=self.provenance_id,
        )
        self.output_file_writer.write_kgx_edge(output_edge)

Because `iterrows` is extremely slow and inefficient.
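If the remaining `iterrows` loop is still a bottleneck, one possible follow-up is to iterate over plain dictionaries from `DataFrame.to_dict("records")`, which avoids building a `Series` for every row. The sketch below is only an illustration: the `Edge` dataclass, the constant values, and the sample data are hypothetical stand-ins, not ORION's actual `kgxedge` class or constants.

```python
from dataclasses import dataclass, field

import pandas as pd

# Hypothetical stand-ins for the constants used in the suggestion.
QUALIFIED_PREDICATE = "qualified_predicate"
OBJECT_DIRECTION_QUALIFIER = "object_direction_qualifier"
OBJECT_ASPECT_QUALIFIER = "object_aspect_qualifier"
PROP_COLUMNS = [QUALIFIED_PREDICATE, OBJECT_DIRECTION_QUALIFIER,
                OBJECT_ASPECT_QUALIFIER]


@dataclass
class Edge:
    """Hypothetical stand-in for kgxedge."""
    subject_id: str
    object_id: str
    predicate: str
    edgeprops: dict = field(default_factory=dict)


# Made-up sample data; the second row has no qualifiers.
df = pd.DataFrame({
    "source_ids": ["DRUG:1", "DRUG:2"],
    "target_ids": ["GENE:1", "GENE:2"],
    "predicates": ["affects", "treats"],
    QUALIFIED_PREDICATE: ["causes", None],
    OBJECT_DIRECTION_QUALIFIER: ["decreased", None],
    OBJECT_ASPECT_QUALIFIER: ["activity", None],
})

edges = []
# Each record is a plain dict, so no per-row Series is constructed.
for record in df.to_dict("records"):
    # Keep only the properties that are actually present on this row.
    props = {k: record[k] for k in PROP_COLUMNS if pd.notna(record[k])}
    edges.append(Edge(
        subject_id=record["source_ids"],
        object_id=record["target_ids"],
        predicate=record["predicates"],
        edgeprops=props,
    ))
```

Building the property dict inside the loop also removes the need for the row-wise `df.apply(..., axis=1)` pass, at the cost of a slightly longer loop body.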
_Originally posted by @eKathleenCarter in https://github.com/RobokopU24/ORION/pull/221#discussion_r1588280747_