InternetHealthReport / internet-yellow-pages

A knowledge graph for Internet resources
GNU General Public License v3.0

Make `batch_create_nodes` method idempotent #90

Closed by mohamedawnallah 9 months ago

mohamedawnallah commented 9 months ago

Description

To make `batch_create_nodes` idempotent, we should switch from `CREATE` to `MERGE`. `CREATE` is not idempotent because it always adds a new node. `MERGE` behaves like a combination of `MATCH` and `CREATE`: it creates a node only when no node matching the given pattern already exists. See the Neo4j `MERGE` documentation for more details on how it works.
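As a rough illustration of the idea, the sketch below builds a batched `MERGE` query. It is not the code changed in this PR: the helper name `build_batch_merge_query`, the `AtlasProbe` label, the `id` key property, and the `$batch` parameter shape (a list of property maps) are all assumptions made for the example.

```python
def build_batch_merge_query(label: str, key: str) -> str:
    """Build a Cypher query that merges one node per map in $batch.

    Hypothetical helper for illustration only; not the project's
    actual batch_create_nodes implementation.
    """
    # Labels and property names cannot be passed as Cypher parameters,
    # so they are interpolated here. Restrict them to plain identifiers.
    if not label.isidentifier() or not key.isidentifier():
        raise ValueError("label and key must be plain identifiers")
    return (
        "UNWIND $batch AS item\n"
        # MERGE matches on the key property only, so a rerun finds the
        # existing node instead of creating a second one.
        f"MERGE (n:{label} {{{key}: item.{key}}})\n"
        # Copy the remaining properties onto the matched or created node.
        "SET n += item\n"
        "RETURN count(n) AS merged"
    )


# Assumed label and key property, chosen for the example.
query = build_batch_merge_query("AtlasProbe", "id")
print(query)
```

With the official Neo4j Python driver, a query like this could be run as `session.run(query, batch=[{"id": 1, "status": "Connected"}])`. Running it twice with the same batch should leave a single node per `id`, provided the key property really identifies the node; merging on the full property map instead would create a new node whenever any property changes.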

Motivation and Context

Closes #89

How Has This Been Tested?

  1. Repeated Pipeline Execution:

    • The data pipeline `atlas_probes.py` was executed multiple times without encountering any errors.
  2. Pipeline with New Properties:

    • When the data pipeline `atlas_probes.py` was executed multiple times with new properties added, the changes were reflected accurately in the node properties shown in the Neo4j web interface.

Screenshots (if appropriate):

https://github.com/InternetHealthReport/internet-yellow-pages/assets/69568555/4dfb5767-182c-4c55-8a6a-8140819e7efc

romain-fontugne commented 9 months ago

I'll close this one, as @m-appel came up with a new set of functions that should fix this.