DDMAL / linkedmusic-datalake

To create mapping strategies for various music databases into our data lake
https://virtuoso.staging.simssa.ca

Can we apply more efficient conversion logic? #220

Open Yueqiao12Zhang opened 1 week ago

Yueqiao12Zhang commented 4 days ago

11-8-2024

MusicBrainz & Other Potential Databases:

Advantages of Converting JSON Directly to RDF:

  1. Data Structure Preservation: RDF closely aligns with JSON’s structure, so complex data layouts are preserved without loss of fidelity. CSV, by contrast, struggles with nested or hierarchical data.
  2. Simplified Reconciliation: Flattening the nested JSON structure into CSV produced an excessive number of columns, which complicated reconciliation. With RDF we avoid this, making reconciliation more straightforward.
  3. Data Integrity: In CSV, values may be truncated and rows may be padded with numerous blank cells. RDF keeps the data intact.
  4. Direct RDF Import for Reconciliation: RDF files can be imported directly into OpenRefine for reconciliation, allowing us to skip the additional CSV conversion step.
  5. Existing functions preserved: We can reuse the same functions from the old CSV2RDF, such as language tagging and datatype detection.

Disadvantage:

  1. Query Complexity: Nested JSON structures are represented in the RDF output as blank nodes, which can make querying the data more challenging.
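
To illustrate the blank-node trade-off, here is a minimal sketch in plain Python (no RDF library). The function `json_to_triples`, the helper `objects`, the `ex:recording1` subject, and the sample record are all hypothetical and only for illustration; this is not the project's converter. It assumes nested objects become blank nodes and ignores JSON arrays.

```python
import itertools


def json_to_triples(subject, obj, counter=None):
    """Flatten a nested dict into (subject, predicate, object) triples.

    Each nested dict gets a fresh blank-node label such as "_:b0".
    Lists are not handled in this sketch.
    """
    if counter is None:
        counter = itertools.count()
    triples = []
    for key, value in obj.items():
        if isinstance(value, dict):
            bnode = f"_:b{next(counter)}"
            triples.append((subject, key, bnode))
            triples.extend(json_to_triples(bnode, value, counter))
        else:
            triples.append((subject, key, value))
    return triples


def objects(triples, subject, predicate):
    """Return every object attached to (subject, predicate)."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# Hypothetical sample record with one level of nesting.
record = {
    "title": "Example Recording",
    "artist": {"name": "Example Artist", "country": "CA"},
}
triples = json_to_triples("ex:recording1", record)

# Reaching the artist name takes two hops: recording -> blank node -> name.
artist_nodes = objects(triples, "ex:recording1", "artist")
artist_names = [name for node in artist_nodes for name in objects(triples, node, "name")]
```

The nesting is preserved, but every nested value now sits one hop further from the recording, so a query has to traverse the intermediate blank node instead of reading a single column.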

Yueqiao12Zhang commented 4 days ago

Workflow for Converting JSON to RDF and Reconciling with OpenRefine

Steps

  1. Extract predicates from the JSON file

    • Begin by extracting all predicates from the given JSON data. These predicates will form the basis for mapping the data.
  2. Map the predicates to Wikidata properties

    • Establish mappings between the extracted predicates and relevant Wikidata properties. This will ensure consistency and alignment with existing semantic data.
  3. Convert JSON to RDF file with all properties already mapped

    • Using the mapped predicates, convert the JSON file into RDF (Resource Description Framework) format. Ensure all properties are appropriately defined in the RDF.
  4. Upload RDF into OpenRefine

    • Import the generated RDF file into OpenRefine for further refinement and reconciliation.
  5. Reconcile using OpenRefine

    • Reconcile the RDF data using OpenRefine, linking your data to external references (e.g., Wikidata).
  6. (Cannot export RDF using OpenRefine)

    • Note that OpenRefine does not support exporting data back into RDF format directly.
  7. Use the output CSV from OpenRefine to map the reconciliation data back to the RDF file

    • Export the reconciled data as a CSV file from OpenRefine. Map the reconciliation data from this CSV back to the original RDF structure.
  8. Successfully reconcile RDF data

    • Finalize the reconciliation by ensuring all mappings are accurate and the RDF data is consistent with external references.
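
Steps 1 to 3 and step 7 above could be sketched roughly as follows, using only the Python standard library. Everything here is an assumption for illustration: the function names, the flat sample record, the `http://example.org/` subject IRI, the CSV column names `value` and `wikidata_id`, and the identifier `Q_PLACEHOLDER`, which is a stand-in and not a real Wikidata item. The actual OpenRefine export will have whatever columns the project defines, and nested JSON would need additional handling.

```python
import csv
import io

WDT = "http://www.wikidata.org/prop/direct/"
WD = "http://www.wikidata.org/entity/"


def extract_predicates(obj, found=None):
    """Step 1: collect every key used anywhere in the JSON document."""
    if found is None:
        found = set()
    if isinstance(obj, dict):
        for key, value in obj.items():
            found.add(key)
            extract_predicates(value, found)
    elif isinstance(obj, list):
        for item in obj:
            extract_predicates(item, found)
    return found


def to_triples(subject_iri, record, predicate_map):
    """Step 3: keep mapped keys with string values (flat records only)."""
    return [
        (subject_iri, predicate_map[key], value)
        for key, value in record.items()
        if key in predicate_map and isinstance(value, str)
    ]


def load_matches(csv_text):
    """Step 7a: read the reconciled CSV into {original value: matched ID}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["value"]: row["wikidata_id"] for row in reader if row["wikidata_id"]}


def serialize(triples, matches):
    """Step 7b: emit N-Triples, using entity IRIs for reconciled values."""
    lines = []
    for s, p, o in triples:
        if o in matches:
            obj = f"<{WD}{matches[o]}>"
        else:
            # Minimal literal escaping; a real converter needs more care.
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return lines


# Hypothetical flat record standing in for one JSON entry.
record = {"title": "Example Recording", "performer": "Example Artist"}
predicates = extract_predicates(record)

# Step 2: hand-written mapping (P1476 = title, P175 = performer).
predicate_map = {"title": WDT + "P1476", "performer": WDT + "P175"}
triples = to_triples("http://example.org/recording/1", record, predicate_map)

# Hypothetical CSV export; an empty wikidata_id means no match was found.
reconciled_csv = "value,wikidata_id\nExample Artist,Q_PLACEHOLDER\nExample Recording,\n"
ntriples = serialize(triples, load_matches(reconciled_csv))
```

Keeping the triples as Python tuples until the final serialization means the reconciliation results can be merged by simple lookup, rather than by editing an already written RDF file.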