DDMAL / linkedmusic-datalake

To create mapping strategies for various music databases into our data lake
https://virtuoso.staging.simssa.ca

The Use of "BulkLoader" of Virtuoso is to be explored in order to upload a large dataset. #198

Closed: candlecao closed this issue 2 months ago

candlecao commented 2 months ago

This follows up on #195.

candlecao commented 2 months ago

I solved it. Please see:

Be cautious:

  1. When you execute the "Bulk Loader Procedure and Sub-procedures creation SQL script", it may report some errors about incomplete execution. These do not matter and can be ignored.
  2. Regarding this statement in "Tutorial 2":

    Assuming there is a folder with name "tmp" in your FS and it is under a directory specified in the [DirsAllowed] param defined in your virtuoso.ini file.

    I adjusted the param, but the change did not take effect. So I created the "tmp" folder directly under the "my_virtdb" folder instead. That path is recognized by the ld_dir() function.
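For reference, the sequence of bulk loader calls can be sketched as a small Python helper that only builds the statements to paste into isql. `ld_dir()`, `rdf_loader_run()` and `checkpoint` are the Virtuoso bulk loader calls; the function name `build_bulk_load_script`, the directory path, the file pattern and the graph IRI are placeholders I made up for illustration, not values from this issue.

```python
def build_bulk_load_script(directory, pattern, graph_iri):
    """Return the isql statements that register and load RDF files.

    All three arguments are placeholders supplied by the caller; the
    directory must be one that the Virtuoso server is allowed to read.
    """
    def quote(value):
        # SQL string literals use single quotes; double any embedded quote.
        return "'" + value.replace("'", "''") + "'"

    return "\n".join([
        # Register every matching file in the directory for loading.
        f"ld_dir({quote(directory)}, {quote(pattern)}, {quote(graph_iri)});",
        # Load the registered files.
        "rdf_loader_run();",
        # Make the loaded data durable.
        "checkpoint;",
    ])


if __name__ == "__main__":
    # Hypothetical path and graph IRI, for illustration only.
    print(build_bulk_load_script(
        "/path/to/my_virtdb/tmp", "*.ttl", "http://example.org/graph"
    ))
```

The helper does not talk to Virtuoso; it only prints text, so the output can be reviewed before it is run in an isql session.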

candlecao commented 2 months ago

https://github.com/openlink/virtuoso-opensource/issues/1319 @Yueqiao12Zhang, for your reference: this issue discusses how to "load the parts using multiple loaders to better utilize your CPU".
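A rough sketch of what "multiple loaders" could look like: several isql sessions each call `rdf_loader_run()` at the same time, and a single `checkpoint` follows once they have all finished. It assumes the files were already registered with `ld_dir()`. The port, the credentials, the binary name `isql` (some distributions ship it as `isql-vt`) and both function names are my assumptions, not details from the linked issue.

```python
import subprocess


def build_parallel_loader_commands(n_loaders, isql_bin="isql",
                                   port=1111, user="dba", password="dba"):
    """Return (loader_commands, checkpoint_command) as argv lists.

    Connection details are placeholders; adjust them to your server.
    """
    if n_loaders < 1:
        raise ValueError("n_loaders must be at least 1")
    base = [isql_bin, str(port), user, password]
    # One identical command per loader; each session picks up pending files.
    loaders = [base + ["exec=rdf_loader_run();"] for _ in range(n_loaders)]
    checkpoint = base + ["exec=checkpoint;"]
    return loaders, checkpoint


def run_parallel_loaders(n_loaders, **kwargs):
    """Start the loaders concurrently, wait for all, then checkpoint."""
    loaders, checkpoint = build_parallel_loader_commands(n_loaders, **kwargs)
    processes = [subprocess.Popen(cmd) for cmd in loaders]
    for process in processes:
        process.wait()
    subprocess.run(checkpoint, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them.
    commands, final = build_parallel_loader_commands(4)
    for cmd in commands + [final]:
        print(" ".join(cmd))
```

How many loaders is sensible depends on the number of CPU cores and on how many files the dataset is split into, so the value 4 above is only an example.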

candlecao commented 4 weeks ago

Be cautious. As far as I know:

  1. If the files to be bulk-loaded are in .n3 format, do not use the .ttl format for them, or errors may be reported in the terminal.
  2. If any of the RDF files to be uploaded contains an error, that file will not be uploaded successfully.

    Therefore, make sure that all the RDF files (e.g., in .ttl format) are syntactically correct. Two Visual Studio Code tools are commonly used to validate RDF files:


    1. RDF Sketch: visualizes RDF and also validates the data, but it cannot spot some local errors.

    2. Turtle Language Server: this is more rigorous. It underlines syntax errors in red.
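Files that still fail can also be spotted after a load. As far as I know, the bulk loader keeps one row per registered file in the `DB.DBA.load_list` table and stores the error message in its `ll_error` column. The sketch below holds the query text and a small helper to summarize the result; the helper name `failed_files` and the sample rows are made up for illustration.

```python
# Query to run in isql after rdf_loader_run() has finished.
FAILED_FILES_QUERY = (
    "SELECT ll_file, ll_error FROM DB.DBA.load_list "
    "WHERE ll_error IS NOT NULL;"
)


def failed_files(rows):
    """Map each failed file to its error message.

    `rows` is an iterable of (ll_file, ll_error) pairs, for example the
    result of selecting those two columns without the WHERE clause.
    """
    return {path: error for path, error in rows if error is not None}


if __name__ == "__main__":
    print(FAILED_FILES_QUERY)
    # Hypothetical rows, not real loader output.
    sample = [
        ("/path/to/my_virtdb/tmp/a.ttl", None),
        ("/path/to/my_virtdb/tmp/b.ttl", "syntax error (placeholder message)"),
    ]
    print(failed_files(sample))
```

Fixing the reported files and registering them again is then a smaller job than re-validating the whole dataset.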