dbpedia / databus

A digital factory platform for managing files online with stable IDs, high-quality metadata, powerful API and tools for building on data: find, access, make interoperable, re-use
Apache License 2.0

Migrate DataIds from Databus 1.0 to Databus 2.1.0 #44

Open holycrab13 opened 1 year ago

holycrab13 commented 1 year ago

Repo for the migration script is here: https://github.com/dbpedia/databus-transfer

In some cases the DataId has to be adjusted during migration. These adjustments must be configured in advance (e.g. which tags are converted into which content variants).

Problems remain for the following versions:

/generic/revisions/2016.10.01

{
    "logLevel": "error",
    "log": [
        {
            "resource": "http://localhost:3000/janni/generic/revisions/2016.10.01",
            "msg": "Invalid content variant setup. Two or more files are not distinguishable by either dataid:formatExtension, dataid:compression or any custom content variant.",
            "payload": [
                {
                    "downloadURLs": [
                        "https://downloads.dbpedia.org/repo/dbpedia/generic/revisions/2016.10.01/revisions_ids_lang=zh_yue.ttl.bz2",
                        "https://downloads.dbpedia.org/repo/dbpedia/generic/revisions/2016.10.01/revisions_ids_lang=zh.ttl.bz2",
                        "https://downloads.dbpedia.org/repo/dbpedia/generic/revisions/2016.10.01/revisions_ids_lang=zh_min_nan.ttl.bz2"
                    ],
                    "http://dataid.dbpedia.org/ns/core#formatExtension": "ttl",
                    "http://dataid.dbpedia.org/ns/core#compression": "bzip2",
                    "http://dataid.dbpedia.org/ns/cv#lang": "zh",
                    "http://dataid.dbpedia.org/ns/cv#tag": "ids"
                },
                {
                    "downloadURLs": [
                        "https://downloads.dbpedia.org/repo/dbpedia/generic/revisions/2016.10.01/revisions_uris_lang=zh.ttl.bz2",
                        "https://downloads.dbpedia.org/repo/dbpedia/generic/revisions/2016.10.01/revisions_uris_lang=zh_min_nan.ttl.bz2",
                        "https://downloads.dbpedia.org/repo/dbpedia/generic/revisions/2016.10.01/revisions_uris_lang=zh_yue.ttl.bz2"
                    ],
                    "http://dataid.dbpedia.org/ns/core#formatExtension": "ttl",
                    "http://dataid.dbpedia.org/ns/core#compression": "bzip2",
                    "http://dataid.dbpedia.org/ns/cv#lang": "zh",
                    "http://dataid.dbpedia.org/ns/cv#tag": "uris"
                }
            ],
            "level": "error"
        }
    ]
}

/generic/categories/2016.10.01

{
    "logLevel": "error",
    "log": [
        {
            "resource": "http://localhost:3000/janni/generic/categories/2016.10.01",
            "msg": "Invalid content variant setup. Two or more files are not distinguishable by either dataid:formatExtension, dataid:compression or any custom content variant.",
            "payload": [
                {
                    "downloadURLs": [
                        "https://downloads.dbpedia.org/repo/dbpedia/generic/categories/2016.10.01/categories_labels_lang=zh_min_nan.ttl.bz2",
                        "https://downloads.dbpedia.org/repo/dbpedia/generic/categories/2016.10.01/categories_labels_lang=zh_yue.ttl.bz2",
                        "https://downloads.dbpedia.org/repo/dbpedia/generic/categories/2016.10.01/categories_labels_lang=zh.ttl.bz2"
                    ],
                    "http://dataid.dbpedia.org/ns/core#formatExtension": "ttl",
                    "http://dataid.dbpedia.org/ns/core#compression": "bzip2",
                    "http://dataid.dbpedia.org/ns/cv#lang": "zh",
                    "http://dataid.dbpedia.org/ns/cv#tag": "labels"
                },
                {
                    "downloadURLs": [
                        "https://downloads.dbpedia.org/repo/dbpedia/generic/categories/2016.10.01/categories_articles_lang=zh_min_nan.ttl.bz2",
                        "https://downloads.dbpedia.org/repo/dbpedia/generic/categories/2016.10.01/categories_articles_lang=zh.ttl.bz2",
                        "https://downloads.dbpedia.org/repo/dbpedia/generic/categories/2016.10.01/categories_articles_lang=zh_yue.ttl.bz2"
                    ],
                    "http://dataid.dbpedia.org/ns/core#formatExtension": "ttl",
                    "http://dataid.dbpedia.org/ns/core#compression": "bzip2",
                    "http://dataid.dbpedia.org/ns/cv#lang": "zh",
                    "http://dataid.dbpedia.org/ns/cv#tag": "articles"
                },
                {
                    "downloadURLs": [
                        "https://downloads.dbpedia.org/repo/dbpedia/generic/categories/2016.10.01/categories_skos_lang=zh.ttl.bz2",
                        "https://downloads.dbpedia.org/repo/dbpedia/generic/categories/2016.10.01/categories_skos_lang=zh_yue.ttl.bz2"
                    ],
                    "http://dataid.dbpedia.org/ns/core#formatExtension": "ttl",
                    "http://dataid.dbpedia.org/ns/core#compression": "bzip2",
                    "http://dataid.dbpedia.org/ns/cv#lang": "zh",
                    "http://dataid.dbpedia.org/ns/cv#tag": "skos"
                }
            ],
            "level": "error"
        }
    ]
}

/publication/strategy/2019.09.09 (Possible Databus Bug)

[
    {
        "resource": "http://localhost:3000/janni/publication/strategy/2019.09.09",
        "msg": "SHACL validation failed",
        "payload": {
            "isSuccess": false,
            "messages": [
                "All used sub-properties of dataid:contentVariant MUST be used by all dataid:Parts exactly ONCE."
            ],
            "report": {
                "@context": {
                    "conforms": {
                        "@id": "http://www.w3.org/ns/shacl#conforms",
                        "@type": "http://www.w3.org/2001/XMLSchema#boolean"
                    },
                    "result": {
                        "@id": "http://www.w3.org/ns/shacl#result",
                        "@type": "@id"
                    },
                    "value": {
                        "@id": "http://www.w3.org/ns/shacl#value"
                    },
                    "sourceShape": {
                        "@id": "http://www.w3.org/ns/shacl#sourceShape",
                        "@type": "@id"
                    },
                    "resultMessage": {
                        "@id": "http://www.w3.org/ns/shacl#resultMessage"
                    },
                    "sourceConstraintComponent": {
                        "@id": "http://www.w3.org/ns/shacl#sourceConstraintComponent",
                        "@type": "@id"
                    },
                    "resultSeverity": {
                        "@id": "http://www.w3.org/ns/shacl#resultSeverity",
                        "@type": "@id"
                    },
                    "focusNode": {
                        "@id": "http://www.w3.org/ns/shacl#focusNode",
                        "@type": "@id"
                    },
                    "schema": "http://schema.org/",
                    "dataid": "http://dataid.dbpedia.org/ns/core#",
                    "dct": "http://purl.org/dc/terms/",
                    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
                    "sh": "http://www.w3.org/ns/shacl#",
                    "dcv": "http://dataid.dbpedia.org/ns/cv#",
                    "xsd": "http://www.w3.org/2001/XMLSchema#",
                    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
                    "dash": "http://datashapes.org/dash#",
                    "dcat": "http://www.w3.org/ns/dcat#",
                    "db": "https://databus.dbpedia.org/sys/ont/"
                },
                "@graph": [
                    {
                        "@id": "_:b0",
                        "@type": "sh:ValidationReport",
                        "sh:conforms": false,
                        "result": "_:b1"
                    },
                    {
                        "@id": "_:b1",
                        "@type": "sh:ValidationResult",
                        "focusNode": "http://localhost:3000/janni/publication/strategy/2019.09.09",
                        "resultMessage": "All used sub-properties of dataid:contentVariant MUST be used by all dataid:Parts exactly ONCE.",
                        "resultSeverity": "sh:Violation",
                        "sourceConstraintComponent": "sh:SPARQLConstraintComponent",
                        "sourceShape": "file:///#cvs-are-complete",
                        "value": "2"
                    }
                ]
            }
        },
        "level": "error"
    }
]
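To make the SHACL message concrete, here is a minimal Python sketch of the rule it states: every content variant property used anywhere in a version must appear exactly once on every part. This is not the actual `cvs-are-complete` shape (that one is a SPARQL constraint), and the part names and property values below are hypothetical, not taken from the strategy artifact.

```python
from collections import Counter


def incomplete_parts(parts):
    """Return {part: {property: count}} for every property not used exactly once.

    `parts` maps a part name to a list of (property, value) pairs.
    """
    # Every content variant property used by at least one part.
    used = {prop for pairs in parts.values() for prop, _ in pairs}
    problems = {}
    for name, pairs in parts.items():
        counts = Counter(prop for prop, _ in pairs)
        bad = {prop: counts[prop] for prop in used if counts[prop] != 1}
        if bad:
            problems[name] = bad
    return problems


# Hypothetical parts, for illustration only.
parts = {
    "a.ttl": [("lang", "en"), ("tag", "x")],
    "b.ttl": [("lang", "de")],                                # "tag" is missing
    "c.ttl": [("lang", "en"), ("lang", "fr"), ("tag", "y")],  # "lang" used twice
}
problems = incomplete_parts(parts)
print(problems)
```

A part that reports a count of 0 is missing a variant, and a count above 1 means the same property was attached more than once.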

/generic/article-templates/2016.10.01

[
    {
        "resource": "http://localhost:3000/janni/generic/article-templates/2016.10.01",
        "msg": "Invalid content variant setup. Two or more files are not distinguishable by either dataid:formatExtension, dataid:compression or any custom content variant.",
        "payload": [
            {
                "downloadURLs": [
                    "https://downloads.dbpedia.org/repo/dbpedia/generic/article-templates/2016.10.01/article-templates_lang=zh_yue.ttl.bz2",
                    "https://downloads.dbpedia.org/repo/dbpedia/generic/article-templates/2016.10.01/article-templates_nested_lang=zh_yue.ttl.bz2"
                ],
                "http://dataid.dbpedia.org/ns/core#formatExtension": "ttl",
                "http://dataid.dbpedia.org/ns/core#compression": "bzip2",
                "http://dataid.dbpedia.org/ns/cv#lang": "zh",
                "http://dataid.dbpedia.org/ns/cv#tag": "yue"
            },
            {
                "downloadURLs": [
                    "https://downloads.dbpedia.org/repo/dbpedia/generic/article-templates/2016.10.01/article-templates_nested_lang=zh.ttl.bz2",
                    "https://downloads.dbpedia.org/repo/dbpedia/generic/article-templates/2016.10.01/article-templates_nested_lang=zh_min_nan.ttl.bz2"
                ],
                "http://dataid.dbpedia.org/ns/core#formatExtension": "ttl",
                "http://dataid.dbpedia.org/ns/core#compression": "bzip2",
                "http://dataid.dbpedia.org/ns/cv#lang": "zh",
                "http://dataid.dbpedia.org/ns/cv#tag": "nested"
            },
            {
                "downloadURLs": [
                    "https://downloads.dbpedia.org/repo/dbpedia/generic/article-templates/2016.10.01/article-templates_lang=bat_smg.ttl.bz2",
                    "https://downloads.dbpedia.org/repo/dbpedia/generic/article-templates/2016.10.01/article-templates_nested_lang=bat_smg.ttl.bz2"
                ],
                "http://dataid.dbpedia.org/ns/core#formatExtension": "ttl",
                "http://dataid.dbpedia.org/ns/core#compression": "bzip2",
                "http://dataid.dbpedia.org/ns/cv#lang": "bat",
                "http://dataid.dbpedia.org/ns/cv#tag": "smg"
            }
        ],
        "level": "error"
    }
]

/generic/page/2016.10.01

[
    {
        "resource": "http://localhost:3000/janni/generic/page/2016.10.01",
        "msg": "Invalid content variant setup. Two or more files are not distinguishable by either dataid:formatExtension, dataid:compression or any custom content variant.",
        "payload": [
            {
                "downloadURLs": [
                    "https://downloads.dbpedia.org/repo/dbpedia/generic/page/2016.10.01/page_ids_lang=zh.ttl.bz2",
                    "https://downloads.dbpedia.org/repo/dbpedia/generic/page/2016.10.01/page_ids_lang=zh_min_nan.ttl.bz2",
                    "https://downloads.dbpedia.org/repo/dbpedia/generic/page/2016.10.01/page_ids_lang=zh_yue.ttl.bz2"
                ],
                "http://dataid.dbpedia.org/ns/core#formatExtension": "ttl",
                "http://dataid.dbpedia.org/ns/core#compression": "bzip2",
                "http://dataid.dbpedia.org/ns/cv#lang": "zh",
                "http://dataid.dbpedia.org/ns/cv#tag": "ids"
            },
            {
                "downloadURLs": [
                    "https://downloads.dbpedia.org/repo/dbpedia/generic/page/2016.10.01/page_length_lang=zh_yue.ttl.bz2",
                    "https://downloads.dbpedia.org/repo/dbpedia/generic/page/2016.10.01/page_length_lang=zh.ttl.bz2"
                ],
                "http://dataid.dbpedia.org/ns/core#formatExtension": "ttl",
                "http://dataid.dbpedia.org/ns/core#compression": "bzip2",
                "http://dataid.dbpedia.org/ns/cv#lang": "zh",
                "http://dataid.dbpedia.org/ns/cv#tag": "length"
            }
        ],
        "level": "error"
    }
]
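The "Invalid content variant setup" errors above all follow one pattern: several files end up with the same format extension, compression and content variants. A minimal Python sketch of such a check follows. The dictionary layout and the function name are assumptions made for illustration, not the Databus implementation. The sample data mirrors the `/generic/page/2016.10.01` payload, with file names shortened.

```python
from collections import defaultdict


def find_indistinguishable(files):
    """Group files by their full variant signature and keep groups with more than one file."""
    groups = defaultdict(list)
    for f in files:
        signature = (
            f["formatExtension"],
            f["compression"],
            tuple(sorted(f["contentVariants"].items())),
        )
        groups[signature].append(f["file"])
    return {sig: names for sig, names in groups.items() if len(names) > 1}


files = [
    {"file": "page_ids_lang=zh.ttl.bz2", "formatExtension": "ttl",
     "compression": "bzip2", "contentVariants": {"lang": "zh", "tag": "ids"}},
    {"file": "page_ids_lang=zh_yue.ttl.bz2", "formatExtension": "ttl",
     "compression": "bzip2", "contentVariants": {"lang": "zh", "tag": "ids"}},
    {"file": "page_length_lang=zh.ttl.bz2", "formatExtension": "ttl",
     "compression": "bzip2", "contentVariants": {"lang": "zh", "tag": "length"}},
]
collisions = find_indistinguishable(files)
for signature, names in collisions.items():
    print(signature, names)
```

Running a check like this on the converted DataId before publishing would surface the colliding files without a round trip to the Databus.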

Several DataIds return 404:

/wikidata/revision/2020.03.01,"Response code 404 (Not Found)"
/wikidata/page/2021.06.01,"Response code 404 (Not Found)"
/wikidata/properties/2021.06.01,"Response code 404 (Not Found)"
/wikidata/redirects/2021.06.01,"Response code 404 (Not Found)"
/wikidata/mappingbased-objects/2020.03.01,"Response code 404 (Not Found)"
/ontology/dbo-snapshots/2019.02.21T08.00.00Z,"Response code 404 (Not Found)"

holycrab13 commented 1 year ago

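Most issues revolve around tags for the language zh: in the logs above, files for zh_yue and zh_min_nan are all assigned lang "zh", so they collide with the plain zh files. The same appears to happen for bat_smg, which ends up as lang "bat" with tag "smg".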
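The logs suggest that language codes containing an underscore (zh_yue, zh_min_nan, bat_smg) are cut at the underscore when the file name is converted into content variants. The Python sketch below shows one way to keep the full language code. It assumes the `lang=` segment is always the last segment of the file name, which holds for the files listed in this issue. `parse_variants` is a hypothetical helper, not part of databus-transfer, and whether Databus 2.1.0 accepts underscores in content variant values is not verified here.

```python
import re

# The lang segment is assumed to be the last "_"-separated segment and may
# itself contain underscores (zh_yue, zh_min_nan, bat_smg).
LANG_RE = re.compile(r"(?:^|_)lang=([A-Za-z]+(?:_[A-Za-z]+)*)$")


def parse_variants(file_name, artifact):
    """Split '<artifact>_<tag>_lang=<code>.<ext>' into content variants."""
    stem = file_name.split(".", 1)[0]  # drop ".ttl.bz2"
    if not stem.startswith(artifact):
        raise ValueError(f"{file_name!r} does not belong to artifact {artifact!r}")
    rest = stem[len(artifact):].lstrip("_")  # e.g. "nested_lang=zh_yue"
    variants = {}
    match = LANG_RE.search(rest)
    if match:
        variants["lang"] = match.group(1)
        rest = rest[:match.start()]
    if rest:
        variants["tag"] = rest
    return variants


print(parse_variants("article-templates_nested_lang=zh_yue.ttl.bz2", "article-templates"))
print(parse_variants("article-templates_lang=bat_smg.ttl.bz2", "article-templates"))
print(parse_variants("revisions_ids_lang=zh_min_nan.ttl.bz2", "revisions"))
```

With the language code kept whole, the zh, zh_yue and zh_min_nan files each get a distinct `lang` value and no longer share a variant signature.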