MaRDI4NFDI / docker-importer

Import data from external data sources into the portal
https://mardi4nfdi.github.io/docker-importer

Detangle papers for zbmath #107

Closed LizzAlice closed 9 months ago

LizzAlice commented 10 months ago

In the zbMATH upload, some papers were merged into the same item because they had the same title and description. This is now fixed, because the DE number gets appended to the description, but the cases in which this happened still need to be logged and re-uploaded.

1.) Query for getting the number of items where this happened:

SELECT (COUNT(?item) AS ?count)
WHERE {
  SELECT ?item (COUNT(?hasID) AS ?idCount)
  WHERE {
    ?item wdt:P1451 ?hasID .
  }
  GROUP BY ?item
  HAVING (COUNT(?hasID) > 1)
}

--> result: 15382 items

2.) Query for getting the number of papers it should be:

SELECT (SUM(?count) AS ?totalCount)
WHERE {
  SELECT ?item (COUNT(?hasID) AS ?count)
  WHERE {
    ?item wdt:P1451 ?hasID .
  }
  GROUP BY ?item
  HAVING (COUNT(?hasID) > 1)
}

--> result: 38021
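To make the relationship between the two numbers concrete, here is a minimal Python sketch that mirrors queries 1 and 2 on toy data. The function name `merged_item_stats` and the sample pairs are hypothetical and only illustrate the GROUP BY / HAVING logic; they are not taken from the portal.

```python
from collections import Counter

# Toy (item, DE number) pairs standing in for "?item wdt:P1451 ?hasID".
# All values are made up for illustration.
pairs = [
    ("Q1", "100"), ("Q1", "101"),
    ("Q2", "200"),
    ("Q3", "300"), ("Q3", "301"), ("Q3", "302"),
]

def merged_item_stats(pairs):
    """Return (number of merged items, number of papers inside them)."""
    # GROUP BY ?item with COUNT(?hasID)
    ids_per_item = Counter(item for item, _ in pairs)
    # HAVING (COUNT(?hasID) > 1)
    merged = {item: n for item, n in ids_per_item.items() if n > 1}
    # Query 1 counts the remaining groups, query 2 sums their sizes.
    return len(merged), sum(merged.values())

item_count, paper_count = merged_item_stats(pairs)
print(item_count, paper_count)
```

With the figures reported above, 38021 papers across 15382 items comes to roughly 2.5 papers per merged item.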

3.) Query for downloading all item ids for entities that should be deleted:

SELECT ?item (COUNT(?hasID) AS ?count)
WHERE {
  ?item wdt:P1451 ?hasID .
}
GROUP BY ?item
HAVING (COUNT(?hasID) > 1)

4.) Query for getting all zbMATH DE numbers from these papers:

SELECT ?item (COUNT(?hasID) AS ?count) (GROUP_CONCAT(?hasID; separator=", ") AS ?ids)
WHERE {
  ?item wdt:P1451 ?hasID .
}
GROUP BY ?item
HAVING (COUNT(?hasID) > 1)

--> Doing this in a single SPARQL query broke something, so I will most likely have to do this via the console.
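One possible alternative, offered as an assumption rather than a description of what was actually done: export plain `?item ?hasID` pairs (the same triple pattern without GROUP BY or GROUP_CONCAT) as CSV and do the grouping on the client. The sketch below assumes the CSV columns are named `item` and `hasID` after the query variables; the function name `group_de_numbers`, the `https://example.org/entity/` prefix, and the sample rows are hypothetical placeholders.

```python
import csv
import io
from collections import defaultdict

def group_de_numbers(csv_text):
    """Map each item to its sorted DE numbers, keeping only items with more than one."""
    ids_by_item = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Keep only the trailing QID of the entity URI.
        qid = row["item"].rsplit("/", 1)[-1]
        ids_by_item[qid].add(row["hasID"])
    # Client-side equivalent of HAVING (COUNT(?hasID) > 1)
    return {qid: sorted(ids) for qid, ids in ids_by_item.items() if len(ids) > 1}

# Hypothetical export; the URI prefix and all values are placeholders.
sample = """item,hasID
https://example.org/entity/Q1,100
https://example.org/entity/Q1,101
https://example.org/entity/Q2,200
"""

merged = group_de_numbers(sample)
print(merged)
```

The resulting dictionary doubles as the log of affected items: its keys are the items to delete, and its values are the DE numbers to re-upload.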