MaRDI4NFDI / portal-compose

docker-composer repo for mardi
https://portal.mardi4nfdi.de
GNU General Public License v3.0

Resync wdqs #507

Closed physikerwelt closed 3 months ago

physikerwelt commented 3 months ago

Describe the issue: As discussed this Monday.

physikerwelt commented 3 months ago

I started a second (non-destructive) updater, wdqs-updater-2, with a slightly modified run file:

#!/usr/bin/env bash
# This file is provided by the wikibase/wdqs docker image.

cd /wdqs || exit

# TODO env vars for entity namespaces, scheme and other settings
/wait-for-it.sh "$WIKIBASE_HOST:80" -t 300 -- \
/wait-for-it.sh "$WDQS_HOST:$WDQS_PORT" -t 300 -- \
./runUpdate.sh -N -h http://"$WDQS_HOST":"$WDQS_PORT" -- -t 10 --idrange 1-10000000 --wikibaseUrl "$WIKIBASE_SCHEME"://"$WIKIBASE_HOST" --conceptUri "$WIKIBASE_SCHEME"://"$WIKIBASE_HOST" --entityNamespaces "$WDQS_ENTITY_NAMESPACES"
bash-4.4$ ./runUpdate.sh 

[Screenshot 2024-03-12 at 23 28 15]

Manual: https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual#Configurable_properties

physikerwelt commented 3 months ago

At the current speed, it should finish in less than one week.
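The estimate can be reproduced with simple arithmetic. The following is a minimal sketch, not something used in this thread: `eta_days` is a hypothetical helper, and the progress and throughput figures in the example are made up for illustration (the real values would come from the updater log). It also assumes the throughput stays roughly constant and is greater than zero.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate the remaining days of a resync.
# Arguments (all illustrative, none taken from the issue):
#   $1  highest entity ID to process (upper end of --idrange)
#   $2  entity ID the updater has reached so far
#   $3  observed throughput in IDs per second (must be > 0)
eta_days() {
  local max_id=$1 current_id=$2 rate=$3
  awk -v max="$max_id" -v cur="$current_id" -v rate="$rate" \
    'BEGIN { printf "%.1f\n", (max - cur) / rate / 86400 }'
}

# Example with made-up numbers: 10,000,000 IDs, 4,000,000 done, 15 IDs/s.
eta_days 10000000 4000000 15
```

Because the ID range may contain gaps, an estimate of this kind is better read as an upper bound than as a prediction.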

physikerwelt commented 3 months ago

The initial resync is done; however, sitelinks have not been updated.

physikerwelt commented 3 months ago

It seems that running the script once more solves the problem (at least for all items I investigated manually), so I am restarting with:

./runUpdate.sh -N -h http://"$WDQS_HOST":"$WDQS_PORT" -- --idrange 1-10000000 --wikibaseUrl "$WIKIBASE_SCHEME"://"$WIKIBASE_HOST" --conceptUri "$WIKIBASE_SCHEME"://"$WIKIBASE_HOST" --entityNamespaces "$WDQS_ENTITY_NAMESPACES"

physikerwelt commented 3 months ago

For software the idrange was relatively small.

PREFIX wdt: <https://portal.mardi4nfdi.de/prop/direct/>
PREFIX wd: <https://portal.mardi4nfdi.de/entity/>
SELECT ?qid ?item WHERE {
    ?item wdt:P1460 wd:Q5976450 .
    ?item wikibase:sitelinks ?sitelinks .
    FILTER (?sitelinks < 1) .
    # Bind after ?item has been matched so that ?qid is populated.
    BIND (REPLACE(STR(?item), "^.*/Q([^/]*)$", "$1") AS ?qid)
}
LIMIT 100000
OFFSET 0

The query above started with 450 results. After running

./runUpdate.sh -N -h http://"$WDQS_HOST":"$WDQS_PORT" -- --idrange 5975000-5984444 --wikibaseUrl "$WIKIBASE_SCHEME"://"$WIKIBASE_HOST" --conceptUri "$WIKIBASE_SCHEME"://"$WIKIBASE_HOST" --entityNamespaces "$WDQS_ENTITY_NAMESPACES"

no results were returned. Consequently,

root@91e78119efac:/var/www/html# ./maintenance/run ./extensions/MathSearch/maintenance/ProfilePages.php create software
Read from offset 0.
Retrieved 0 results.
Pushed jobs to last segment 0.
root@91e78119efac:/var/www/html# 

did not produce any results.
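Since a narrow --idrange worked well here, a full resync could in principle be driven in smaller windows. The sketch below only computes the windows; `id_chunks` is a hypothetical helper, and the chunk size is an arbitrary choice. Whether runUpdate.sh exits once its range is processed is not confirmed in this thread, so the loop shown in the comment is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split the inclusive range START..END into windows of
# at most SIZE IDs and print each window as "lo-hi", the format that
# --idrange takes in the commands above.
id_chunks() {
  local start=$1 end=$2 size=$3 lo hi
  for (( lo = start; lo <= end; lo += size )); do
    hi=$(( lo + size - 1 ))
    if (( hi > end )); then hi=$end; fi
    printf '%d-%d\n' "$lo" "$hi"
  done
}

# Dry run for the software range used above, in windows of 5000 IDs.
id_chunks 5975000 5984444 5000

# Possible use inside the updater container (assumes each run terminates):
#   for range in $(id_chunks 5975000 5984444 5000); do
#     ./runUpdate.sh -N -h http://"$WDQS_HOST":"$WDQS_PORT" -- --idrange "$range" \
#       --wikibaseUrl "$WIKIBASE_SCHEME"://"$WIKIBASE_HOST" \
#       --conceptUri "$WIKIBASE_SCHEME"://"$WIKIBASE_HOST" \
#       --entityNamespaces "$WDQS_ENTITY_NAMESPACES"
#   done
```

Printing the windows first makes it easy to check the boundaries before anything is written to the query service.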

eloiferrer commented 3 months ago

I see wdqs-updater-2 in Portainer, but I cannot find the old wdqs-updater. wdqs-updater-2 seems idle. Is any container currently doing the synchronization?

physikerwelt commented 3 months ago

I think wdqs-updater-2 can be shut down. That wdqs-updater is not running is a separate issue. Maybe wdqs-updater-2 was an unfortunate name that prevented wdqs-updater from being recreated. I'll take care of it.
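A quick way to see which updater containers are actually running is to compare the output of `docker ps` against the expected names. This is a minimal sketch: `missing_containers` is a hypothetical helper, and it assumes the containers are named exactly wdqs-updater and wdqs-updater-2 (Compose may add a project prefix or an index suffix, in which case the names need adjusting).

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the names of running containers from stdin
# (one per line) and print each expected name, given as an argument,
# that does not appear as an exact whole-line match.
missing_containers() {
  local running name
  running=$(cat)
  for name in "$@"; do
    grep -qxF -- "$name" <<<"$running" || printf '%s\n' "$name"
  done
}

# Intended use on the Docker host (not executed here):
#   docker ps --format '{{.Names}}' | missing_containers wdqs-updater wdqs-updater-2
```

The whole-line match matters here because wdqs-updater is a prefix of wdqs-updater-2, so a plain substring search would report the old updater as running.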