Closed prunacomar closed 1 week ago
Hi there, congrats on great project! I get into it asking about rank info on Wikidata.
Thank you! Nobody has ever bought me a coffee for it :,-)
A small tip: it might be useful to add into your publication page https://danker.s3.amazonaws.com/index.html a link to the current file versions. Perhaps something like https://danker.s3.amazonaws.com/current.allwiki.links.rank.bz2
Thanks @prunacomar for the suggestion - I will look into it. At the moment I'm thinking about some additional metadata in the schema.org descriptions.
Probably the conclusion of this:
isLatest
flag.So I guess we will need to learn how to sort by version (or modified dates) that are in the markup of the website
Something like this would work:
wget "https://danker.s3.amazonaws.com/$(curl -s https://danker.s3.amazonaws.com/index.html | grep 'version\":' | sed -s 's/ \"version\": \"//' | sed 's/\",//' | sort -u | tail -n1).allwiki.links.rank.bz2"
But instead of ugly-dissecting the HTML with the best that Linux hast to offer you can also SPARQL it (will work with rdflib==7.1.0
):
from rdflib import Graph
from rdflib.parser import URLInputSource
from rdflib.plugins.parsers.jsonld import JsonLDParser
uri = "https://danker.s3.amazonaws.com/index.html"
g = Graph()
src = URLInputSource(uri)
p = JsonLDParser()
p.parse(src, g, extract_all_scripts=True)
q = """
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT (strbefore(str(max(xsd:dateTime(CONCAT(?version, "T00:00:00Z")))), "T") as ?versiond)
WHERE {
?x <http://schema.org/version> ?version.
}"""
r = g.query(q)
for row in r:
print(f"{row.versiond}")
Hi there, congrats on great project! I get into it asking about rank info on Wikidata.
A small tip: it might be useful to add into your publication page https://danker.s3.amazonaws.com/index.html a link to the current file versions. Perhaps something like https://danker.s3.amazonaws.com/current.allwiki.links.rank.bz2
Regards,