AlasdairGray closed this issue 3 years ago
Only the file for the homepage contains markup. It describes the DataCatalog, Dataset, Citation, Organization, and contact person.
roqet -i sparql11 -e 'SELECT * WHERE { ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?o}' -D 1.nq
roqet: Running query 'SELECT * WHERE { ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?o}'
roqet: Query has a variable bindings result
row: [s=uri<https://disprot.org/#DataCatalog>, o=uri<https://schema.org/DataCatalog>]
row: [s=uri<https://bioschemas.org/profiles/DataCatalog/0.3-RELEASE-2019_07_01>, o=uri<https://schema.org/CreativeWork>]
row: [s=uri<https://doi.org/10.1093/nar/gkz975>, o=uri<https://schema.org/ScholarlyArticle>]
row: [s=uri<https://disprot.org/#2020-12>, o=uri<https://schema.org/Dataset>]
row: [s=uri<https://bioschemas.org/profiles/Dataset/0.3-RELEASE-2019_06_14>, o=uri<https://schema.org/CreativeWork>]
row: [s=uri<https://biocomputingup.it/#Organization>, o=uri<https://schema.org/Organization>]
row: [s=uri<https://creativecommons.org/licensEs/by/4.0/>, o=uri<https://schema.org/CreativeWork>]
row: [s=uri<https://bioschemas.org/crawl/v1/disprot/disprot/20210813/1/disprot.org/748790195>, o=uri<https://schema.org/Person>]
row: [s=uri<https://bioschemas.org/profiles/Organization/0.2-DRAFT-2019_07_19>, o=uri<https://schema.org/CreativeWork>]
roqet: Query returned 9 results
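The query above lists one row per typed resource in a single file. To survey which types occur across several scraped files, a per-type tally may be handier. The following is a minimal sketch, not part of the existing tooling: it assumes the scraped files are N-Quads with one statement per line (as `1.nq` appears to be), and `count_types` is a hypothetical helper name.

```shell
# Tally rdf:type objects across one or more N-Quads files.
# Assumption: one statement per line, so whitespace-separated fields are
# subject, predicate, object, graph. The object of an rdf:type statement
# is an IRI or blank node and therefore contains no spaces.
count_types() {
  awk '$2 == "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>" { n[$3]++ }
       END { for (t in n) print n[t], t }' "$@" |
    sort -k1,1nr -k2,2   # most frequent type first, ties ordered by type
}
```

For example, `count_types *.nq` prints one line per type with its count, most frequent first.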
The MobiDB and PED homepages are victims of the BMUSE bug https://github.com/HW-SWeL/BMUSE/issues/79. Their scraped files contain no content.
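To find other scraped files affected in the same way, a quick check for files without any statements could look like the sketch below. It assumes "no content" means a file that is empty or holds only whitespace; `list_empty_scrapes` is a hypothetical helper name, not something BMUSE provides.

```shell
# Print the names of files that contain no non-whitespace characters.
list_empty_scrapes() {
  for f in "$@"; do
    # grep -q succeeds as soon as one non-space character is found;
    # if it fails, the file is empty or whitespace-only.
    grep -q '[^[:space:]]' "$f" || printf '%s\n' "$f"
  done
}
```

Calling `list_empty_scrapes *.nq` in a scrape directory would list the files worth re-scraping once the bug is fixed.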
Need to add Dataset and DataCatalog queries:
At the moment, pages without `schema:Protein` types are ignored. Would be good to check what other types are in the full scrape and to grab some of that data. In particular, there is data about `Dataset` and `DataCatalog`.

To Do: