vemonet opened this issue 4 years ago
Legend! Thanks for writing this out. We will try to integrate this into our pipeline so that the issue is resolved in future versions.
Hi, I noticed that the latest version available on Kaggle seems to have solved those encoding issues, thanks!
The version 11 file is half the size of version 9 (500 MB vs. 1 GB).
I cannot find the MeSH keywords in the latest version (previously defined using http://idlab.github.io/covid19#paragraphEntities).
We can only find DBpedia mappings defined using http://idlab.github.io/covid19#hasConcept.
Is this expected?
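A quick way to check which of the two relations a given dump actually contains is to count the triples that use each predicate. This is a minimal sketch, assuming the N-Triples file from Kaggle (one triple per line, so the predicate is always the second whitespace-separated field); the function name count_predicate and the file name kg.nt are hypothetical.

```shell
# count_predicate FILE IRI
# Prints how many triples in the N-Triples FILE use IRI as their predicate.
# Assumes one triple per line, so the predicate is the second field.
count_predicate() {
  awk -v p="<$2>" '$2 == p { n++ } END { print n + 0 }' "$1"
}

# Example (hypothetical file name):
# for pred in paragraphEntities hasConcept; do
#   echo "$pred: $(count_predicate kg.nt "http://idlab.github.io/covid19#$pred")"
# done
```

Matching on the second field rather than grepping the whole line avoids counting triples where the IRI appears as a subject or object.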
Hi, @bsteenwi made some changes to the final version to reduce the size. He did indeed remove some of the relations, but I am not sure which ones exactly...
Hi, the last version of the KG does indeed miss some links. We have recreated the KG with concepts extracted by DBpedia Spotlight and tried to find correlations between papers based on these concepts.
I will update the mapping scripts in this repository, so that it is easier to see which relations are available.
OK, we were planning to integrate your KG with the MeSH vocabulary and complementary resources (other publication KGs about COVID, drug and pathway databases, etc.), and are less interested in the DBpedia mappings (mainly due to data quality issues).
Do you plan to make the MeSH annotations available again soon? I could look into re-running the code you wrote to generate them, but if you plan to put them back, that would be even better :)
A small note: for MeSH URIs you are using HTTPS (e.g. https://id.nlm.nih.gov/mesh/D007251), whereas the MeSH vocabulary and prefix.cc use HTTP (http://id.nlm.nih.gov/mesh/).
Thanks!
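Until the scheme is changed upstream, one possible local workaround for the HTTPS/HTTP mismatch noted above is to rewrite MeSH IRIs before merging the KG with the MeSH vocabulary. This is a sketch, assuming an N-Triples or Turtle file where MeSH IRIs are written in full inside angle brackets; normalize_mesh_scheme is a hypothetical helper name.

```shell
# normalize_mesh_scheme FILE
# Writes FILE to stdout with every <https://id.nlm.nih.gov/mesh/...> IRI
# rewritten to <http://id.nlm.nih.gov/mesh/...>.
# Only full IRIs that start with "<" are touched; prefixed names are not.
normalize_mesh_scheme() {
  sed 's|<https://id\.nlm\.nih\.gov/mesh/|<http://id.nlm.nih.gov/mesh/|g' "$1"
}
```

Redirecting the output to a new file (e.g. normalize_mesh_scheme kg.nt > kg-http.nt, with hypothetical file names) keeps the original download untouched.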
First I would like to thank you for this KG and its documentation!
I tried to deploy your notebooks on my infrastructure (in JupyterLab, as the root user).
I ran into the following issues when loading the provided N-Triples file from Kaggle (https://www.kaggle.com/group16/covid19-literature-knowledge-graph):
http://dbpedia.org/datatype/polishZ\u0142oty does not look like a valid URI, trying to serialize this will break.
datatype rdf:langString requires a language tag
I am not sure whether the encoding issue is caused by my environment (Ubuntu 18.04).
I found a rather clean way to solve those issues:
I used Raptor (rapper: http://librdf.org/raptor/rapper.html) to convert the file to Turtle, which solved the encoding issue.
Then just replace ^^rdf:langString with ^^<http://www.w3.org/2001/XMLSchema#string>
Or keep langString and use the English tag (@en) as the default:
sed -i 's/\^\^rdf:langString/@en/g' ugent-covid-kg.ttl
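The two-step workaround described in this comment can be wrapped as small shell functions. This is a sketch, assuming Raptor's rapper CLI is on the PATH for the conversion step and that rapper's Turtle output writes the datatype as the prefixed name rdf:langString (as the sed command above expects); the function names are hypothetical.

```shell
# Step 1: re-serialize the N-Triples dump as Turtle with rapper.
# Usage: nt_to_turtle INPUT.nt OUTPUT.ttl
nt_to_turtle() {
  rapper -q -i ntriples -o turtle "$1" > "$2"
}

# Step 2a: tag the untagged rdf:langString literals as English.
# Writes the result to stdout instead of editing in place.
langstring_to_en() {
  sed 's/\^\^rdf:langString/@en/g' "$1"
}

# Step 2b (alternative): retype those literals as xsd:string instead.
langstring_to_xsd_string() {
  sed 's|\^\^rdf:langString|^^<http://www.w3.org/2001/XMLSchema#string>|g' "$1"
}
```

For example: nt_to_turtle kg.nt ugent-covid-kg.ttl && langstring_to_en ugent-covid-kg.ttl > ugent-covid-kg.en.ttl (hypothetical file names). Writing to a new file instead of using sed -i keeps the intermediate Turtle file intact and sidesteps the difference between GNU and BSD sed in how -i takes its argument. Choosing xsd:string avoids asserting that every literal is English, which @en implicitly does.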