Open mitsuhashi opened 1 year ago
/mnt/nas05/togovar/public/virtuoso/pubmed/20230723/pubmed23n0798.ttl
13:42:16 WARN riot :: [line: 2213982, col: 20] Unicode replacement character U+FFFD in string
13:42:16 WARN riot :: [line: 2214060, col: 20] Unicode replacement character U+FFFD in string
PMID:[24973148](https://pubmed.ncbi.nlm.nih.gov/24973148/)
mitsuhashi@vs66:~$ cat /mnt/nas05/togovar/public/virtuoso/pubmed/20230723/pubmed23n0798.ttl | awk "2213982==NR && 2213982==NR { print }"
dcterms:rights "� 2009 Asian Oceanian Association for the Study of Obesity . Published by Elsevier Ltd. All rights reserved.";
mitsuhashi@vs66:~$ cat /mnt/nas05/togovar/public/virtuoso/pubmed/20230723/pubmed23n0798.ttl | awk "2214060==NR && 2214060==NR { print }"
dcterms:rights "� 2009 Asian Oceanian Association for the Study of Obesity . Published by Elsevier Ltd. All rights reserved.";
mitsuhashi@vs66:~$
The text is already garbled in the source XML file, so this cannot be fixed on our side.
rdf_portal@vs66:~/rdf_portal-rdf/work/rdf-pubmed_download/baseline$ zgrep "2009 Asian Oceanian Association for the Study of Obesity" pubmed23n0798.xml.gz
<CopyrightInformation>� 2009 Asian Oceanian Association for the Study of Obesity . Published by Elsevier Ltd. All rights reserved.</CopyrightInformation>
<CopyrightInformation>� 2009 Asian Oceanian Association for the Study of Obesity . Published by Elsevier Ltd. All rights reserved.</CopyrightInformation>
<CopyrightInformation>� 2009 Asian Oceanian Association for the Study of Obesity . Published by Elsevier Ltd. All rights reserved.</CopyrightInformation>
<CopyrightInformation>� 2009 Asian Oceanian Association for the Study of Obesity . Published by Elsevier Ltd. All rights reserved.</CopyrightInformation>
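The zgrep check above can also be done programmatically. The following is a minimal sketch, not the script used in this issue: it scans text lines for U+FFFD and reports 1-based line and column positions. The names `find_replacement_chars` and `scan_gzip` are hypothetical, and the columns count characters from the start of the line, so they are not guaranteed to match the `col` values riot prints.

```python
import gzip

REPLACEMENT_CHAR = "\ufffd"  # the Unicode replacement character that riot warns about


def find_replacement_chars(lines):
    """Return (line_number, column) pairs, both 1-based, for every U+FFFD found."""
    hits = []
    for line_no, line in enumerate(lines, start=1):
        for col, ch in enumerate(line, start=1):
            if ch == REPLACEMENT_CHAR:
                hits.append((line_no, col))
    return hits


def scan_gzip(path):
    """Hypothetical helper: scan a gzip-compressed UTF-8 text file such as a baseline XML."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return find_replacement_chars(handle)


# Small in-memory example modelled on the record shown above.
sample = [
    "<ArticleTitle>Example</ArticleTitle>",
    "<CopyrightInformation>\ufffd 2009 Asian Oceanian Association for the Study of Obesity .</CopyrightInformation>",
]
hits = find_replacement_chars(sample)
print(hits)
```

If `scan_gzip` reports hits for the downloaded `pubmed23n0798.xml.gz`, the character was already present before any conversion to Turtle, which is the point made in the comment above.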
Note that no WARN or ERROR messages other than "Unicode replacement character U+FFFD in string" were output.
mitsuhashi@db01:~/yayamamo/riot$ head -10 riot_20230726.log
/mnt/nas05/togovar/public/virtuoso/pubmed/20230723/pubmed23n1420.ttl
/mnt/nas05/togovar/public/virtuoso/pubmed/20230723/pubmed23n1419.ttl
/mnt/nas05/togovar/public/virtuoso/pubmed/20230723/pubmed23n1418.ttl
09:40:20 WARN riot :: [line: 1426004, col: 65] Unicode replacement character U+FFFD in string
09:40:20 WARN riot :: [line: 1435297, col: 937] Unicode replacement character U+FFFD in string
09:40:20 WARN riot :: [line: 1435297, col: 938] Unicode replacement character U+FFFD in string
09:40:20 WARN riot :: [line: 1435297, col: 954] Unicode replacement character U+FFFD in string
09:40:20 WARN riot :: [line: 1435297, col: 955] Unicode replacement character U+FFFD in string
09:40:20 WARN riot :: [line: 1435297, col: 971] Unicode replacement character U+FFFD in string
09:40:20 WARN riot :: [line: 1435297, col: 972] Unicode replacement character U+FFFD in string
mitsuhashi@db01:~/yayamamo/riot$ grep -v mnt riot_20230726.log | grep -v "U+FFFD"
mitsuhashi@db01:~/yayamamo/riot$
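The same check as the `grep -v` pipeline can be sketched in Python, which also gives a per-file count. This is an illustration under assumptions, not part of the original workflow: it assumes the log has the layout shown above (a line with the file path, followed by that file's riot messages), and the function name `summarize_riot_log` is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
# 09:40:20 WARN riot :: [line: 1426004, col: 65] Unicode replacement character U+FFFD in string
LOG_LINE = re.compile(
    r"^\d{2}:\d{2}:\d{2}\s+(?P<level>WARN|ERROR)\s+riot\s+::\s+"
    r"\[line: (?P<line>\d+), col: (?P<col>\d+)\]\s*(?P<message>.*)$"
)


def summarize_riot_log(lines):
    """Count messages by (level, text) and by file; collect lines that match neither pattern."""
    messages = Counter()
    per_file = Counter()
    unparsed = []
    current_file = None
    for raw in lines:
        text = raw.strip()
        if not text:
            continue
        if text.startswith("/"):
            # Assumption: a path line introduces the messages that follow it.
            current_file = text
            continue
        match = LOG_LINE.match(text)
        if match is None:
            unparsed.append(text)
            continue
        messages[(match["level"], match["message"])] += 1
        per_file[current_file] += 1
    return messages, per_file, unparsed


sample_log = [
    "/mnt/nas05/togovar/public/virtuoso/pubmed/20230723/pubmed23n1419.ttl",
    "/mnt/nas05/togovar/public/virtuoso/pubmed/20230723/pubmed23n1418.ttl",
    "09:40:20 WARN riot :: [line: 1426004, col: 65] Unicode replacement character U+FFFD in string",
    "09:40:20 WARN riot :: [line: 1435297, col: 937] Unicode replacement character U+FFFD in string",
    "09:40:20 WARN riot :: [line: 1435297, col: 938] Unicode replacement character U+FFFD in string",
]
messages, per_file, unparsed = summarize_riot_log(sample_log)
print(messages, per_file, unparsed)
```

If the only key in `messages` is the U+FFFD warning and `unparsed` is empty, the log contains nothing else that needs attention, which is what the empty `grep` output above shows.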
Goal: eliminate the RDF syntax errors reported by Apache Jena's riot. riot is run as follows.
TODO
Script for running RIOT