Open kcmcleod opened 5 years ago
A similar effect occurs with hasPart. This time the problem is the markup itself, which creates nodes with no content.
This HTML:
<div class="annotation" property="hasPart" typeof="CreativeWork">Belongs to the
<a href="/uniprot/?query=family:%22hedgehog+family%22&sort=score">hedgehog family</a>.
<span class="attribution ECO305">
<span class="attributionHeader tooltipped" title="Manual assertion inferred by
curator">Curated
</span>
</span>
</div>
Produces the following raw triples:
genid-2f27fdee3aaf4285a4db8253476df489-n61 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://schema.org/CreativeWork .
http://purl.uniprot.org/uniprot/Q62226 http://schema.org/hasPart genid-2f27fdee3aaf4285a4db8253476df489-n61 .
I convert to:
http://bioschemas.org/crawl/v1/30/www.uniprot.org/uniprot/Q62226/1168557303 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://schema.org/CreativeWork .
http://purl.uniprot.org/uniprot/Q62226 http://schema.org/hasPart http://bioschemas.org/crawl/v1/30/www.uniprot.org/uniprot/Q62226/1168557303 .
Thus we no longer have blank nodes, BUT we do have nodes that carry essentially no information. On this single page there seem to be more than 10 instances of this. The end result is a very cluttered and unhelpful page.
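To make the conversion and the "empty node" problem concrete, here is a minimal sketch in Python. It assumes triples are held as plain (subject, predicate, object) tuples and that blank nodes are recognisable by the `genid-` prefix seen above. The SHA-1 based suffix is a stand-in chosen for illustration; it is not the scheme that produced `1168557303`, and the function names are hypothetical.

```python
import hashlib

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def skolemize(triples, base):
    """Replace blank-node labels (assumed to start with 'genid-') with URIs under `base`."""
    def fix(term):
        if term.startswith("genid-"):
            # Illustrative suffix only: a short hash of the blank-node label.
            digest = hashlib.sha1(term.encode("utf-8")).hexdigest()[:10]
            return base.rstrip("/") + "/" + digest
        return term

    return [(fix(s), p, fix(o)) for s, p, o in triples]


def type_only_nodes(triples):
    """Return subjects whose only outgoing statements are rdf:type triples."""
    predicates = {}
    for s, p, _ in triples:
        predicates.setdefault(s, set()).add(p)
    return sorted(s for s, ps in predicates.items() if ps == {RDF_TYPE})


raw = [
    ("genid-2f27fdee3aaf4285a4db8253476df489-n61", RDF_TYPE,
     "http://schema.org/CreativeWork"),
    ("http://purl.uniprot.org/uniprot/Q62226", "http://schema.org/hasPart",
     "genid-2f27fdee3aaf4285a4db8253476df489-n61"),
]

base = "http://bioschemas.org/crawl/v1/30/www.uniprot.org/uniprot/Q62226"
converted = skolemize(raw, base)
empty = type_only_nodes(converted)
```

Running `type_only_nodes` over a whole page's triples would give a count of how many hasPart targets carry nothing beyond their type, which is the clutter described above.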
Same result:
HTML source:
<div property="hasPart" class="annotation">
<ul class="noNumbering subcellLocations">
<li class="Nucleus">
<h6>Nucleus</h6>
<ul>
<li>
<a href="/locations/SL-0191">Nucleus </a><a class="icon icon-generic tooltipped" data-tippy="The nucleus is the most obvious organelle in any eukaryotic cell. It is a membrane-bound organelle surrounded by double membranes which contains most of the cell's genetic material. It communicates with the surrounding cytosol via numerous nuclear pores." data-icon="i"></a>
<span class="attribution ECO269">
<span class="attributionHeader ">1 Publication<span class="showHideEvidence caret_grey displayThisInline"></span></span>
<span style="display:none" class="evidenceContainer">
<p class="attributionExplain"><span class="context-help tooltipped-click html tipId-1">Manual assertion based on experiment in<sup>i</sup></span></p>
<ul>
<li>
<div class="Q8K330#ref1 referenceAttribution">
<div class="reference_header">Ref.1</div>
<div class="reference_content">
<div property="citation" resource="http://purl.uniprot.org/citations/14531860" typeof="ScholarlyArticle"><strong property="name">"Differential activities, subcellular distribution and tissue expression patterns of three members of Slingshot family phosphatases that dephosphorylate cofilin."</strong><br/><a href="/uniprot/?query=author:%22Ohta+Y.%22&sort=score" rel="nofollow">Ohta Y.</a>, <a href="/uniprot/?query=author:%22Kousaka+K.%22&sort=score" rel="nofollow">Kousaka K.</a>, <a href="/uniprot/?query=author:%22Nagata-Ohashi+K.%22&sort=score" rel="nofollow">Nagata-Ohashi K.</a>, <a href="/uniprot/?query=author:%22Ohashi+K.%22&sort=score" rel="nofollow">Ohashi K.</a>, <a href="/uniprot/?query=author:%22Muramoto+A.%22&sort=score" rel="nofollow">Muramoto A.</a>, <a href="/uniprot/?query=author:%22Shima+Y.%22&sort=score" rel="nofollow">Shima Y.</a>, <a href="/uniprot/?query=author:%22Niwa+R.%22&sort=score" rel="nofollow">Niwa R.</a>, <a href="/uniprot/?query=author:%22Uemura+T.%22&sort=score" rel="nofollow">Uemura T.</a>, <a href="/uniprot/?query=author:%22Mizuno+K.%22&sort=score" rel="nofollow">Mizuno K.</a><br/><a href="http://dx.doi.org/10.1046/j.1365-2443.2003.00678.x">Genes Cells 8:811-824(2003)</a> [<a property="sameAs" href="https://www.ncbi.nlm.nih.gov/pubmed/14531860">PubMed</a>] [<a property="sameAs" href="https://europepmc.org/abstract/MED/14531860">Europe PMC</a>] [<a href="/citations/14531860">Abstract</a>]</div>
<div class="citedFor"><span class="details"><strong>Cited for:</strong></span> NUCLEOTIDE SEQUENCE [MRNA] (ISOFORM 1), FUNCTION, SUBCELLULAR LOCATION, TISSUE SPECIFICITY, DEVELOPMENTAL STAGE, MUTAGENESIS OF CYS-410.</div>
</div>
</div>
</li>
</ul>
</span>
</span>
</li>
</ul>
</li>
<li class="Cytoskeleton">
<h6>Cytoskeleton</h6>
<ul>
<li>
<a href="/locations/SL-0090">cytoskeleton </a><a class="icon icon-generic tooltipped" data-tippy="The cytoskeleton is a dynamic three-dimensional structure that fills the cytoplasm of cells. The cytoskeleton is responsible for cell movement, cytokinesis, and the organization of the organelles or organelle-like structures within the cell. The major components of the cytoskeleton are the microfilaments (of actin), microtubules (of tubulin), the intermediate filament systems and a fourth group, the MinD-ParA group, that appears to be unique to bacteria." data-icon="i"></a>
<span class="attribution ECO269">
<span class="attributionHeader ">1 Publication<span class="showHideEvidence caret_grey displayThisInline"></span></span>
<span style="display:none" class="evidenceContainer">
<p class="attributionExplain"><span class="context-help tooltipped-click html tipId-1">Manual assertion based on experiment in<sup>i</sup></span></p>
<ul>
<li>
<div class="Q8K330#ref1 referenceAttribution">
<div class="reference_header">Ref.1</div>
<div class="reference_content">
<div property="citation" resource="http://purl.uniprot.org/citations/14531860" typeof="ScholarlyArticle"><strong property="name">"Differential activities, subcellular distribution and tissue expression patterns of three members of Slingshot family phosphatases that dephosphorylate cofilin."</strong><br/><a href="/uniprot/?query=author:%22Ohta+Y.%22&sort=score" rel="nofollow">Ohta Y.</a>, <a href="/uniprot/?query=author:%22Kousaka+K.%22&sort=score" rel="nofollow">Kousaka K.</a>, <a href="/uniprot/?query=author:%22Nagata-Ohashi+K.%22&sort=score" rel="nofollow">Nagata-Ohashi K.</a>, <a href="/uniprot/?query=author:%22Ohashi+K.%22&sort=score" rel="nofollow">Ohashi K.</a>, <a href="/uniprot/?query=author:%22Muramoto+A.%22&sort=score" rel="nofollow">Muramoto A.</a>, <a href="/uniprot/?query=author:%22Shima+Y.%22&sort=score" rel="nofollow">Shima Y.</a>, <a href="/uniprot/?query=author:%22Niwa+R.%22&sort=score" rel="nofollow">Niwa R.</a>, <a href="/uniprot/?query=author:%22Uemura+T.%22&sort=score" rel="nofollow">Uemura T.</a>, <a href="/uniprot/?query=author:%22Mizuno+K.%22&sort=score" rel="nofollow">Mizuno K.</a><br/><a href="http://dx.doi.org/10.1046/j.1365-2443.2003.00678.x">Genes Cells 8:811-824(2003)</a> [<a property="sameAs" href="https://www.ncbi.nlm.nih.gov/pubmed/14531860">PubMed</a>] [<a property="sameAs" href="https://europepmc.org/abstract/MED/14531860">Europe PMC</a>] [<a href="/citations/14531860">Abstract</a>]</div>
<div class="citedFor"><span class="details"><strong>Cited for:</strong></span> NUCLEOTIDE SEQUENCE [MRNA] (ISOFORM 1), FUNCTION, SUBCELLULAR LOCATION, TISSUE SPECIFICITY, DEVELOPMENTAL STAGE, MUTAGENESIS OF CYS-410.</div>
</div>
</div>
</li>
</ul>
</span>
</span>
</li>
</ul>
</li>
</ul>
</div>
Output from Google:
To view this on Google: https://search.google.com/structured-data/testing-tool#url=https%3A%2F%2Fwww.uniprot.org%2Funiprot%2FQ8K330
Triple produced by any23:
http://purl.uniprot.org/uniprot/Q8K330 http://schema.org/hasPart
Cytoskeleton
cytoskeleton 1 PublicationManual assertion based on experiment ini
Ref.1
"Differential activities, subcellular distribution and tissue expression patterns of three members of Slingshot family phosphatases that dephosphorylate cofilin."
,
,
,
,
,
,
,
,
[
] [
] [
]
Cited for: NUCLEOTIDE SEQUENCE [MRNA] (ISOFORM 1), FUNCTION, SUBCELLULAR LOCATION, TISSUE SPECIFICITY, DEVELOPMENTAL STAGE, MUTAGENESIS OF CYS-410.
Nucleus
Nucleus 1 PublicationManual assertion based on experiment ini
Ref.1
"Differential activities, subcellular distribution and tissue expression patterns of three members of Slingshot family phosphatases that dephosphorylate cofilin."
,
,
,
,
,
,
,
,
[
] [
] [
]
Cited for: NUCLEOTIDE SEQUENCE [MRNA] (ISOFORM 1), FUNCTION, SUBCELLULAR LOCATION, TISSUE SPECIFICITY, DEVELOPMENTAL STAGE, MUTAGENESIS OF CYS-410.
Notice that the order in the HTML is Nucleus then Cytoskeleton, which is also the order Google reports. HOWEVER, any23 reverses the order. Furthermore, notice how much of the text found by Google is not detected by any23.
ALSO notice that much of the text in the HTML is missing from both the Google and the any23 output. E.g., the HTML says "The cytoskeleton is a dynamic three-dimensional structure that fills the cytoplasm of cells", but this appears in neither Google's nor any23's result.
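One likely reason the tooltip sentence is missing: in the source it sits in a `data-tippy` attribute, not in a text node, and RDFa literals are taken from an element's text content (or an explicit `content` attribute). The sketch below, using only Python's standard `html.parser`, separates text nodes from attribute values for a shortened copy of the Cytoskeleton list item. The class name and the abbreviated snippet are illustrative, not part of any tool discussed here.

```python
from html.parser import HTMLParser


class TextAndAttributes(HTMLParser):
    """Collect text nodes and attribute values separately."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.attr_values = []

    def handle_starttag(self, tag, attrs):
        # Attribute values never contribute to an element's text content.
        for _, value in attrs:
            if value:
                self.attr_values.append(value)

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


# Shortened from the Cytoskeleton <li> in the HTML source above.
snippet = (
    '<li><a href="/locations/SL-0090">cytoskeleton </a>'
    '<a class="icon icon-generic tooltipped" '
    'data-tippy="The cytoskeleton is a dynamic three-dimensional structure '
    'that fills the cytoplasm of cells." data-icon="i"></a></li>'
)

parser = TextAndAttributes()
parser.feed(snippet)
parser.close()

text_content = " ".join(parser.text_parts)
tooltip = "The cytoskeleton is a dynamic three-dimensional structure"
in_text = tooltip in text_content
in_attributes = any(tooltip in value for value in parser.attr_values)
```

Here `in_text` is false and `in_attributes` is true, which would be consistent with both Google and any23 dropping the description while keeping the visible link text.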
When properties are nested, the inner properties are pulled out to form their own triples, leaving the outer property's value looking rather messy. E.g., from https://www.uniprot.org/uniprot/Q62226 :
The triple representing the text property (in the 2nd line) ends up as:
Google SDT Tool
Keeps the text that Any23 removes; however, the result is still hard to read and contains odd fragments. It is better than Any23, though.
Extruct
Behaves in the same way as Google.
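Whichever tool is used, much of the mess in these literals is whitespace plus punctuation left behind where inline elements were removed. A minimal post-processing sketch follows, assuming it is acceptable for the crawl to collapse whitespace and to drop lines made up only of commas and square brackets. The function name, the ` | ` separator, and the shortened sample are illustrative.

```python
import re


def tidy_literal(literal):
    """Collapse whitespace and drop lines that hold only leftover punctuation."""
    kept = []
    for line in literal.splitlines():
        line = " ".join(line.split())
        # Lines such as "," or "] [" remain where link text was stripped out.
        if not line or re.fullmatch(r"[\[\],\s]+", line):
            continue
        kept.append(line)
    return " | ".join(kept)


# Abbreviated from the any23 literal above.
sample = """Cytoskeleton
cytoskeleton 1 PublicationManual assertion based on experiment ini
Ref.1
,
,
[
] [
]
Cited for: FUNCTION, SUBCELLULAR LOCATION."""

tidied = tidy_literal(sample)
```

This only improves readability; it does not recover the author names or link labels that the extractor dropped.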