Closed jaygray0919 closed 7 years ago
Please merge with https://github.com/ampproject/amphtml/issues/3432
@jaygray0919 Could you provide the example of what the template would look like in terms of your PropertyTable/Properties example above?
Here's the amp-mustache default case (template-1):
<template
type="amp-mustache"
id="amp-template-1"
>
<ul>
<li>CID: {{CID}}</li>
<li>Molecular formula: {{MolecularFormula}}</li>
</ul>
</template>
The amp-list looks like this:
<p>
<amp-list
width=auto
height=80
layout=fixed-height
template="amp-template-1"
src="https://dl.dropboxusercontent.com/u/3094317/pubchem_data.json"
>
</amp-list>
</p>
For the above to work, we have to edit the source as follows:
{
"items": [
{
"CID": 2244,
"MolecularFormula": "C9H8O4"
}
]
}
Next, let's look at the PubChem target: https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/property/MolecularFormula/JSON
We have to wrap the source in a root items:[ ] like so:
https://dl.dropboxusercontent.com/u/3094317/pubchem_data_avec_header.json
which looks like this:
{
"items":
[
{
"PropertyTable":
{
"Properties":
[
{
"CID": 2244,
"MolecularFormula": "C9H8O4"
}
]
}
}
]
}
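The wrapping step above does not have to be done by hand. A minimal Python sketch of it, assuming the PubChem response has already been fetched and parsed on a server you control; the function name `wrap_in_items` is hypothetical and is not part of AMP or PubChem:

```python
import json


def wrap_in_items(document):
    """Wrap a parsed JSON document in the {"items": [...]} root that amp-list expects."""
    return {"items": [document]}


# Sample response, copied from the PubChem example in this thread.
pubchem_response = {
    "PropertyTable": {
        "Properties": [
            {"CID": 2244, "MolecularFormula": "C9H8O4"}
        ]
    }
}

wrapped = wrap_in_items(pubchem_response)
print(json.dumps(wrapped, indent=2))
```

The output has the same shape as pubchem_data_avec_header.json, so template-2 can be used against it unchanged.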
Here's template-2:
<template
type="amp-mustache"
id="amp-template-2"
>
<ul>
<li>CID: {{#PropertyTable}}{{#Properties}}{{CID}}{{/Properties}}{{/PropertyTable}}</li>
<li>Molecular formula: {{#PropertyTable}}{{#Properties}}{{MolecularFormula}}{{/Properties}}{{/PropertyTable}}</li>
</ul>
</template>
<p>
<amp-list
width=auto
height=80
layout=fixed-height
src="https://dl.dropboxusercontent.com/u/3094317/pubchem_data_avec_header.json"
template="amp-template-2"
></amp-list>
</p>
Our proposal, processing the HTTPS/CORS/PubChem/JSON, would look like this (template-3):
<template
type="amp-mustache"
id="amp-template-3"
>
<ul>
<li>CID: {{#PropertyTable}}{{#Properties}}{{CID}}{{/Properties}}{{/PropertyTable}}</li>
</ul>
</template>
<p>
<amp-list
width=auto
height=80
layout=fixed-height
src="https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/property/MolecularFormula/JSON"
template="amp-template-3"
></amp-list>
</p>
Does this make sense? Do I need to clarify or correct anything?
As an aside, we face a similar but more challenging problem processing JSON-LD.
However, in that case, we set the @context as follows:
<script type="application/ld+json" id="2">
{
"@context":
{
"@vocab": "http://schema.org/",
"id": "@id",
"graph": "@graph",
"type": "@type"
},
"graph":
[
In the case above, we use "graph": in place of "items":. The substitutions enable us to skip the need to process or escape @. But here too we'd like to define our root term and navigate to the target key:value pair.
In the JSON-LD case, as opposed to the PubChem JSON example, we almost always have edit control. Nevertheless, we'd appreciate having the ability to navigate from "graph": rather than only "items":
@jaygray0919 I think I understand, but I'm not sure how amp-list would "understand" which properties to iterate over. One option I see is to supply a "root" expression that finds the array to iterate on. E.g. something like this:
<amp-list ... items="PropertyTable.Properties">
The items attribute will default to "items" to keep backward compatibility with the current codepath.
Then the JSON can simply be:
{
"PropertyTable": {
"Properties": [...]
}
}
Would this solve your problem?
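To make the proposed lookup concrete, here is a rough Python sketch of how a dot-separated items path could be resolved against the fetched JSON, falling back to "items" when no path is given. This only illustrates the idea and is not amp-list's actual implementation; the function name `resolve_items` and its error handling are assumptions.

```python
def resolve_items(data, path="items"):
    """Follow a dot-separated path and return the array found at its end."""
    node = data
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            raise KeyError(f"path segment {key!r} not found")
        node = node[key]
    if not isinstance(node, list):
        raise TypeError(f"{path!r} does not point to an array")
    return node


# PubChem-shaped document, no "items" wrapper needed.
pubchem = {
    "PropertyTable": {
        "Properties": [{"CID": 2244, "MolecularFormula": "C9H8O4"}]
    }
}
rows = resolve_items(pubchem, "PropertyTable.Properties")

# Existing documents keep working through the default path.
legacy = {"items": [{"CID": 2244, "MolecularFormula": "C9H8O4"}]}
legacy_rows = resolve_items(legacy)
```

Both calls return the same one-element array, so one template could serve either source.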
Lemme speak with some folks and get back to you quickly. Also will reach out to @danbri as he's leading the schema.org initiative which uses, among other structures, JSON-LD to represent structured data. TY for your analysis and attention.
Allow me to "review the bidding" so all information is in one place.
There are three simple tests in this file: https://dl.dropboxusercontent.com/u/3094317/index_subset.html
Template 1 edits PubChem data to fit an items-based JSON structure. We simplified by removing PubChem classification labels "PropertyTable" and "Properties."
{
"items": [
{
"CID": 2244,
"MolecularFormula": "C9H8O4"
}
]
}
Template 2 follows Template 1 but keeps the original PubChem labels "PropertyTable" and "Properties"; the whole structure is wrapped in the AMP-required items-based JSON structure.
{
"items": [
{
"PropertyTable": {
"Properties": [
{
"CID": 2244,
"MolecularFormula": "C9H8O4"
}
]
}
}
]
}
With Template 3, we learned that the target file can be a .txt file if it conforms to an items-based JSON structure.
Our current amp-list processor looks like this:
<amp-list
width=auto
height=80
layout=fixed-height
src="https://dl.dropboxusercontent.com/u/3094317/pubchem_data_avec_header.json"
template="amp-template"
></amp-list>
In order to process a JSON structure that does not include an items-based JSON structure, like this:
{
"PropertyTable": {
"Properties": [
{
"CID": 2244,
"MolecularFormula": "C9H8O4"
}
]
}
}
you propose that our amp-list processor would look like this:
<amp-list
items="PropertyTable.Properties"
width=auto
height=80
layout=fixed-height
src="HTTPS CORS file.txt"
template="amp-template"
></amp-list>
That looks like a good solution for the current design of amp-list. Thank you for that.
May we update our request based on new information?
Template 3 suggests that you may not be processing the HTTPS header for "Content-Type".
We would like to have a general purpose AMP processor, such as amp-list-ld+json, that specifically processes a JSON-LD file.
The header for such a file includes Content-Type: application/ld+json, which is the companion to an in-line HTML specification in the form <script type="application/ld+json"></script>.
Such a processor would enable us to process an HTTPS/CORS/JSON-LD file in this format:
{
"@context": {
"@vocab": "http://schema.org/"
},
"items": [
{
"CID": 2244,
"MolecularFormula": "C9H8O4"
}
]
}
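The Content-Type distinction above could be handled with a check along these lines. This is only a sketch of the idea in Python, not a description of what AMP does; the accepted set and the function names are assumptions made for illustration.

```python
# Media types a hypothetical JSON/JSON-LD list processor would accept (assumption).
ACCEPTED_TYPES = {"application/json", "application/ld+json"}


def media_type(content_type_header):
    """Strip parameters such as '; charset=utf-8' and normalise case."""
    return content_type_header.split(";", 1)[0].strip().lower()


def is_accepted(content_type_header):
    """Return True when the response declares a JSON or JSON-LD media type."""
    return media_type(content_type_header) in ACCEPTED_TYPES


print(is_accepted("application/ld+json; charset=utf-8"))
print(is_accepted("text/plain"))
```

Under a check like this, the .txt file from Template 3 would be rejected if its server labels it text/plain.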
For example, here is an eBay document on the Google CDN: https://cdn.ampproject.org/c/m.ebay.com/sch/amp/Camera-Drones/179697/bn_89951/i.html
I formatted their Schema.org JSON-LD for easier reading here: https://gist.github.com/jaygray0919/d3375b52768bbe7e3e39f4386c93706a
If we had an amp-list-ld+json processor, our pseudo amp-mustache template would get values for the keys "name" and "url" as follows:
{{#"mainEntity"}}{{#"@type": "ItemList"}}{{#"itemListElement"}}{{#"@type": "ItemList"}}{{#"itemListElement"}}{{"name"}}{{"url"}}{{/"itemListElement"}}{{/"@type": "ItemList"}}{{/"itemListElement"}}{{/"@type": "ItemList"}}{{/"mainEntity"}}
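For readers less familiar with mustache sections, the traversal that this pseudo-template describes can be sketched in Python. The sample document below is hypothetical: it copies only the nesting implied by the template (mainEntity, then ItemList inside ItemList), not the real eBay data, and it assumes itemListElement is always an array.

```python
def collect_leaf_items(node):
    """Yield (name, url) for every innermost itemListElement entry."""
    children = node.get("itemListElement")
    if children is None:
        # No further nesting: this node is a leaf carrying the values we want.
        yield node.get("name"), node.get("url")
        return
    for child in children:
        yield from collect_leaf_items(child)


# Hypothetical data shaped like the nesting in the pseudo-template above.
document = {
    "@context": {"@vocab": "http://schema.org/"},
    "mainEntity": {
        "@type": "ItemList",
        "itemListElement": [
            {
                "@type": "ItemList",
                "itemListElement": [
                    {"name": "Drone A", "url": "https://example.com/a"},
                    {"name": "Drone B", "url": "https://example.com/b"},
                ],
            }
        ],
    },
}

pairs = list(collect_leaf_items(document["mainEntity"]))
print(pairs)
```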
The two leading JSON-LD
and schema.org
authorities on this topic are Dan Brickley (@danbri) and Gregg Kellogg (@gkellogg ). Whatever they suggest as part of this thread 'trumps' my "review-of-the-bidding" (a technical comment about general expertise; not a political comment about presidential contenders).
Stage A (your updated proposal) is very helpful. Stage B - the JSON-LD and schema.org request - would enable publishers to do two important things:
0 Integrate JSON-LD in-line in an AMP/HTML document (as is done today).
1 Host the same JSON-LD structure as a "file.jsonld" on their HTTPS/CORS server.
2 Use amp-list-ld+json and amp-mustache to hydrate the in-line HTML that otherwise has to be re-keyed to present a user with the identical information that is included in <script type="application/ld+json"></script>.
(phew, 2 is a long sentence)
In common language, we can't expose the in-line JSON-LD to humans. Instead, we have to re-format the content using microformat syntax, or perhaps just conventional HTML elements. If we had a capability like the one above, a publisher would store their JSON-LD on a server and then process it in an AMP file using amp-list and amp-mustache. While we'll have to write the mustache templates, that is an easier task than re-keying data already defined in the JSON-LD file.
Thank you again for the short term solution. Please consider the subsequent proposal near term. Dan and Gregg may use their trump cards to take control of this discussion.
Seems like a reasonable direction. I'm happy to help fine-tune such a mechanism.
/cc @cramforce
Thanks, @jaygray0919 ! I'd like to clarify a few things to make sure I understand the need and a set of problems correctly.
First, it sounds like we have a bug: we don't confirm the response content-type. That's a bad omission and we'll definitely address it (#3667). We'll hold off on the fix until we understand whether we can make it serve the needs here better.
Second, your goal is to render JSON-LD format as a valid AMP HTML. Correct? If so, there are several questions that I'd like to ask.
<amp-list> element semantically? AMP Lists are meant to represent actual lists: in the future we expect to possibly add protocols such as page-by-page fetching and other list-specific functionality. If it's not a list, we could consider providing a new <amp-render> element that would be void of list semantics but would still allow fetch/render via template. For instance, your PubChem example seems NOT to be a list, but I could be mistaken.
<amp-list> element indeed: it's very reasonable for us to add some navigation property to find the actual array in the data structure, which we discussed above. From your description I'm not yet sure if this is sufficient. In particular, many examples contain more than one list. And, again, if this is not a list, we could allow arbitrary navigation from the root via {{#name}} syntax via <amp-render>.
<script> in the current document vs CORS resource?
One thing I'd like to note: if we support <amp-render> it'd still have to follow our strict sizing rules in AMP. These rules help us avoid content FOUC and shifting during reading. If the goal is to render significant portions of the content based on this data, neither amp-list nor amp-render might be a great fit, since doing this could be a significant drag on client-side latency and performance. The original intent behind <amp-list> was to enable functionality such as "Related Links" sections, which are contextual and fresh but not critical to the reader - in other words they are not the main content of a document.
Some quick history with apologies if this is "old hat" to readers. The marriage between XML and HTML has always been problematic. RSS was among the first processors to display XML to a human reader. RDF/XML (and subsequently OWL) is the content structure many publishers use to apply controlled vocabularies to specify the meaning of and relationships among their content. But RDF/XML does not easily fit HTML; most folks who want to display RDF content resort to using XSLT; but that is for stand-alone pages. Manu Sporny, Gregg Kellogg ( @gkellogg ) et al were instrumental in developing RDFa as a means to expose the semantic content of an RDF structure in HTML. Several processors evolved to help generate RDFa from RDF/XML that, subsequently, could be integrated with conventional HTML. But most of us chose to re-key our RDF to conform to RDFa when we needed to make the content human-readable. Google subsequently developed Microdata - a simplified version of RDFa - for HTML.
Bottom line: a publisher has to maintain two versions of semantic data. One version is for "machine-to-machine" communication (RDF); one version is for "machine-to-human_and_machine" communication (RDFa). JSON-LD is a significant improvement on the structure of RDF/XML (a lot less typing and a lighter-weight message). Schema.org ( led by @danbri ) is a general purpose controlled vocabulary that defines many information Types and their properties for defining content. When implemented as a JSON-LD structure, content is comparable to RDF/XML for machine-to-machine communication. And, as you know, it's the semantic specification of content delivered in AMP.
But the problem remains: a publisher has to maintain two versions of their content - JSON-LD and HTML (either RDFa or Microdata). As a publisher (we're not a news organization that creates @NewsArticle
content) we have converted our entire content database to JSON-LD structures. These structures are graphs of Types and properties using Schema.org and other controlled vocabularies. When discovered by a harvesting machine, our graphs can be combined with complementary graphs to express deeper knowledge on a subject area. But we still cannot easily expose the content of those graphs to human readers via HTML.
We were very excited when AMP made a commitment to JSON-LD + Schema.org, and further excited when we saw a systematic method to read and process remote data. While we have jerry-rigged solutions in the past that used combinations of JavaScript, XMLHttpRequest and JSONP, they all have limitations (either technical or marketing).
Here was our idea in simple terms:
A. Continue to expand our JSON-LD library using general-purpose (schema.org) and specialized controlled vocabularies.
B. Publish AMP pages that integrate our JSON-LD in <body> (thank you @gregable !).
C. Use conventional HTML elements to display content.
D. Get the semantic content from an HTTPS/CORS source (our content and content published by other folks).
E. Format the semantic content using amp-mustache.
Since we cannot process the JSON-LD when it is AMP-in-line, A thru E enable us to "GET" and process the data for HTML presentation.
Sure, we have to write data processing code (mustache) but we had to do that anyway in some form. Further, we can teach our content curators to use mustache; and they are comfortable doing so.
So, with A thru E we can have our cake and eat it too. One "data file" reused 'in-line' for web crawlers and 'on-server' for integration with HTML.
Now to your questions.
amp-list works perfectly. In effect, we create lists (in fact lists-of-lists, i.e. Lisp). Our suggestion is that you ask us "what kind of list do you want me to get for you?" Our answer would be amp-list-ld+json. Google AMP then would have a systematic program to process JSON-LD. Our advice is to ask Gregg Kellogg (@gkellogg) how to do this, as he's done this many times before.
amp-list and process it using amp-mustache. We might use someone else's JSON-LD structure in-line AMP. If we did so, we always would include provenance mark-up.
I can see from your follow-up note that we may be trying to reshape a square hole to accommodate a round peg. We don't want to push too hard on a specific solution; we defer to your greater expertise here. And you chaps have to maintain this stuff, so we understand why you are going to think carefully through the implications of our request. And we don't know about, and have not used, amp-render.
BUT ... IOHO you are going in the right direction and we want to avail ourselves of your solutions ASAP. Hence, our desire to make changes to amp-list.
Let us know what we can do next to help.
@jaygray0919 Ok, I think I got what you'd like to do here. I wasn't quite sure whether you wanted to transform JSON for the purposes of amp-list, but it now sounds like you do. A follow-up question: how often would this be needed inside of the template? E.g. if we expanded the <amp-list items="..."> protocol to allow something like items="@type:PropChem", would that be sufficient? Or will there always be nested cases that need to be navigated inside the template?
@dvoytenko Really appreciate your attention to this issue. Would it make sense to have a quick concall to make sure we are on the same page? Lemme know.
WRT your question: JSON-LD documents can be deeply nested (highly recursive). I've looked at several of our schema.org JSON-LD documents; several are 4 levels deep ({@type-1 {@type-2 {@type-3 {@type-4}}}}). But we have others that are deeper (using rdf:Container). In contrast, we also have rdf:List, for which your proposal is a perfect fit.
My thinking here is:
items (e.g. the PubChem files).
@context as the zero-level type).
After we get some experience with #1 and #2 we can tackle more complex JSON-LD documents. An idea there is to require the developer to specify the path used by the publisher - something like the nested examples I sketched above.
I mentioned earlier and replay here for completeness: the two blokes who really get these issues (and from whom we've learned) are Dan Brickley (@danbri) and Gregg Kellogg (@gkellogg). I wouldn't want to do too much here without their advice. I can specify something that meets our special case, but we all want/need a general purpose solution.
Could you attach a few sample data files to this issue?
@cramforce and @dvoytenko
Here are 7 non-AMP data access scenarios that may help guide upgrades to amp-list and amp-mustache.
1 JSONP accessing a JSON file http://ontomatica.com/test_amp/data_access_protocols/AMP_using_JSONP/index_JSON-P_-_Dropbox-URL_.html
The JSON file is served by HTTPS from Dropbox: https://dl.dropboxusercontent.com/u/3094317/bisonapi_original_subset_.json
Scenario 1 does not use jQuery or AJAX.
2 JSONP accessing a JSON file served by HTTP from a USGS server: http://ontomatica.com/test_amp/data_access_protocols/AMP_using_JSONP/index_JSON-P_-_USGS-URL_.html
Scenario 2 does not use jQuery or AJAX.
3 JSONP using jQuery and accessing a JSON file served by HTTPS from a US NIH PubChem RESTful server http://ontomatica.com/test_amp/data_access_protocols/AMP_using_JSONP/index_PubChem_REST_JSONP_.html
The PubChem interface to REST data supports a query string that a developer can compose to select specific data.
While this example uses jQuery, HTTPS access to PubChem REST data does not require JavaScript - it's a plain RESTful interface that could be processed by amp-list and amp-mustache.
4 Accessing in-line JSON-LD using jQuery (example 1) http://ontomatica.com/test_amp/data_access_protocols/AMP_using_jquery_JSON-LD/index_GS1_jquery_JSON-LD_inline.html
This example is an in-line JSON-LD structure that is processed using JavaScript and jQuery.
Here is the data-block: http://ontomatica.com/test_amp/data_access_protocols/AMP_using_jquery_JSON-LD/JSON-LD_data-block-1.json
amp-list and amp-mustache do not currently process the in-line JSON-LD, but that is desirable.
5 Accessing in-line JSON-LD using jQuery (example 2) http://ontomatica.com/test_amp/data_access_protocols/AMP_using_jquery_JSON-LD/index_Person_jquery_JSON-LD_inline.html
Here is the data-block: http://ontomatica.com/test_amp/data_access_protocols/AMP_using_jquery_JSON-LD/JSON-LD_data-block-2.json
In this case, the JavaScript parser looks more like an amp-mustache template.
6 Accessing XML data using jQuery and GET (example 1) http://ontomatica.com/test_amp/data_access_protocols/AMP_using_XML/index_quotes_GET_XML.html
This technique accesses remote XML using a conventional GET method. It would be good if AMP supported access to XML using a GET-like technique. Note: my daily usage may have been exceeded. Try this another day. The source code otherwise is accurate.
7 Accessing XML data using jQuery and GET (example 2) http://ontomatica.com/test_amp/data_access_protocols/AMP_using_XML/index_synonyms_GET_XML.html
Ask me questions and I'll clarify.
/jay gray
@jaygray0919 @cramforce
I've reviewed all examples and scripts. I can now break down what you'd like to accomplish into the following set of hypothetical tasks/solutions:
(1) <script id="inline1" type="application/(ld+)json">{}</script>.
(2) amp-list author can choose a different array root. Currently it's always the items field. The new format is root="path.to.array".
(3) root="path.array1[@type=t1].value".
(4) {{#path.array[@type=t1]}}.
Let me know if these solutions do not cover all of the examples and cases you've communicated. If you agree, please order these items in importance.
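As an illustration of what a root selector with a type filter might do, here is a small Python sketch that resolves a path such as path.array1[@type=t1].value. The selector grammar is inferred from the examples in this comment and is an assumption, as are the function name and the choice to return every match; type values containing dots or brackets are not handled.

```python
import re

# One path segment: a key, optionally followed by a [@type=...] filter.
SEGMENT = re.compile(r"^(?P<key>[^\[\]]+)(?:\[@type=(?P<type>[^\]]+)\])?$")


def resolve(data, path):
    """Return every value reachable from data by following path."""
    nodes = [data]
    for raw in path.split("."):
        match = SEGMENT.match(raw)
        if match is None:
            raise ValueError(f"bad path segment: {raw!r}")
        key, wanted = match.group("key"), match.group("type")
        next_nodes = []
        for node in nodes:
            if not isinstance(node, dict) or key not in node:
                continue
            value = node[key]
            if wanted is None:
                next_nodes.append(value)
            elif isinstance(value, list):
                # Keep only the array members whose @type matches the filter.
                next_nodes.extend(
                    item for item in value
                    if isinstance(item, dict) and item.get("@type") == wanted
                )
        nodes = next_nodes
    return nodes


sample = {
    "path": {
        "array1": [
            {"@type": "t1", "value": [1, 2]},
            {"@type": "t2", "value": [3]},
        ]
    }
}
matches = resolve(sample, "path.array1[@type=t1].value")
print(matches)
```

Returning a list of matches rather than a single value leaves open the question of what to do when several array members share a type.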
So, the (2) is obvious and I can implement this quickly. But as for others, see below.
IIUR, (3) and (4) stem from XML-inspired JSON structures with prevalent arrays and XPath-like navigation. E.g. a JSON-LD format would typically include something like "array": [ {"@type": "Chemical", ...}, ...], as opposed to (naively, of course) "array": {"Chemical": {...}, ...}. I can envision some root selector format for this. But the difficulties are:
@type value.
We discussed in the comments above an alternative option: transforming the LD document. But, besides the performance penalty, we need to consider that (a) the transformation might not be that obvious, and (b) we would still need to answer all of the same questions above.
Here is another approach I can propose: we can create a new template system, mustache-ld, that will use the same Mustache implementation that we do now, but will apply the necessary transformations to the input to support the needed feature set.
The data inlining (1) is really intended here for the purpose of implementing client-side rendering. This is something that conflicts with the goals of AMP and we have to be very careful about. It appears that most of the examples shown above suggest this use.
When the data is inlined, rendering is delayed by the amp-list and amp-mustache downloads, instantiation and finally the rendering itself - all of these operations are subject to content resizing that AMP is very strict about. All this time the user might just be staring at a blank screen waiting for all of these steps to complete. If the data is not inlined, but used the same way, the download latency for the data is added to the overall latency.
My opinion on this is that the main document content has to be rendered server-side for maximum performance and cacheability, and client-side rendering should only be used for secondary content, such as "related documents" sections, etc. I can certainly imagine some use-cases where this rule of thumb can be broken, but I can't glean those use-cases from the examples above yet.
I can move on task (2) and possibly (3) at this time. For the others, we have either feasibility or use-case questions. So at this time, I'm not ready to proceed with (1) and (4). How would this affect your vision of JSON-LD rendering?
@dvoytenko and @cramforce
#2 is good and helpful. #2 will enable us to parse CORS/JSON files where the Publisher selects the term for the array root. An example is JSON files published by US NIH PubChem.
We see problems with #3.
A brief aside on nomenclature. "Simple JSON-LD" means that @Type is used only once in an array. "Complex JSON-LD" means @Type is used 2+ times. Examples: @id, @Thing/name, etc.
Because the AMP design - in general - dictates server-side data processing, it's going to be difficult to implement #3 for "complex JSON-LD." Will #3 return values for all instances of @Type or only the first @Type? How would we isolate the 4th instance of @Type? You make a similar argument in "Array navigation."
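The "all instances or only the first" question can be made concrete with a short Python sketch. It collects every object with a given type in depth-first document order and leaves the choice of instance to the caller. The helper name and the sample document are hypothetical, and the sketch uses the standard JSON-LD keyword spelling @type.

```python
def find_by_type(node, wanted):
    """Return every object whose @type equals wanted, in document order."""
    found = []
    if isinstance(node, dict):
        if node.get("@type") == wanted:
            found.append(node)
        for value in node.values():
            found.extend(find_by_type(value, wanted))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_by_type(item, wanted))
    return found


# Hypothetical "complex JSON-LD": the same type appears four times.
doc = {
    "@type": "ItemList",
    "name": "outer",
    "itemListElement": [
        {"@type": "ItemList", "name": "inner-1"},
        {
            "@type": "ItemList",
            "name": "inner-2",
            "itemListElement": [{"@type": "ItemList", "name": "inner-3"}],
        },
    ],
}

matches = find_by_type(doc, "ItemList")
first = matches[0]["name"]    # the first instance
fourth = matches[3]["name"]   # the 4th instance, by zero-based index 3
print(len(matches), first, fourth)
```

Indexing by position only works if the publisher never reorders the document, which is part of why a selector for the 4th instance is hard to specify in general.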
Bottom line: we understand why #1, #3 and #4 are not possible. However, we will use #2 for "mustache" use cases.
We now need to ask your advice. Our goal is to implement our entire site in AMP. Specifically, for every page, we need <amp-sidebar> for menus and <amp-accordion> for information hiding. Most pages will be AMP-valid. However, some pages will use JavaScript as above (e.g. JSONP or in-line JSON-LD processing). These pages will not be AMP-valid. But if we load
<script async src='https://cdn.ampproject.org/v0.js'></script>
,
<script async custom-element='amp-accordion' src='https://cdn.ampproject.org/v0/amp-accordion-0.1.js'></script>
and
<script async custom-element='amp-sidebar' src='https://cdn.ampproject.org/v0/amp-sidebar-0.1.js'></script>
,
we would expect those pages to render and display properly. Is that true?
All of our pages use <script type="application/ld+json" id=""></script> so we expect harvesters will properly parse the page even if it is not AMP-valid.
Further, we expect that <amp-analytics type="" id=""><script type="application/json"></script></amp-analytics> will not be parsed for non-valid-AMP-pages.
If this strategy is realistic, we can properly segment AMP-valid and non-valid-AMP pages in our @WebSite configuration.
Please advise on this strategy.
Hey,
The AMP community has been working nonstop to make AMP better, but somehow we've still managed to grow an enormous backlog of open issues. This has made it difficult for the community to prioritize what we should work on next.
A new process is on the way and to give it a chance for success we will be closing issues that have not been updated in a while.
If this issue still requires further attention, simply reopen it. Please try to reproduce it with the latest version to ensure it gets proper attention!
We really appreciate the contribution! Thank you for bearing with us as we drag ourselves out of the issue abyss. :)
@dvoytenko @cramforce
Thank you for the enhanced solution here: https://www.ampproject.org/docs/reference/components/amp-list as implemented by John Pettitt: https://groups.google.com/forum/#!topic/amphtml-discuss/kc2E_Zxrq1w
The solution works perfectly and opens access to a new world of data to complement other data on our AMP pages.
/jay
Thanks, @jpettitt !
We have used amp-list and amp-mustache successfully in several situations.
However, the requirement that the root term is "items":[{}] is limiting and basically requires full edit control of the target JSON file. Example:
Here's our proposal: Require the developer to specify the root term when reading an HTTPS/CORS/JSON document. A developer already has to specify tree navigation with a series of open and close (# and /) statements. Why not simply require the developer to specify the root term?
For example, if root is "items": [{}], then the opening statement is {{#items}}. Then, if there are properties on a target page, one would target them as follows:
{{#items}}{{#Properties}}{{CID}}{{/Properties}}{{/items}}
Alternatively, if the root term is PropertyTable, then the statement would be as follows:
{{#PropertyTable}}{{#Properties}}{{CID}}{{/Properties}}{{/PropertyTable}}
Of course the JSON file with the root term PropertyTable would have to be a valid array, as follows:
Is there a systematic problem with this approach?