Open sebastianbenz opened 5 years ago
I can't add comments to the design doc. I would like to comment that Bing likes this and I wanted to add our section with the Google cache that we will do the same rewrite to bing-amp.com that we do today. We already rewrite from ampproject, so, we would have to add that re-write.
Another question is around build dist. Will it change? We currently override the cache and 3p addresses so we can host ourselves. Is there any plan to change that?
I can't add comments to the design doc. I would like to comment that Bing likes this and I wanted to add our section with the Google cache that we will do the same rewrite to bing-amp.com that we do today. We already rewrite from ampproject, so, we would have to add that re-write.
That is great to hear! I've changed permissions so anyone can comment on the design doc.
Couple of notes:
In regards to 1: The plan is for caches to rewrite origin URLs to serve their own version of the Runtime. So yes and no. If it's Google's cache then the JavaScript will come from Google infra. Otherwise if the Runtime is self hosted then it may not be what Google has served in the past.
In regards to 2: Wouldn't this necessitate publishers to republish ALL their AMP documents with each version change?
On 1 -- What does the transition change about this? For 2 -- Agree about requiring resource integrity.
In regards to 2: Wouldn't this necessitate publishers to republish ALL their AMP documents with each version change?
It wouldn't if the url used in the document contained the version number of AMP used on the document.
But then if they want to move forward to another version, all documents still need to be updated. It's not sufficient for them to just update whatever was at https://example.com/amp/v0.js.
I guess I don't understand the "and optimize in Cache". The cache can always rewrite it.
If we're worried about a version being too old, there may be other options. One that was suggested was for each Runtime release to have an expiry function. If it's passed a certain time it fetches from cdn.ampproject.org or some other "failure."
TBH, I'm conflicted. Publishers should be able to use whatever they want on origin. If we're concerned that these are served by caches with a newer version then maybe there's another solution. Caches could choose not to accept origins that do this, inspect the runtime that is being served and decide not to accept them, or require some kind of assurance from the domain that it isn't X old.
The resource integrity suggestion is one way to ensure the AMP runtime on the page doesn't meaningfully change from the cache's (and thus the document wouldn't be guaranteed to execute on the cache correctly).
I also like your idea of Caches choosing to not support versions beyond a threshold (and using the content of the self-hosted files to ensure a document is running a compliant version).
@dvoytenko – What do you think about @honeybadgerdontcare's suggestion here?
The caches should definitely reject the "too old" versions. But somehow the self-hosted scripts should indicate what RTV they belong to. Resource integrity can do that, or even a simple attribute amp-rtv="12345678"
can do that too. The only exception of this is when we expect the self-hosted scripts to be most recent version, but that partially defeats the reason why someone would want to self-host.
Let's look at a basic premise: a publisher wants to self-host to have their own upgrade cycle. I.e. to avoid any regressions on their site when AMP goes from V to V+1. If they don't specify V anywhere, our cache would simply "optimize" the pages to use V+1, which defeats the publisher's reasoning. So we need them to tell us the V. I don't think updating pages to go from version to version is a big deal.
One note aside: subresource integrity would not work with amp-geo
extension since it's a dynamic extension.
@dvoytenko This i2i could also improve performance on self hosted documents.
@kristoferbaxter yeah, I was wondering if somehow we could improve the case for the self-hosting for the purpose of only avoiding CDN. Maybe the scripts can state that (as much as possible) they are relying on the most recent releases of AMP. We could even alerts web masters when we notice that the actual served RTV is too far away from AMP's PROD?
I agree we should do something here, it would be a shame for documents to not get served by caches because their version of components get too outdated.
@sebastianbenz and @honeybadgerdontcare do you have some ideas about how we could address this? Perhaps as a second step?
But I think we're so far finding two clear use cases, right?
I like the idea of encoding the AMP runtime version into the markup. This way we could use Search Console to automatically warn publishers if:
This would make hosting slightly more complicated for publishers though, because they not only need to update the JS, but also all their HTML.
+1 to @dvoytenko's summary.
@Gregable suggested adding some runtime code that checks (current_time - RTV) <= max_skew
and reloads from AMP CDN if false. Basically, version locking for v0.js.
Pubs could recompile and remove, though most probably won't go through the trouble. One concern is breaking everything if their CSP is misconfigured i.e. doesn't allow cdn.ampproject.org
.
In a previous design review there was a discussion around what RTVs are "green" and what RTVS may be "bad". Do we still have plans on publishing thatj?
I agree we should have some mechanism to say this "RTV" is bad and not supported including saying we are no longer going to rewrite certain RTVs to green. That said, the Validator does no know what RTV a host is serving, nor does Search Console. We'd have to add that ability if we want to restrict what is served/rewritten.
@sebastianbenz I have an idea of how we could handle the difficulties amp-geo
presents when self hosting a runtime.
If a cache is not able to support dynamically generated amp-geo, then fallback to fetching the country when amp-geo
loads. Given that /v0/amp-geo-0.1.js
is an asynchronously loaded script that appends a body class late in the page load process, this fallback isn't terrible and will only be utilized by caches that can't support the faster option of dynamically altering amp-geo
. The clear advantage to this feature is that there would be no need to disable version locking or to create a "stable" version of amp-geo-0.1.js
that is many-version compatible.
To support this feature, there needs to be an API from which the country code could be fetched. The AMP project could offer this feature at the CDN level (with cache modification). I suggest this be made part of the release process (not part of the build process), perhaps something like https://cdn.ampproject.org/geo-api-0.1.json
returns:
{
"country": "de"
}
Then, when amp-geo
detects unpatched value {{AMP_ISO_COUNTRY_HOTPATCH}}
, it queries the API instead of defaulting to "unknown".
@mdmower I like this idea!
@Gregable would speak anything against providing such an API endpoint?
@jridgewell and @zhouyx might have more state than me.
@mdmower I'm not sure I understand what returning JSON from https://cdn.ampproject.org/geo-api-0.1.json
improves over returning JS from https://cdn.ampproject.org/v0/amp-geo-0.1.js
. It does not appear to change anything with regards to @sebastianbenz 's comments about independence, control, or performance.
@Gregable From the start of my involvement in trying to find a solution for self-hosting, I've been advocating for publishers who would like to self-host the AMP runtime but do not have fancy CDNs that can perform modifications to components based on country (see Guidelines: Adding a new cache to the AMP ecosystem). The solution to this problem so far has been to self-host the entire runtime except for amp-geo
, which could be fetched from cdn.ampproject.org. This raises plenty of concern: "can we still use version locking?" and "do we need to offer an 'API stable' version of amp-geo
outside of the normal release process?" See sections Version Locking and Dynamic components: amp-geo in the design doc.
The suggestion I am making lets publishers self-host the entire runtime, including amp-geo
, and if they do not have the capability to modify amp-geo-0.1.js
at the CDN level, the component will fall-back to querying an API to determine the country. The advantage to this method is that it is far easier to make a small JSON file like { "country": "de" }
API stable across many runtime versions than it is to hope that amp-geo-0.1.js
from any single runtime version will work with many other runtime versions. The level of effort is also very low for the AMP project to help these publishers. The capability to modify a cached resource by country is already implemented for amp-geo-0.1.js
and it simply needs to be extended to a new JSON file.
I argue that this hits both of the independence and control categories. The reason is that it broadens the self-hosting audience to include small publishers who can then modify the runtime locally (control) or host with archiving in mind [in case certain rtvs disappear from cdn.ampproject.org] (independence).
My suggested changes to amp-geo
are in draft PR #26407.
@Gregable suggested adding some runtime code that checks
(current_time - RTV) <= max_skew
and reloads from AMP CDN if false. Basically, version locking for v0.js.Pubs could recompile and remove, though most probably won't go through the trouble. One concern is breaking everything if their CSP is misconfigured i.e. doesn't allow
cdn.ampproject.org
.
If I understand this suggestion correctly, I think it should only be implemented for AMP pages served from the Google AMP cache. Otherwise, you're telling publishers they are not allowed to take snapshots in history to record the way in which a page was rendered by a specific version of the runtime.
@mdmower Hopefully @jridgewell or someone else on this thread can address version locking and the possible need to upgrade the client-side implementation over time.
From my read of the cache code, I don't see why patching the country into the .json document would be any different than patching it into the .js document. The cache implementation would probably be straightforward.
Otherwise, I'm not sure I have enough context to compare the options.
Just very generally: We'd like to avoid providing an easily usable general purpose geo location API that folks would easily starting to rely on outside of amp-geo. This is for numerous reasons.
So, I'd strongly prefer a solution that is directly tried to amp-geo in some fashion.
Just very generally: We'd like to avoid providing an easily usable general purpose geo location API that folks would easily starting to rely on outside of amp-geo. This is for numerous reasons.
So, I'd strongly prefer a solution that is directly tried to amp-geo in some fashion.
To play devil's advocate, doesn't the API already exist for outsiders?
(await fetch('https://cdn.ampproject.org/v0/amp-geo-0.1.js').then(r => r.text()).then(t => /"([a-z]{2}) {26}"/.exec(t)))[1]
@mattwomple I think this is a good example of the line. If we break that regex and somebody files a bug here, then I would not in any way feel bad about just closing it.
I cannot make a decision on providing a amp-geo JSON API. That's way above my authority.
someone else on this thread can address version locking
There's a very real risk that the locally compiled v0.js
will not be compatible with the amp-geo-0.1.js
file served by the AMP JS CDN. That's one of the reasons we version lock to begin with.
But what are the chances the locally compiled version will match one of the official versions (say, /rtv/123
)? Is the locally compiled version going to have the RTV "123"? Unless you're compiling from the exact same SHA, then no. So the only way you can get Runtime to request /rtv/123/amp-geo-0.1.js
is if you're compiling the same SHA.
And if you have any uncommitted local modifications, then you'll get a different local RTV number. And there's no guarantee the property mangling will match (we've observed this before).
If you request the generic /v0/amp-geo-0.1.js
, then we're getting really risky.
The only way local serving works without any issues is if you are capable of serving every extension. Providing a a amp–geo JSON API would solve that, but again, I can't make that decision.
Separate topic from amp-geo
:
I have been revisiting my original suggestion to use a <meta>
tag to update config.urls.cdn
when an AMP document is loaded in singleDoc
mode. See PR #25028 and section "Full self-hosting" in the design doc. This may not be the best path forward. Consider this...
Several AMP components import urls
, e.g. import {urls} from '../../../src/config';
. Since imports are written at compile time, these components have the urls
object hardcoded into them. As #25028 current exists, these components do not recognize changes to urls.cdn
when an ampdoc
instance is created.
I can see a few options to move forward, but I'm not sure which is preferred:
<script id="amp-config">
window.AMP_CONFIG = window.AMP_CONFIG || {};
window.AMP_CONFIG.cdnUrl = 'https://example.com/myruntime';
</script>
This is by far the easiest solution, as config.js
already has support for initializing certain urls
members via environment (AMP_CONFIG
) variables. The question is whether the AMP page could still be considered valid. Is it possible to ensure the script only writes to AMP_CONFIG
and does nothing else?
config.js
gets new methods to propagate urls.cdn
changes when they are detected. It's important to note that this propagation could occur later than some components need, or would require significant rewrites to change the order of operations in all affected components to "wait for urls
promise, then proceed".<script id="amp-config" type="application/json">
{
"urls": {
"cdn": "https://example.com/myruntime"
}
}
</script>
config.js
gets a new method to check for document
and then for #amp-config
. If detected, then when urls
is defined, JSON properties in #amp-config
are merged into the final object. The caveat is that every file that includes import {urls} from 'path/to/src/config
would get a bit bulkier and would be running the exact same check (check for element, decode JSON, write to urls
object).
@Gregable @mdmower @zhouyx @jridgewell and I just met and discussed the the proposal. Here is what we came up with:
<meta name="runtime-host" content="https://example.com/optional/path/prefix">
amp-geo: we will provide an unpatched version of amp-geo.js that supports calling a publisher hosted geo endpoint to retrieve the current country. The AMP Cache will not provide such an endpoint. Instead, publishers need to configure a geo endpoint via meta tag. amp-geo.js then calls the endpoint on load to retrieve the location:
<meta name="amp-geo-api" content="https://example.com/location.json">
I'll update the design doc with more details in the next days. Feedback is welcome :-)
To add to the amp-geo approach
amp-geo.js
will be updated to make another roundtrip request when the country code is not presented. There will still only be one version of this component.<amp-consent>
waiting for the geolocation, element with amp-geo-pending
further blocked, and higher chances of UI shifting caused by appending the CSS with further delay. amp-geo.js
if possible.amp-geo.js
may require changes to the json response format as well. We will notice publishers in advance if such changes are required.
- Publishers need to serve the runtime using the same RTV pattern used by AMP Caches (unversioned URLs will not be supported by the validator).
Is this correct? I agree with with the rtv pattern requirement, but can't publishers still use unversioned URLs? I think they just need to follow the same logic that cdn.ampproject.org uses to fetch versioned components, e.g. if the AMP page includes script https://example.com/v0.js
, and the runtime determines its own version is 123
, then version locking results in components downloading from https://example.com/rtv/123/v0/amp-component-0.1.js
. Briefly, the requirements are:
https://example.com/
https://example.com/rtv/{rtv}/
https://example.com/rtv/metadata
I think @mdmower is correct. I don't think it's necessary for validator to require versioned URLs in the <script>
src
s. It's only required that the publisher respond to https://example.com/rtv/${RTV}/*
requests so that runtime may dynamically requests scripts (either due to version locking rejecting a mismatched version, or a missing script, etc).
It's only required that the publisher respond to https://example.com/rtv/${RTV}/*
This means simple static sites cannot self-host AMP, which is one of the use cases I'd like to support. Is there no way around this?
Also: doesn't the runtime check the version again after downloading or does it assume that it has the correct version if downloading from an RTV endpoint?
This means simple static sites cannot self-host AMP
I'm not sure how you're reaching this conclusion. A publisher can opt to use RTV specific URLs, in which case they only need to host the runtime at example.com/rtv/123/
, or they can use unversioned URLs, in which case they need to host identical runtimes at example.com/
and example.com/rtv/123/
.
Got it - thanks for the clarification. In this case, I'd actually prefer to enforce RTV specific URLs via the validator to keep things simple.
Got it - thanks for the clarification. In this case, I'd actually prefer to enforce RTV specific URLs via the validator to keep things simple.
Sorry to drag this out, but I'm confused. Why? If a publisher would like to follow the same model as the AMP project where AMP pages reference a rolling AMP release, shouldn't they be able to do so?
Maybe I'm missing something. What are the advantages for a publisher in having a rolling AMP release if they need to provide an RTV'd endpoint as well?
The only reason I can see is that publishers can encode the import scripts in their templates and don't need to update it with every release. However, as self-hosting is only supported for transformed AMP, this should be done in the transformer anyway.
@sebastianbenz The RTV paths are still necessary because dynamically loaded components use the RTV paths when version locking is applied.
I don't think there's any reason to stipulate transformation by amp-toolbox's optimizer for self-hosting to be allowed. If one of the goals of this I2I is "independence", then a publisher ought to be able to simply swap out cdn.ampproject.org with runtime.example.org in their AMP documents, add a couple of meta tags, and the runtime does the rest. None of this requires any extra work by the runtime than the changes we've discussed to read the meta tags. I don't even think the validator requires additional work to support rolling release self-hosting URLs. This all matches the AMP project so closely that support is essentially the same.
The self-hosting experience seems a bit bumpy, especially if the developer was expecting a self-hosted "hello, world" to involve unzip and a text editor, or minor search-and-replace surgery of an existing AMP:
I assume it's a non-goal that self-hosting is easy, perhaps make this explicit?
@ithinkihaveacat I think that's a fair assessment. Version locking is pretty much the reason for the complex JS hosting requirements. I think a simpler hello world example could be possible without version locking, but I'm not sure anyone is onboard with that idea. I largely don't care anymore as long as we can get some self hosting support out the door in the near future. Long term, sure, I would like to see the AMP project loosen it's grip on how folks host the runtime.
amphtml PR: https://github.com/ampproject/amphtml/pull/26829 amp-toolbox PR: https://github.com/ampproject/amp-toolbox/pull/622 Self-hosted runtime sample project: https://github.com/mdmower/amp-self-host-demo/
Re the amp-geo
API - when you write the docs it's probably worth pointing out that there are accuracy/performance cache tradeoff considerations on the endpoint. Longer cache window means faster 2nd+ page loads but increases the risk of a wrong country. amp-geo
uses 30 min if I recall correctly.
While it might benefit from some reorganization, I've been preparing a "hosting howto": https://gist.github.com/mdmower/b56e94f0dc36beafb825b0c5e31fceef . Is this something that should be included in #26829 or should it have its own PR where it can be independently scrutinized?
@mdmower that guide is amazing! Thanks a lot for starting this. Please create a different PR for this and assign @CrystalOnScript as a reviewer.
PR for AMP framework hosting guide: https://github.com/ampproject/amphtml/pull/27100 (in draft status until #26829 is approved)
One of the open questions at the end of the original design doc is "Do we want to ship a zip containing all AMP runtime assets with every AMP release, e.g. via NPM?" I would like to see this happen but I don't think there's any way I can contribute (pretty sure only Googlers with access to internal release code/scripts can work on this). Does this task need a new Issue created?
New issue would be great! Thanks!
Can we allow for cache-busting query parameters to be added to the versioned scripts? This would make it much easier to juggle writing the scripts to the filesystem and not having to worry about keeping old versions around. In other words, instead of:
<script async src="https://example.com/amp/rtv/012002290616360/v0.js"></script>
Could we support:
<script async src="https://example.com/amp/v0.js?rtv=012002290616360"></script>
For the WordPress plugin, we'd probably be writing the scripts to the filesystem in a directory like wp-content/amp-scripts
, so that would look like:
<script async src="https://example.com/wp-content/amp-scripts/v0.js?rtv=012002290616360"></script>
And whenever our scheduled tasks runs to update the AMP scripts, we'd just have to update that one wp-content/amp-scripts/v0.js
file as opposed to adding a new wp-content/amp-scripts/rtv/012002290616360
, writing v0.js
into it, and then deciding which previous rtv directory can be safely deleted based on page cache times.
We can, but it requires two changes:
?rtv=
param: https://github.com/ampproject/amphtml/blob/0a4f28343650bb2d0d9627652df52b2de4526c4c/src/service/extension-script.js#L96-L110?rtv=
: https://github.com/ampproject/amphtml/blob/0a4f28343650bb2d0d9627652df52b2de4526c4c/src/service/extension-script.js#L52-L63
Summary
Make it possible for publishers to self-host the AMP runtime, without breaking AMP validity.
Design document
Motivation
The benefits are:
Launch tracker
/cc @ampproject/wg-approvers @mattwomple @honeybadgerdontcare