WICG / pwa-url-handler

Integrate with Declarative Link Capturing #11

Open LuHuangMSFT opened 3 years ago

LuHuangMSFT commented 3 years ago

Starting an issue to discuss integrating with the Declarative Link Capturing proposal. I will add more thoughts and references below.

LuHuangMSFT commented 3 years ago

cc @mgiuca I e-mailed you some ideas a couple of days ago, e-mail subject "Declarative Link Capturing + URL Handling". I'll replicate them here for discussion.

LuHuangMSFT commented 3 years ago

At a high level, we find ourselves agreeing that features which facilitate an in-app experience should first try to accomplish their goals using the app's navigation scope instead of creating a similar scoping mechanism. However, we think that link capturing could have difficulty gaining adoption if developers are unable to configure the set of URLs affected by the new feature. We would like to propose an adaptation of existing proposals.

There already seems to be a pattern in PWA development today where the scope is set to the root but various pages are excluded from the app experience using workarounds such as placing them under a different sub-domain, or using target="_blank" in links.

Examples to consider:

These examples can be addressed by allowing URLs to be excluded from link capturing.

Additionally, we would like to:

We think both examples (1. and 2.) can be addressed by link capturing exceptions. The default set of URLs included for link capturing is identical to the navigation scope, addressing (A). URLs within the default set can be excluded, and URLs outside the default set can be included, by modifying a separate association file. The use of an optional host field in url_handlers entries addresses (B). The optional capture_links member defines the link capturing option for all included URLs. The link capturing option for excluded URLs is none.

Below is an example manifest demonstrating usage.

{
    "name": "Contoso App",
    "start_url": "/?standalone",
    "display": "standalone",
    "icons": […],
    "capture_links": "existing_client_event",
    "url_handlers": [{
        "association_file": "/web-app-site-association.json"
    }, {
        "host": "conto.so",
        "association_file": "conto.so/.well-known/web-app-site-association.json"
    }]
}

Example manifest format

In the manifest, url_handlers objects contain the optional host and location of the site association file. A url_handlers object without a host value can:

Structuring URL exceptions in the association file allows the site to control usage of URLs and also prevents duplication in the web app manifest. Additionally, for PWAs featured in app stores, this allows link capturing changes to be made without deploying an updated manifest to the store. An app/site handshake is not necessary for validating the association for in-scope URLs but using an association file for URL exceptions enables this deployment benefit.

It is unclear to me whether there should be only one capture_links option for the entire app or whether there is a benefit to allowing a choice for every url_handlers object. I am looking for a good example for the latter.

We also found an argument for not controlling link capturing behavior by modifying the navigation scope: if a URL that should be excluded from link capturing behavior is excluded from the navigation scope and is navigated to from within the app context without opening in a new window, it will render in a pseudo-browser frame. This behavior is correct because the URL is outside of the navigation scope, but users will find it difficult to understand why a navigation in the same context now shows a pseudo-browser frame. There might be other similar side-effects.

Finally, I reused terminology above like url_handlers for continuity but we're eager to hear good alternatives.

mgiuca commented 3 years ago

Hi Lu,

There already seems to be a pattern in PWA development today where the scope is set to the root but various pages are excluded from the app experience using workarounds such as placing them under a different sub-domain, or using target="_blank" in links.

Yeah, I think we're seeing a lot of pain with not being able to exclude sub-paths from an app scope. I totally agree that sites shouldn't have to restructure their URL hierarchy in order to get the right scoping.

I'm still coming at this from a mindset of: this shouldn't be fixed just for link capturing. You should be able to exclude sub-paths from the actual app scope. Introducing sub-path exclusion as part of the link capturing feature feels like solving too specific of a problem. However, on the other hand, it does seem possible that sites would want to have sub-paths that are part of their app scope, but excluded from link capturing.

I'm considering this Pinterest example: if you were building a PWA with sub-paths like "About" and "Blog", you wouldn't want those to link capture, I agree. But would you want them to still be part of your app scope if the user navigated into them whilst using the app? I suppose it would be best to not show the CCT UI when navigating into those pages, which means it would be useful to exclude paths from link capturing but keep them as part of the app scope.

A url_handlers object without a host value can:

  • Only contain sub-paths to the navigation scope and is only allowed to control in-scope URLs.
  • Only exclude URLs from the navigation scope but cannot include any additional URLs.

That sounds good. Then we're starting to separate the concept of cross-origin link capturing and in-scope customization of the URL scope. What is your intended syntax for excluding URLs (and how consistent is it with @wanderview's Service Worker Scope Pattern Matching proposal?) I don't think it is strictly necessary to be consistent with the service worker scoping, but it would be ideal to have one syntax for excluding sub-paths and not two.

Thanks for writing this up. I like the direction this is going.

mgiuca commented 3 years ago

On the original topic of integrating the two proposals: I don't think they need to be bundled together into one thing (i.e. land into the Manifest spec as a single PR). They can both exist independently.

I would prefer to keep them separate, since they are both quite complex on their own and the best way to deal with complexity is to break it into manageable components.

LuHuangMSFT commented 3 years ago

Syntax for excluding URLs

I think it is possible to use a syntax in the style of the manifest format stated in Service Worker Scope Pattern Matching (SWSPM) but the format does not match exactly. With a few modifications, it would not look too dissimilar from what was described in this explainer.

The manifest format for the value of "scopePattern" from SWSPM was given in this example:

{
    "baseUrl": "https://foo.com/",
    "path": "/app/?*"
}

Some observations:

  1. "path" is a string value, not an array of strings. This was intentionally designed to describe a single scope. Whether we follow this for link capturing is a design choice: allowing only a single path makes it difficult to include two paths like "/a/foo/*" and "/a/bar/*" without having to add everything else under "/a/*" to exclude_paths. (This was called out in the Stakeholder Feedback section.)
    • SWSPM states that multiple scopes could be defined with lists of scope entries.
    • We could use one "url_handlers" object for each path but that makes readability worse and objects would not be grouped by "baseUrl".
  2. It does not have an "exclude_paths". We could extend the format with "exclude_paths".
  3. To exclude in-scope URLs for link capturing, there is no need for "baseUrl" or "path/s", just "exclude_paths" in the object in the association file. To include URLs for cross-origin link capturing, we need "baseUrl" and "web_app_association_file" in the object in the manifest, and "path/s" and possibly "exclude_paths" in the object in the association file.
  4. Having both "paths" and "exclude_paths" could lead to fewer string comparisons in some cases.
  5. I don't think "search" is necessary for now, but would not be difficult to add later.
  6. The wildcard syntax and rules can be used directly with the addition of a leading wildcard rule for "baseUrl" (to support "clientname.productname.com" URLs). That is also something that can be added later.
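
To make observations 1, 2 and 4 concrete, here is a minimal sketch of how a list of "paths" combined with "exclude_paths" could be evaluated. It assumes that "*" is the only wildcard and that it matches any run of characters; the thread does not settle the final pattern syntax, and the function names are hypothetical.

```python
import re


def pattern_to_regex(pattern: str) -> str:
    # Assumption: "*" is the only wildcard and matches any run of characters.
    # Everything else in the pattern is treated literally.
    return ".*".join(re.escape(part) for part in pattern.split("*"))


def path_is_handled(path, paths, exclude_paths=()):
    """Return True if `path` matches an include pattern and no exclude pattern."""
    included = any(re.fullmatch(pattern_to_regex(p), path) for p in paths)
    if not included:
        # Short-circuit: exclude patterns are only checked for included paths.
        return False
    excluded = any(re.fullmatch(pattern_to_regex(p), path) for p in exclude_paths)
    return not excluded
```

With a list of include patterns, "/a/foo/*" and "/a/bar/*" can be captured without listing everything else under "/a/*" as an exclusion, which is the readability concern raised in observation 1.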

Another format we studied was the apple-app-site-association file. It is able to match for paths, fragments, queries, and allows developers to order include and exclude paths by priority. This is a powerful declarative format but we are concerned about matching performance and being able to map it to OS URL handling formats.

LuHuangMSFT commented 3 years ago

Following the example above:

{
    "name": "Contoso App",
    "start_url": "/?standalone",
    "display": "standalone",
    "icons": […],
    "capture_links": "existing_client_event",
    "url_handlers": [{
        "association_file": "/web-app-site-association.json"
    }, {
        "host": "conto.so",
        "association_file": "conto.so/.well-known/web-app-site-association.json"
    }]
}

"/web-app-site-association.json" would contain:

{
    "apps": [{
        "manifest": "/manifest.json",
        "paths": ["/*"],
        "exclude_paths": ["/blog/*"]
    }]
}

"conto.so/.well-known/web-app-site-association.json" would contain:

{
    "apps": [{
        "manifest": "contoso.com/manifest.json",
        "paths": ["/*"],
        "exclude_paths": ["/blog/*"]
    }]
}

LuHuangMSFT commented 3 years ago

  • If URL Handling lands first without DLC, it would state the set of paths to capture, and define a basic model for opening a standalone window when links are captured. Later, DLC can land and extend it to say "here is how you customize the browser's behaviour when link capturing is activated."
  • If Declarative Link Capturing lands first without this, it would simply state that link capturing applies to all URLs within scope of the manifest. Later, URL Handling can land and extend that to say "that's the default, but if you specify the exact URL Handling paths, it applies to those instead".

That sounds good to me. URL Handling can also refer to DLC on when to capture:

This proposal defines new manifest members that control what happens when the browser is asked to navigate to a URL that is within the application’s navigation scope, from a context outside of the navigation scope. It doesn’t apply if the user is already within the navigation scope (for instance, if the user has a browser tab open that is within scope, and clicks an internal link). The user agent is also allowed to decide under what conditions this does not apply; ...
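
The condition described in that passage can be sketched roughly as follows. This is an illustration only: it assumes the usual "within scope" test (same origin, and the target path starts with the scope path), it does not normalize default ports, and the function names are hypothetical rather than part of either proposal.

```python
from urllib.parse import urlsplit


def within_scope(url: str, scope: str) -> bool:
    # Assumption: a URL is in scope when it is same-origin with the scope URL
    # and its path starts with the scope's path.
    u, s = urlsplit(url), urlsplit(scope)
    same_origin = (u.scheme, u.hostname, u.port) == (s.scheme, s.hostname, s.port)
    return same_origin and (u.path or "/").startswith(s.path or "/")


def should_capture(target_url, source_url, scope):
    """Capture only navigations into the scope from a context outside it."""
    if not within_scope(target_url, scope):
        return False
    # source_url is None when the navigation has no in-browser source context.
    if source_url is not None and within_scope(source_url, scope):
        return False
    return True
```

The user agent's own exceptions mentioned at the end of the quoted passage are deliberately left out of this sketch.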

LuHuangMSFT commented 3 years ago

Some other thoughts

I'm still coming at this from a mindset of: this shouldn't be fixed just for link capturing. You should be able to exclude sub-paths from the actual app scope. Introducing sub-path exclusion as part of the link capturing feature feels like solving too specific of a problem. However, on the other hand, it does seem possible that sites would want to have sub-paths that are part of their app scope, but excluded from link capturing.

I think excluding sub-paths from the actual app scope is useful in its own right, which SWSPM could address. Each manifest member applies to some set of URLs (oftentimes any URL, in-scope or not), and it's easy to reason about what URLs a member should be applied to, but not easy to reason about what URLs should be within the app scope. Scope restricts navigations within the app context with manifest continuing to be applied. By this definition, the manifest's members don't necessarily have to apply to in-scope URLs, they just cannot be applied to out-of-scope URLs. As you pointed out in this PR, the latter is not strictly true either.

In my interpretation, the "set of URLs that are considered to be part of an app" definition provides a convenient default set of URLs for manifest members to apply to but they don't necessarily have to apply to all in-scope URLs either. It also does not prevent out-of-scope URLs from being affected by a member (e.g. link captured).

it does seem possible that sites would want to have sub-paths that are part of their app scope, but excluded from link capturing.

It's difficult to reason about this without knowing what being part of their app scope means in practical terms. Some of the earlier members apply to the app context itself (display, theme_color, etc) and can continue to be applied no matter where the context navigates. Setting some in-scope URLs to link capture but not others is more like setting one URL to be "start_url".

I like the definition at web.dev.

The scope defines the set of URLs that the browser considers to be within your app, and is used to decide when the user has left the app.

It is useful to let the app developer draw the boundary of what is within the app but it may not be the right boundary for every browser and manifest feature that depends on it.

mgiuca commented 3 years ago

I didn't realise you were proposing that the site association file be used even for same-origin link capturing. I'm not sure why we'd do that when you could just put the data inside the manifest file itself. Or are you suggesting that either is valid? (I suppose there's no reason to specifically say you can't host a site association file on the same origin, though our documentation shouldn't encourage it.)

Otherwise, the syntax seems reasonable, but I kinda wish we would have the SWSPM syntax in the Manifest for scope before / at the same time as the above lands. Otherwise, it feels weird to have the advanced syntax only for link capturing and not for scopes. It also puts us at risk, if say the TAG review on SWSPM encourages them to change their syntax slightly, then they become inconsistent with URL handling syntax.

LuHuangMSFT commented 3 years ago

I was proposing above that the scope exclusion patterns be placed only inside the association file, not either. Placing it inside the manifest file only is an option.

If scope exclusions for link capturing are placed inside the manifest only: Pros

Cons

Perhaps this syntax can be called out by SWSPM to be reviewed also. That would help prevent inconsistency. Having the advanced syntax for link capturing and not scope temporarily doesn't seem too strange to me. The manifest scopePattern from SWSPM is not significantly different from scope yet unless it changes to allow multiple path patterns and exclude patterns.

mgiuca commented 3 years ago

I guess I would consider the associations file for in-scope to be part of the app's behaviour, and therefore it makes sense that it be part of the manifest itself. If listing an app in a store requires signing the manifest, and re-signing it whenever the manifest changes, it would seem strange to me that the link capturing URLs be excluded from that signing process.

LuHuangMSFT commented 3 years ago

What do you think of the following?

Scenario A: URL Handling lands first

Scenario B: Both Declarative Link Capture and URL Handling

Scenario C: Declarative Link Capture lands first

Forward Compatibility

Algorithm of how DLC and URL Handling filters can both be applied

Where association files are found and validation requirements

Examples

https://contoso.com/manifest.json

{
    "name": "Contoso Business App",
    "display": "standalone",
    "icons": [
        {
            "src": "images/icons-144.png",
            "type": "image/png",
            "sizes": "144x144"
        }
    ],
    "capture_links": "existing_client_event",
    "capture_links_exclude_paths": [
        "/about",
        "/blog"
    ], 
    "app_links": [
        "contoso.com",
        "conto.so",
        "*.contoso.com"
    ]
}

https://partnerapp.com/manifest.json

{
    "name": "Partner App",
    "display": "standalone",
    "icons": [
        {
            "src": "images/icons-144.png",
            "type": "image/png",
            "sizes": "144x144"
        }
    ],
    "capture_links": "existing_client_event",
    "capture_links_exclude_paths": ["/only/for/partnerapp/*"],
    "app_links": [
        "contoso.com",
        "partner.contoso.com"
    ]
}

https://contoso.com/web-app-site-association.json or https://conto.so/web-app-site-association.json

[
    {
        "manifest": "https://contoso.com/manifest.json",
        "handle_urls": {
            "paths": [
                "/*"
            ],
            "exclude_paths": [
                "/blog",
                "/about"
            ]
        }
    },
    {
        "manifest": "https://partnerapp.com/manifest.json",
        "handle_urls": {
            "paths": [
                "/public/data/*"
            ]
        }
    }
]

https://partner.contoso.com/web-app-site-association.json


[
    {
        "manifest": "https://contoso.com/manifest.json",
        "handle_urls": {
            "paths": [
                "/*"
            ],
            "exclude_paths": [
                "/only/for/partnerapp/*"
            ]
        }
    },
    {
        "manifest": "https://partnerapp.com/manifest.json",
        "handle_urls": {
            "paths": [
                "/*"
            ]
        }
    }
]
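
As a rough illustration of how a browser might read an association file in the format above, the sketch below returns the manifests whose "handle_urls" rules cover a given URL. It assumes "*" is the only wildcard, and the helper names are hypothetical. Validation of the app/site handshake is out of scope here.

```python
import json
import re
from urllib.parse import urlsplit

# Copied from the partner.contoso.com association file in the example above.
ASSOCIATION = json.loads("""
[
    {"manifest": "https://contoso.com/manifest.json",
     "handle_urls": {"paths": ["/*"],
                     "exclude_paths": ["/only/for/partnerapp/*"]}},
    {"manifest": "https://partnerapp.com/manifest.json",
     "handle_urls": {"paths": ["/*"]}}
]
""")


def _matches(path: str, pattern: str) -> bool:
    # Assumption: "*" matches any run of characters; all else is literal.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, path) is not None


def manifests_handling(url: str, association: list) -> list:
    """List the manifest URLs allowed to handle `url`, in file order."""
    path = urlsplit(url).path or "/"
    handlers = []
    for entry in association:
        rules = entry.get("handle_urls", {})
        included = any(_matches(path, p) for p in rules.get("paths", []))
        excluded = any(_matches(path, p) for p in rules.get("exclude_paths", []))
        if included and not excluded:
            handlers.append(entry["manifest"])
    return handlers
```

For this file, a URL under /only/for/partnerapp/ resolves to the partner app's manifest alone, while any other path on partner.contoso.com resolves to both manifests.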

LuHuangMSFT commented 3 years ago

@mgiuca Is the above closer to what you had in mind?

Change summary:

mgiuca commented 3 years ago

Hi Lu,

Apologies for the lateness of this reply.

Regarding the above three scenarios:

Scenario B sounds good to me, which is good, because that's the final place where we want to end up regardless of which lands.

I feel like Scenarios A and C don't quite capture the orthogonality of the two APIs:

The last thing that is concerning is this paragraph:

"This association file is a recommended format for validation but browsers are also free to use other non-web-standard formats like ones used by Android, iOS, and Windows."

That feels like a recipe for creating sites that work on one OS but not others. In terms of the standard, I would like us to only have the standard format. (There's nothing we can do to prevent implementations from also recognising non-standard formats, but the standard shouldn't explicitly allow it.) In terms of the Chromium implementation, I would like us to only recognise the standard format. (Of course, we can use non-standard formats to capture URLs into native apps, but when we're capturing URLs into web apps, we should force sites to present that in the standard format.)

LuHuangMSFT commented 3 years ago

Scenario C (DLC only): "Has a manifest member exclude_paths for exclusion patterns." I haven't got an exclude_paths in my DLC proposal, and I was considering the path exclusion to be more a part of the URL handler proposal ("what is captured") as opposed to the DLC ("how it is treated, once captured").

I was concerned that DLC wouldn't have a way to selectively exclude URLs from link capturing and that would prevent DLC from being adopted, but I am starting to understand what you mean by orthogonality: it keeps each from limiting the other. I'll modify this to keep the exclusion of in-scope URLs to the URL Handling spec alone. I.e. it'll only be available in scenario A and B.

LuHuangMSFT commented 3 years ago

Scenario A (URL handling only): "The capture behavior will be similar to existing_client_event except the event handler does not determine whether there is a match, just how to handle a launch. The handler will run in a new app window in the start_url document. No further navigation takes place if no handler is found." I'm not quite sure what this is saying. Does it always open a new app window (hence it's similar to new_client), or do you actually intend to fire the launch event? Even though that event is eventually designed to be part of the DLC proposal, I am hesitant to bundle it up with URL handling, since then that could constrain the design of DLC later on. Again, URL handling should be "what is captured" as opposed to DLC, "how it is treated, once captured". Do you have a specific need to reuse an existing window, and that's why this is being bundled in URL handling? If so, perhaps that's just a good reason to expedite DLC. (Which we are planning to tackle in Q4, for what it's worth.)

I think what I meant was URL Handling option 1 below:

| existing_client_event behavior | DLC | URL Handling option 1 | URL Handling option 2 |
| --- | --- | --- | --- |
| No existing window open | opens a new window, navigates to given URL, does not fire launch event in new window | opens a new window, loads start_url document, fires launch event | opens a new window, loads start_url document, fires launch event |
| Existing window available | fires launch event in one existing window | opens a new window, loads start_url document, fires launch event | fires launch event in one existing window |
| Event handler not present | Only matters if an existing window is present. Since there is no handler, there will be no observable change to the user. | opens a new window to start_url. | no existing window: opens a new window to start_url. existing window: no observable change. |
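
The first two rows of the comparison can also be written as a small lookup, which may make the differences between the three models easier to see. This is only a restatement of the table under hypothetical names ("dlc", "option_1", "option_2"); it does not model the "event handler not present" row.

```python
# Each outcome records which window is used, what a new window loads,
# and whether the launch event fires.
NEW_WINDOW_AT_URL = {"window": "new", "loads": "target_url", "fires_launch_event": False}
NEW_WINDOW_AT_START = {"window": "new", "loads": "start_url", "fires_launch_event": True}
EXISTING_WINDOW = {"window": "existing", "loads": None, "fires_launch_event": True}

# Keyed by (model, has_existing_window), following the table above.
BEHAVIOR = {
    ("dlc", False): NEW_WINDOW_AT_URL,
    ("dlc", True): EXISTING_WINDOW,
    ("option_1", False): NEW_WINDOW_AT_START,
    ("option_1", True): NEW_WINDOW_AT_START,
    ("option_2", False): NEW_WINDOW_AT_START,
    ("option_2", True): EXISTING_WINDOW,
}


def launch_behavior(model: str, has_existing_window: bool) -> dict:
    """Look up what happens when a captured link launches the app."""
    return BEHAVIOR[(model, has_existing_window)]
```

Option 1 is the only model that ignores existing windows, and DLC is the only one that navigates a new window directly to the captured URL instead of firing the launch event.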

LuHuangMSFT commented 3 years ago

"This association file is a recommended format for validation but browsers are also free to use other non-web-standard formats like ones used by Android, iOS, and Windows."

That feels like a recipe for creating sites that work on one OS but not others. In terms of the standard, I would like us to only have the standard format. (There's nothing we can do to prevent implementations from also recognising non-standard formats, but the standard shouldn't explicitly allow it.) In terms of the Chromium implementation, I would like us to only recognise the standard format. (Of course, we can use non-standard formats to capture URLs into native apps, but when we're capturing URLs into web apps, we should force sites to present that in the standard format.)

Makes sense to me. I like that.

LuHuangMSFT commented 3 years ago

Closed by mistake.