Consider providing source location of reason in the API

walnut07 commented 8 months ago

Currently, according to the approved API spec (1), the API provides a string that explains the reason that prevented the document from bfcache. It should also provide the JavaScript source location of a reason to help developers further understand why the page was not served from bfcache.

Potential information to expose as a source location is: url, function name, line number, and column number. The source location itself should be optional because some reasons do not involve source location (e.g. network related reasons such as kHTTPStatusNotOK).

Questions that I can come up with are:

Whether the field for a function name should be optional or not. A line number, column number, and url should never be null as long as there's a source location, but there can be no function name provided when the reason is used in an anonymous function. Should it be an empty string or null?
Whether it should provide a file name (foo.js) or an url (https://a.com/foo.js). Since there can be multiple files that have the same name, I think it is okay to provide a url while it can be pretty long.

(1): https://github.com/whatwg/html/pull/9360/files#diff-41cf6794ba4200b839c53531555f0f3998df4cbb01a4d5cb0b94e3ca5e23947dR94947

rubberyuzu commented 8 months ago

The current list of blocking reasons is masked, navigation-failure, fetch, parser-aborted, weblock and websocket. We can start by capturing source locations for fetch, weblock and websocket and add the source location to the reason struct. We are also working on adding may-block reasons (user agents specific blocking reasons) to the spec, and user agents can add source locations to those reasons as well.

@domenic @smaug---- Please let me know what you think, thanks!

domenic commented 8 months ago

I think this is a good idea. The API design for source location requires some thought. Aligning with any precedent would be good.

The main precedent I know about is ErrorEvent, which is very old. It provides filename (actually a URL), lineno, and colno (which both are not great examples of modern naming practices).

It appears recently another web performance API introduced sourceURL, sourceFunctionName, sourceCharPosition. Maybe that is the best one to follow. /cc @noamr in case he has any advice.

walnut07 commented 8 months ago

Thank you domenic@. I also came up with another question: how to deal with reasons used in the web console. Looking at ErrorEvent, it doesn't seem to mention that case. Should we spec how to deal with them or is it going out of scope?

domenic commented 8 months ago

I don't understand the question, sorry.

noamr commented 8 months ago

Would this mean reading the last stack frame every time you invoke a fetch, opening a websocket or starting a weblock? That might have too much overhead. For fetch in LoAF, we only get the source URL for this reason. In either case I think the API for this in long animation frames is similar enough to copy. All the fields there are optional as gathering this information is a performance trade-off.

rubberyuzu commented 8 months ago

Would this mean reading the last stack frame every time you invoke a fetch, opening a websocket or starting a weblock? That might have too much overhead. For fetch in LoAF, we only get the source URL for this reason. In either case I think the API for this in long animation frames is similar enough to copy. All the fields there are optional as gathering this information is a performance trade-off.

Yes, we need to read the stack every time the blocking feature is used, and this could cause a performance regression. Probably we could consider making the fields optional as well?

smaug---- commented 8 months ago

What would optional mean in this context?

And yes, capturing the stack every time fetch() or weblock is triggered might be quite overkill for performance.

noamr commented 8 months ago

What would optional mean in this context?

If I understand the question, in the long animation frames, the source location props are nullable, and the spec language says that "The UA MAY do this and that and assign the value". It gives an idea about what the value represents, while gives the UA some freedom.

rubberyuzu commented 7 months ago

@smaug---- Can I ask your position on adding an optional source location field?

smaug---- commented 7 months ago

Hmm, so what does LoAF actually expose? @noamr you say only source URL, but the APIs have fields for more.

@rubberyuzu have you done any performance measurements on this stuff?

noamr commented 7 months ago

Hmm, so what does LoAF actually expose? @noamr you say only source URL, but the APIs have fields for more.

What LoAF exposes depends on the invoker type:

For (classic/module) script blocks, this would be the URL of the script, or the URL of the page + element character index for inline scripts.
For event listeners and user-callbacks (e.g. setTimeout callback), it would be the URL+function name+character index.
For promise resolve/reject, it would be the last known URL when the promise was created, e.g. when fetch is invoked. This is the closest resemblence to the not-restored-reason use case.

btw all of this specified pretty comprehensively in https://w3c.github.io/long-animation-frames.

rubberyuzu commented 7 months ago

@rubberyuzu have you done any performance measurements on this stuff? @smaug----

Yes, we are running experiments for this, but surprisingly we did not see any significant performance regression.

smaug---- commented 7 months ago

I wouldn't expect experiments to find the possible performance issues. I was thinking actual performance profiling.

rubberyuzu commented 6 months ago

@smaug---- I ran Chrome with/without capturing the details, and tried registering an unload handler 10000 times and measured how long it took. With capturing the details, it took 0.418 sec on average. Without capturing the details, it took 0.406 sec on average.

In the real world there will be much less blocking, and the performance regression will be pretty trivial for Chrome.

You might want to test the same thing for FireFox too though.

noamr commented 6 months ago

From our experience with long animation frames, those calls are usually not in a middle of the JS stack, so not repeating many many times during JS execution, so checking the source location when calling them doesn't scale to be a big issue.

rubberyuzu commented 5 months ago

@smaug---- @petervanderbeken Can you please tell me what you think about this? Thanks!

smaug---- commented 5 months ago

Looks performance might be good enough if done like what LoAF does.

We can start by capturing source locations for fetch, weblock and websocket and add the source location to the reason struct.

Capture when? When the object is created or when some operation which may block bfcache was initiated?

What happens if there are multiple say WebSockets blocking bfcache, is the idea that the creation location of them all is reported?

rubberyuzu commented 5 months ago

The plan is to capture locations when some operation that could block bfcache was initiated. When multiple instances may block bfcache, the idea is to have a limit (say 5), and capture up to 5 locations.

WICG / bfcache-not-restored-reason

Consider providing source location of reason in the API #15