Open RobertJGabriel opened 3 years ago
Thank you. It doesn't look like there is any workaround. Will have to build an actual Google Workspace add-on.
Maybe it is only for /preview, not for /edit? I mean, see the URL.
For /preview it makes sense: it is only a preview, and a visitor shouldn't be able to change its content, even through HTML. It is hard to change content in an external canvas.
For me, at the moment, the /edit page uses the HTML editor, not the canvas editor.
But yes, if canvas rendering replaces HTML rendering, then it will be a problem.
How did you create this preview? Can you provide steps?
The original post from Google can be found here: https://workspaceupdates.googleblog.com/2021/05/Google-Docs-Canvas-Based-Rendering-Update.html
Do they support any kind of accessibility API with the new design?
A discussion can be found here: https://news.ycombinator.com/item?id=27129858
Google has updated their post and opened a small possibility.
If you open "Accessibility Settings" --> "Turn on Screen reader support", Google Docs will emit readable HTML with the actual text. The only problem is that this means a complete rewrite of the core Google Docs Util code, because the new HTML structure is different.
If possible the Google Docs Util code should:
Thank you @JensPLarsen
Do they support any kind of accessibility API with the new design?
I suppose not. For external JS that didn't create the canvas, it is very hard to interact with the canvas's 2D context (I mean CRUD operations on the canvas content). For example, the Yandex.Disk Word editor uses canvas-based rendering, and I found no way to interact with the document content.
will emit Readable HTML with the actual text.
The problem here is that this only provides the ability to read document content. But this library needs all CRUD operations in order to provide all of its already implemented functionality. Sure, I will check whether it is possible to interact with the document through that "small possibility", but it is highly unlikely to provide everything needed to support this project.
So I think this project will die when Google Docs releases the canvas-based rendering feature. Unfortunately, at the moment it doesn't look like there is anything that can be done about it.
I agree, if anything it would most likely result in a new project which contains a subset of what this can.
And I fear a new project may have the same issue when Google Docs changes to use WebAssembly (or something else) and everything changes again in X years.
Are there any alternatives to this library that work with the canvas-based rendering, or are there plans to update the library?
Are there any alternatives to this library?
No, not as far as I'm aware.
Are there plans to update the library?
Not at the moment.
Darn that sucks.
How is Grammarly doing it then? https://chrome.google.com/webstore/detail/grammarly-for-chrome/kbfnbcaeplbcioakkpcpgfkobkghlhen?hl=en
Google provided temporary support for such extensions. If the extension needs to interact with a document through DOM, then the extension can force Google Docs to use HTML-based rendering instead of canvas-based rendering.
It is controlled via the _docs_force_html_by_ext variable:
In that case Google Docs will use HTML instead of canvas.
_docs_force_html_by_ext is undefined:
_docs_force_html_by_ext is set:
However, only whitelisted extensions can use _docs_force_html_by_ext. Most likely the Google Docs team will contact extension developers to notify them about this feature (as they did with me).
But anyway, this feature is just a temporary workaround to give developers some time to adapt their extensions. It will be disabled soon, maybe in 2021, so it is not reliable.
After that we will see which extensions are able to interact with Google Docs through canvas.
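As a rough sketch of what setting this variable from an extension could look like: the code below assumes the flag is a plain global on the page's window and that it must be set before Google Docs starts, so it is meant for a content script declared with "run_at": "document_start". The helper names buildFlagScript and injectFlag are hypothetical, and the value true is an assumption; later comments in this thread report that Google now expects a whitelisted extension ID instead.

```javascript
// Hypothetical helper: builds the source of a tiny script that sets the flag
// in the page's own JavaScript context (content scripts run in an isolated world).
function buildFlagScript(value) {
  // JSON.stringify quotes the value safely for embedding in source text.
  return `window._docs_force_html_by_ext = ${JSON.stringify(value)};`;
}

// Hypothetical helper: injects that script into the given document.
function injectFlag(doc, value) {
  const script = doc.createElement('script');
  script.textContent = buildFlagScript(value);
  (doc.head || doc.documentElement).appendChild(script);
  script.remove(); // the code has already run, so the tag is no longer needed
}

// Only runs in a browser; `true` is an assumed value (see the note above).
if (typeof document !== 'undefined') {
  injectFlag(document, true);
}
```

Whether an inline script like this is allowed to run depends on the page's Content Security Policy, so loading a separate file listed in web_accessible_resources may be the more robust route.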
As per my answer above: if you want to use this library, you should install an extension that enables HTML-based rendering instead of canvas-based rendering: Grammarly, Smart Copy, etc.
I see, I'll try and contact Google to get whitelisted, although having to install a second extension just to use mine wouldn't be very practical for users.
You don't actually need to be whitelisted or install any other extensions. You can force HTML rendering by adding ?mode=html to the query parameters.
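A minimal sketch of adding that parameter programmatically, using the standard URL API so that existing query parameters and the hash fragment are preserved. The function name withHtmlMode and the example document ID are made up for illustration.

```javascript
// Returns the given URL with mode=html set, keeping other parameters and the hash.
function withHtmlMode(href) {
  const url = new URL(href);
  url.searchParams.set('mode', 'html'); // adds the parameter or overwrites an existing one
  return url.toString();
}

console.log(withHtmlMode('https://docs.google.com/document/d/abc/edit'));
// → https://docs.google.com/document/d/abc/edit?mode=html
console.log(withHtmlMode('https://docs.google.com/document/d/abc/edit?usp=sharing#heading=h.1'));
// → https://docs.google.com/document/d/abc/edit?usp=sharing&mode=html#heading=h.1
```

Because searchParams.set overwrites rather than appends, calling the function on an already rewritten URL returns it unchanged.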
Confirmed :+1: Although Google clearly specifies that the HTML fallback option has been deprecated and will slowly be removed from production.
Thanks! @Amaimersion Does Google mention any specific date? Where do they mention it will be removed from production?
They mention it through email. Emails are sent to those who are subscribed to https://sites.google.com/corp/google.com/docs-canvas-migration/home
They are planning to remove it completely by the end of February.
Sad times.
Sad indeed : /
Wait a minute: @Amaimersion Do you know how Grammarly made it work on canvas?
They are not using the whitelisting anymore; if you inspect the DOM when Grammarly is enabled, you can see it works on canvas. I have tried forcing ?mode=html and it also works. This means Grammarly somehow managed to read the text from canvas. Now the question is, how?
I have just downloaded the source code of the Grammarly extension and found some interesting stuff there. For instance, there is a getText function: https://gist.github.com/gzomer/2b809174ce380fced61040005a9a9576#file-grammarly-gdocscanvasinjectedcs-js-L1060
They have a file named Grammarly-gDocsInjectedCs.js, which seems to be for the DOM version. But now they have a new file named Grammarly-gDocsCanvasInjectedCs.js (see link above).
I used this extension to get the Grammarly source code: https://chrome.google.com/webstore/detail/chrome-extension-source-v/jifpbeccnghkjeaalbbjmodiffmgedin?hl=en
@gzomer this is interesting. Did you happen to get it to work outside of the Grammarly extension?
That code is a bit complicated, but if we can put a breakpoint in that getText function it will become clear.
I was able to partially get the full text. In the onRender function here you can just call n.getText({}) and it will return the full text. You can also get the full document structure by inspecting the variable o.
However, there is one downside: I couldn't make it work without the Grammarly extension enabled. There is a sort of connection between Docs and Grammarly, const t = document.documentElement.dataset.grGdcConnId || (document.documentElement.dataset.grGdcConnId, which seems to be set up in another file, but I couldn't understand how it works.
So it seems to be possible, we just need to figure out how.
I have pushed the whole source code here: https://github.com/gzomer/grammarly-extension
So far the ones that seem to be relevant are: https://github.com/gzomer/grammarly-extension/blob/main/src/js/Grammarly-gDocsEarlyInjectedCs.js https://github.com/gzomer/grammarly-extension/blob/main/src/js/Grammarly-gDocsCanvasInjectedCs.js https://github.com/gzomer/grammarly-extension/blob/main/src/js/Grammarly-gDocs.js
I think this function gets the relevant elements for extracting text.
Then this function decodes it into usable text.
Neither of those steps looks trivial.
@Omegastick I used Chrome DevTools to put a breakpoint in that content script function ce(e). That function recursively searches the properties of e to look for the document's text. The question then is where e comes from.
It turns out e is the global variable window.KX_kixApp. If you open Google Docs, press F12, and type window.KX_kixApp into the console, you will see that variable.
That variable isn't accessible from the content script's context. I'm not sure how they are able to access it from their content script. The only way I know how to do something like that is by adding a script tag that will execute in the page's JavaScript context, JSON.stringify that variable, and pass it to the content script via postMessage. But maybe they're doing it some other way.
Edit: ah, got it. The bulk of their scripts executes in the page's JS context. The content script is Grammarly-gDocsEarlyInjector.js, which creates the script tag to inject their scripts into the page's context. I'll see if I can make a proof of concept.
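The script-tag-plus-postMessage bridge described above could be sketched roughly as follows. This is not Grammarly's code: the message tag and the function names pageProbeSource, isProbeMessage, and probeKixApp are hypothetical, and instead of serializing the whole object (which may well contain cycles that JSON.stringify cannot handle) the page-side code only reports whether window.KX_kixApp exists.

```javascript
const MESSAGE_TYPE = 'kix-app-probe'; // hypothetical tag used to recognise our own messages

// Source of the code that will run in the page's JavaScript context.
function pageProbeSource(messageType) {
  return `(function () {
    var found = typeof window.KX_kixApp !== 'undefined';
    window.postMessage({ type: ${JSON.stringify(messageType)}, found: found }, window.location.origin);
  })();`;
}

// Accept only messages posted by this same window that carry our tag.
function isProbeMessage(event, win) {
  return event.source === win && !!event.data && event.data.type === MESSAGE_TYPE;
}

// Content-script side: listen for the reply, then inject the page-side code.
function probeKixApp(win, doc, onResult) {
  win.addEventListener('message', function handler(event) {
    if (!isProbeMessage(event, win)) return;
    win.removeEventListener('message', handler);
    onResult(event.data.found);
  });
  const script = doc.createElement('script');
  script.textContent = pageProbeSource(MESSAGE_TYPE);
  (doc.head || doc.documentElement).appendChild(script);
  script.remove();
}

// Only runs in a browser environment.
if (typeof window !== 'undefined' && typeof document !== 'undefined') {
  probeKixApp(window, document, (found) => console.log('KX_kixApp present:', found));
}
```

The variable may not exist yet when the probe runs early in page load, so a real extension would probably need to retry or wait for the editor to finish initializing.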
Edit: and the statement on line 943 is how they search for the text. le(n, ((e,t)=>t && "\x03" === t.toString().charAt(0)), 5) means: look for string properties, up to 5 levels deep, that begin with that special Unicode character.
This function is self-contained:
function le(e, t, n, o = Object.getOwnPropertyNames(e)) {
  const r = new Set, i = [];
  let s = 0;
  const a = (o, l, c, u = 0) => {
    if (s++, "prototype" === o || l instanceof Window)
      return;
    if (u > n)
      return;
    const d = [...c, o];
    try {
      if (t(o, l))
        return void i.push({
          path: d,
          value: l
        })
    } catch (e) {}
    var g;
    if (null != l && !r.has(l))
      if (r.add(l), Array.isArray(l))
        l.forEach(((e, t) => {
          try {
            a(t.toString(), e, d, u + 1)
          } catch (e) {}
        }));
      else if (l instanceof Object) {
        // Note: J is not defined in this excerpt; it comes from elsewhere in the Grammarly source.
        ((g = l) && null !== g && 1 === g.nodeType && "string" == typeof g.nodeName ? Object.getOwnPropertyNames(e).filter((e => !J.has(e))) : Object.getOwnPropertyNames(l)).forEach((e => {
          try {
            a(e, l[e], d, u + 1)
          } catch (e) {}
        }))
      }
  };
  return o.forEach((t => {
    try {
      a(t, e[t], [])
    } catch (e) {}
  })),
  {
    results: i,
    iterations: s
  }
}
Calling it like this will return the text of the document:
le(window.KX_kixApp, ((e,t)=>t && "\x03" === t.toString().charAt(0)), 5)
Edit: and a de-obfuscated version https://gist.github.com/ken107/2b40c87fcdf27171a5a5fdc489639300
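For readers who do not want to follow the link, here is a simplified sketch of the same idea: a depth-limited, cycle-safe walk over an object's own properties that collects strings starting with the \x03 marker. It is not the linked gist or Grammarly's code. The name findMarkedStrings is hypothetical, and unlike the original it only matches real strings, does not descend into functions, and has no special handling for Window objects or DOM nodes.

```javascript
// Depth-limited search for string values that start with a marker character.
function findMarkedStrings(root, marker = '\x03', maxDepth = 5) {
  const seen = new Set(); // objects already visited, to survive cyclic references
  const results = [];

  function visit(key, value, path, depth) {
    if (depth > maxDepth || key === 'prototype') return;
    const here = [...path, key];
    if (typeof value === 'string') {
      if (value.charAt(0) === marker) results.push({ path: here, value });
      return;
    }
    if (value === null || typeof value !== 'object' || seen.has(value)) return;
    seen.add(value);
    for (const name of Object.getOwnPropertyNames(value)) {
      let child;
      try {
        child = value[name]; // getters on host objects may throw
      } catch (err) {
        continue;
      }
      visit(name, child, here, depth + 1);
    }
  }

  for (const name of Object.getOwnPropertyNames(root)) {
    visit(name, root[name], [], 0);
  }
  return results;
}
```

In the Google Docs console, and assuming the structure described above still holds, the call would be findMarkedStrings(window.KX_kixApp), with each result carrying the property path at which the text was found.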
Very nice @ken107! The secret is in window.KX_kixApp, which contains the document structure. The function you posted basically traverses that structure to get the text. Very nice!
Thanks! It looks like the guys at Grammarly did some serious reverse-engineering work. Not only are they getting the text of the document, but they also figure out where each word is rendered on the screen, underline them, and show contextual hints. I wonder if they are privy to any knowledge of the KIX specs, if such a thing exists. The fact that they're brute-forcing the DOM implies they don't have any special arrangement with Google Docs. Also, I wonder how long we can expect this workaround to last.
You don't actually need to be whitelisted or install any other extensions. You can force html rendering by adding ?mode=html to the query parameters.
Thanks, this method worked fine until today, haha. Mine stopped working today and I don't know why! Is yours still working when you add the "?mode=html" parameter to the URL? Thanks a lot!
The ?mode=html workaround has stopped working for me too. Can anyone else confirm?
The trick with mode=html worked because of the ass-hands of Google programmers. Now they have corrected the error. Now, to enable the HTML renderer, in addition to specifying the GET parameter, you also need to find the kix-awcp parameter in the HTML code and change it from false to true.
Only the great Cthulhu knows why they need the GET parameter now if they added the key to the kix parameters.
Thanks @demimurych! What do you mean by changing the kix-awcp param? I have tried _docs_flag_initialData['kix-awcp'] = true. However, as this seems to be read only during initialization, it doesn't have any effect. How did you manage to make it work?
Thanks for the solution, @demimurych. I tried it locally and it works for me.
The trick is to execute this code:
if (window._docs_flag_initialData) {
    window._docs_flag_initialData['kix-awcp'] = true;
}
However, this code should not be executed from the content script itself (the content script does not have access to the appropriate window instance, since it runs in a "sandboxed" space). Instead, the content script should dynamically load another JS file. This file will be executed in the context of the original window, and you will be able to change the configuration value.
Of course, the content script that initiates the loading should be executed at document_start. Also, you will need to whitelist the dynamically loaded JS file in the web_accessible_resources section of manifest.json in order to allow the host page to load it. And ?mode=html should still be present in the URL.
Minor update. Code snippet should look like:
!function forceHtmlRenderingMode() {
    if (window._docs_flag_initialData) {
        window._docs_flag_initialData['kix-awcp'] = true;
    } else {
        setTimeout(forceHtmlRenderingMode, 0);
    }
}();
Sometimes the dynamic script is loaded and executed before window._docs_flag_initialData is initialized. The updated code snippet fixes that.
And again, it proves that the hands of Google programmers grow out of their asses. So...
Now there are three ways to enable html rendering. At least I found three:
window._docs_force_html_by_ext == true ? 'html render' : 'canvas render'
(GET['mode']=='html' && window._docs_flag_initialData['kix-awcp'] == true) ? 'html' : 'canvas'
But the programmers at Google forgot that the window object also exposes references to elements that have an id attribute set. For example:
HTML:
<span id="spanId"></span>
JS:
Boolean(window['spanId']) === true;
That is, if we can add an id attribute whose value is _docs_force_html_by_ext to any element on the page, then we automatically get a property with that name on the window object with a != false value. It follows that HTML rendering will turn on, since the conditions of method 1 are met.
For example:
<body id="_docs_force_html_by_ext">
As far as I know, any browser plugin can run before the main code and add an id attribute to any element. Hint: a link element can have an id attribute too. I think you get the hint.
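A minimal sketch of that hint, assuming a content script that runs at document_start: it appends a link element whose id equals the flag name, so that window._docs_force_html_by_ext resolves to that element through named access on the window object. The helper name addNamedElement is hypothetical. Note that the resulting value is an element rather than true or an extension ID, so this can only satisfy a loose truthiness check; later comments in this thread report that Google now checks for specific values.

```javascript
// Hypothetical helper: adds an element whose id matches the flag name, so that
// window[flagName] resolves to this element via named access on Window.
function addNamedElement(doc, flagName) {
  const link = doc.createElement('link');
  link.id = flagName;
  (doc.head || doc.documentElement).appendChild(link);
  return link;
}

// Only runs in a browser; a content script can touch the DOM directly,
// so no script injection into the page context is needed here.
if (typeof document !== 'undefined') {
  addNamedElement(document, '_docs_force_html_by_ext');
}
```

Named access is shadowed by any real global of the same name, so this would only take effect while the page itself leaves that variable undefined.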
Thanks so much, you guys have come up with so many solutions. But some of us don't know how to use this code or even where to put it, which is quite troublesome... Can someone please just show us how to use this code? Much appreciated!
Sorry. I can't help you here. I have no time. There is a war going on in my country. I have to protect my home. Try to figure out how to write a plugin for Google Chrome. There is nothing complicated. Or perhaps someone else can advise you.
Thanks for the workarounds. Here's how I implemented them in my extension: https://github.com/ken107/read-aloud/commit/e360fb325409155da4aabd35575c2a91d5e09b68
The first script is a content script loaded via declaration in manifest.json. The first script loads the second script via a SCRIPT tag injection. The second script is the one that modifies the window properties.
I had a little Firefox WebExtension that set window._docs_force_html_by_ext = true (and I set it to run_at: document_start in the manifest) and it was working very well: Google Docs were being rendered as HTML. This is good because if I look at a Google Doc in Firefox when it has been rendered using canvas, the text in the document looks blurry and hard to read.
Alas this was working up until yesterday, and now even though nothing changed in my extension I see even setting _docs_force_html_by_ext doesn't seem to help. I am up for working with anyone to try and solve this problem to get Docs to render as HTML.
Google has added a whitelist check for extensions that are allowed to enable HTML rendering. I am looking for a solution. It is very easy to enable HTML rendering manually; the difficulty is enabling it automatically.
Now you need to set window._docs_force_html_by_ext to one of the following values: ['pebbhcjfokadbgbnlmogdkkaahmamnap', 'lbfjopbdnlacmcdochehdolkcipncehm', 'bknnlbamapndemiekhkcnmdclnkijlhb', 'mchfohhlgkjmomgcblaebjldcdcfddod', 'ahpgjaondafacdmkhdkpfndbblpafgjo', 'hakgmeclhiipohohmoghhmbjlicdnbbb', 'hopjidpebkocjhmmhkjmgblipnonklin', 'obamedcdehgbcknllmpfmkjboadmcngk', 'kkokhmpamjfkaobkhofabjmflebofofm', 'piobbnjelpnbnafleaibbfbnnmibnpjh', 'plaeniloeifmajgdcaonhdnolpfjfhdg', 'pbnaomcgbfiofkfobmlhmdobjchjkphi']
Wow @demimurych you're a genius! That worked. Thank you very much! I appreciate it. Google docs are useable again. But can I ask...how did you arrive at your solution? When I look in a Docs page HTML source I don't see any mention of _docs_force_html_by_ext, or any of those string values you listed.
How did you find that list of whitelisted extension IDs? (I'm trying to improve my problem solving detective skills.)
@ifnullzero message me on telegram. I will try to help you. https://t.me/demimurych @demimurych
@demimurych do you have a GitHub Sponsors account set up? Would love to throw some money your way.
So it seems that setting window._docs_force_html_by_ext no longer seems to trigger HTML rendering. Does anyone else on here have any additional ideas about how to trigger Docs to render as HTML?
Summary of the above: at the moment, this code enables HTML rendering mode. This solution works only in Google Chrome.
manifest.json
{
    "manifest_version": 2,
    "name": "Google Docs DOM",
    "version": "1.0.0",
    "content_scripts": [
        {
            "matches": ["*://docs.google.com/*"],
            "all_frames": false,
            "run_at": "document_start",
            "js": ["content-script.js"]
        }
    ],
    "web_accessible_resources": [
        "injected-script.js"
    ]
}
content-script.js
/**
 * @see
 * https://github.com/Amaimersion/google-docs-utils/issues/10#issuecomment-1086602191
 */
function injectCode() {
    // Runs inline in the page's context and sets the flag to a whitelisted extension ID.
    const code = `(function() {window['_docs_force_html_by_ext'] = 'pebbhcjfokadbgbnlmogdkkaahmamnap';})();`;
    const script = document.createElement('script');
    script.textContent = code;
    (document.head || document.documentElement).appendChild(script);
}

function injectScript() {
    // Loads injected-script.js into the page's context.
    const script = document.createElement('script');
    script.type = 'text/javascript';
    script.src = chrome.runtime.getURL('injected-script.js');
    (document.head || document.documentElement).appendChild(script);
}

injectCode();
injectScript();
injected-script.js
/**
 * @see
 * https://github.com/Amaimersion/google-docs-utils/issues/10#issuecomment-1033118583
 */
if (!location.href.includes('mode=html')) {
    if (location.href.includes('?')) {
        location.href = location.href.replace('?', '?mode=html&');
    } else if (location.href.includes('#')) {
        location.href = location.href.replace('#', '?mode=html#');
    } else {
        location.href += '?mode=html';
    }
}

/**
 * @see
 * https://github.com/Amaimersion/google-docs-utils/issues/10#issuecomment-1059671773
 * https://github.com/Amaimersion/google-docs-utils/issues/10#issuecomment-1062588430
 */
function forceHTMLRenderingMode(n) {
    if (window._docs_flag_initialData) {
        window._docs_flag_initialData['kix-awcp'] = true;
    } else if (n > 0) {
        // Flag data is not initialized yet: retry, at most n more times.
        window.setTimeout(forceHTMLRenderingMode.bind(null, n - 1), 0);
    } else {
        console.warn('Could not set kix-awcp flag');
    }
}

forceHTMLRenderingMode(100);
@Amaimersion, I just tried your code and it does not appear to do anything -- the Google Docs page is still rendering as Canvas and not HTML. What am I missing here? Does it work for you (or anyone else)?
Just bringing this up as an issue; I'm willing to help with any development to get it ready.
Here is the canvas-based example: https://docs.google.com/document/d/1N1XaAI4ZlCUHNWJBXJUBFjxSTlsD5XctCz6LB3Calcg/preview
@menicosia @ken107 @bboydflo @Amaimersion @JensPLarsen