Closed fregante closed 4 years ago
Wow, this file length is insane! I think it's best not to execute OctoLinker at all if a file goes beyond a certain line limit. Refactoring the extension to support such edge cases seems to be overkill, and I would rather spend my time on new features.
Defining the limit is probably the trickiest task here. My gut feeling says that everything beyond 5000 lines should be ignored.
How does this sound to you?
That sounds alright, but it might be a good opportunity to find some bottlenecks, like these long garbage collection events
It appears to be caused by this (which contains getAggregateText): https://github.com/OctoLinker/OctoLinker/blob/0174f49ed52e936dadcacc81f8cdf428d86bc950/packages/helper-insert-link/index.js#L2
I'm sure that there's a more efficient way to replace DOM elements, e.g. looping getTextNodes.
Also, from what I understand, the whole document is re-parsed for each regex.
I too regularly observe the browser tab hanging when opening moderately big package.json files. For example, https://github.com/elastic/kibana/blob/master/package.json freezes my browser tab for almost 20 seconds, which I think is borderline unacceptable. A max line count won't help here, as the density of links is just too high in such files.
If CPU-intensive work is to be done, maybe offload it to a web worker? Though based on the comments above, the bottleneck sounds like inefficient parsing/replacing.
freezes my browser tab for almost 20 seconds, which I think is borderline unacceptable.
20-second freezes aren't just borderline unacceptable.
Wow, that file is only 440 lines but freezes the browser for longer than TypeScript's 18K-line file.
So this issue isn't new for either of you and also existed in the previous version, right? Anyway, sorry for this bad experience.
@bfred-it I agree, there are more efficient ways to replace DOM elements, but maintaining this extension for many years has taught me that relying on classNames and DOM elements is error-prone and time-consuming to keep in sync. findandreplacedomtext was solving this issue for me pretty well.
OctoLinker does not apply all RegExps over and over again on the same document. First, all blobs on a document are parsed and stored in a unified format. Then, based on the file path and/or language information, one or more plugins are applied to parse the blob. Usually it's only one, apart from a few exceptions. A plugin can define one or more RegExps, each of which is applied to the blob, not the whole page. The JavaScript plugin contains three RegExps: https://github.com/OctoLinker/OctoLinker/blob/master/packages/plugin-javascript/index.js#L108
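The flow described above can be sketched roughly as follows. This is only an illustration: the plugin objects, the `matchesPath` and `patterns` fields, the `findMatches` helper, and the regular expressions are all hypothetical stand-ins, not OctoLinker's actual plugin API or its real patterns.

```javascript
// Hypothetical plugin registry. Each plugin declares which files it handles
// and the patterns it wants to run on a blob's lines.
const plugins = [
  {
    name: 'javascript',
    matchesPath: (path) => /\.(js|jsx|ts|tsx)$/.test(path),
    patterns: [
      /require\(\s*['"]([^'"]+)['"]\s*\)/g,
      /from\s+['"]([^'"]+)['"]/g,
    ],
  },
  {
    name: 'ruby',
    matchesPath: (path) => path.endsWith('.rb'),
    patterns: [/require\s+['"]([^'"]+)['"]/g],
  },
];

// A blob is assumed to be { path, lines } in some unified format.
// Only the plugins selected for the blob run, and they only see the blob,
// never the whole page.
function findMatches(blob) {
  const results = [];
  for (const plugin of plugins) {
    if (!plugin.matchesPath(blob.path)) continue;
    blob.lines.forEach((text, index) => {
      for (const pattern of plugin.patterns) {
        // matchAll works on a copy of the regex, so lastIndex is not shared.
        for (const match of text.matchAll(pattern)) {
          results.push({ plugin: plugin.name, line: index + 1, target: match[1] });
        }
      }
    });
  }
  return results;
}
```

With this shape the cost grows with the number of lines in a blob times the number of patterns of the selected plugin, which would suggest the slowdown comes from the DOM replacement step rather than from the matching itself.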
Maybe this helps to track down the issue. I'll look into this, but feel free to add your ideas / explorations as well.
Yes, this is not a new issue; it's been that way for as long as I've been using the extension.
getAggregateText is here. It looks like it extracts an array of strings from the DOM. I think this is an operation that only needs to run once per page, so maybe some memoization might help.
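A minimal sketch of the memoization idea, assuming the extraction depends only on the root it is given and that the tree does not change between calls. `memoizePerRoot` and `collectText` are hypothetical names; `collectText` walks plain objects as a stand-in for the real DOM walk so the sketch runs outside a browser.

```javascript
// Cache the result of an expensive extraction per root object.
// A WeakMap lets the cached entry be collected together with its root.
function memoizePerRoot(extract) {
  const cache = new WeakMap();
  return (root) => {
    if (!cache.has(root)) {
      cache.set(root, extract(root));
    }
    return cache.get(root);
  };
}

// Stand-in for an expensive DOM walk: collects text from a plain object tree
// in document order.
function collectText(node) {
  const own = node.text ? [node.text] : [];
  return own.concat(...(node.children || []).map(collectText));
}

const getTextOnce = memoizePerRoot(collectText);
```

In the extension the root would be a DOM element, and the cache would have to be dropped whenever links are inserted, since that changes the text nodes the extraction relies on.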
Does the operation need to be synchronous or can it pause every 200ms so the browser can go through the event loop?
In Refined GitHub we had the same issue, and it was resolved by adding an await setTimeout in the loop if the loop takes too long.
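A rough sketch of that pattern, not Refined GitHub's actual code: `processWithYield`, `pause`, and the default `budgetMs` of 200 are assumptions chosen to mirror the 200ms figure mentioned above.

```javascript
// Resolves on a later macrotask, giving the browser a chance to render
// and handle input before the loop continues.
const pause = () => new Promise((resolve) => setTimeout(resolve, 0));

// Runs handle() over items, yielding to the event loop whenever the
// current slice of work has exceeded budgetMs.
async function processWithYield(items, handle, budgetMs = 200) {
  const results = [];
  let sliceStart = Date.now();
  for (const item of items) {
    results.push(handle(item));
    if (Date.now() - sliceStart > budgetMs) {
      await pause();
      sliceStart = Date.now();
    }
  }
  return results;
}
```

The total work stays the same; it is only cut into slices so the tab stays responsive. The caller has to be async as well, which is why synchronous DOM wrapping makes this harder to retrofit.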
That's a great suggestion, thanks. It might be slightly tricky to implement since all the DOM wrapping is sync. Refactoring that to async has actually been on my todo list for quite a while. I'll look into this later this week; maybe there is a shortcut.
Yesterday I found some time, and it's really a tricky one. I've noticed that Chrome is struggling with DOM parsing too, but "just" for ~3 seconds. Anyway, as you spotted, the issue is related to the findandreplacedomtext dependency. I need more time to work on a solution. Do you notice this issue mostly "just" for package.json files, or also for regular files (except huge files with 20k lines)? That would be super useful to know to find a good solution.
I might have found a solution which reduces the execution time from ~25 seconds to 250ms on Firefox and from 2 seconds to 250ms on Chrome. It needs more testing to verify the results, but I'm optimistic that this will be fixed soon
Quick update: Today I resumed work on this issue. During the summer break I had other stuff to do. However, the original approach seems to break a few other things, so I decided to rewrite this part from scratch. I'll keep you posted.
Another few months passed by without much progress on this, sorry for that. However, in the past couple of days I made good progress and I would like to share an early version with you. I haven't done any intensive testing, but it seems to work fine. In this version linking of code snippets in MD files and comments is removed.
octolinker-5.2.2-an+fx.xpi
Enjoy and please share your feedback with me. Thank you
Source is available here and PR will follow soon
The new parsing is on average 81% faster on Firefox and 28% faster on Chrome. For big files like https://github.com/Microsoft/TypeScript/blob/master/lib/lib.dom.d.ts and https://github.com/elastic/kibana/blob/master/package.json, Chrome is roughly 86% faster, and Firefox finally no longer crashes and finishes parsing after a reasonable time of ~250ms.
Great gift for the 6th anniversary of OctoLinker!
Great to hear! I'll be testing it in January if you don't merge it first.
Indeed, that beta seems much faster in Firefox. I'll be testing it.
Known issues:
I found a few more issues with linking in diff views. Fixing those requires some more refactoring, but I'm optimistic. Stay tuned!
What's the latest beta version I can try?
Do you mind using this branch and building the extension yourself using npm run firefox-open (although this isn't using your default Firefox installation)? I need to rebase this branch onto master since I already added some parts as a separate PR.
I just tested that on the TypeScript file and found no difference, it makes the yellow bar appear for 8-9 seconds just like the Store version.
Testing method:
about:debugging
(beta)
Firefox 72 (Refined GitHub disabled)
I was primarily using this file for testing, which seems to be more realistic in terms of length: https://github.com/elastic/kibana/blob/master/package.json. However, it should not freeze your browser. Maybe it's suitable to not run OctoLinker on such long files at all. What do you think?
Maybe it's suitable to not run OctoLinker on such long files at all.
You could limit the usefulness of OctoLinker if the only alternative is a many-seconds lockup, but that's always the last option.
Imports are usually defined at the top of a file, so taking just the first 100 lines into account on such big files might be a workaround.
Strange, I can't confirm that Firefox hangs when using the source from this dev branch. I tried both npm run firefox-open and about:debugging, and both work.
However, I think capping OctoLinker to only the first ~200 lines for files with more than 10,000 lines is a legitimate restriction.
*number needs to be defined
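A minimal sketch of such a cap. The function name `linesToScan` and both constants are placeholders taken from the numbers floated above, which still need to be defined.

```javascript
// Placeholder limits from the discussion; the real numbers are undecided.
const LARGE_FILE_THRESHOLD = 10000;
const LARGE_FILE_HEAD = 200;

// Returns the lines that should be scanned for links: everything for
// normal files, only the head for very large ones.
function linesToScan(lines, threshold = LARGE_FILE_THRESHOLD, head = LARGE_FILE_HEAD) {
  return lines.length > threshold ? lines.slice(0, head) : lines;
}
```

This leans on the observation that imports usually sit at the top of a file. It would not help with dense files such as a 440-line package.json, which stay below any reasonable threshold.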
All remaining issues are fixed and I'm super happy to share a Pull Request with you https://github.com/OctoLinker/OctoLinker/pull/774. Unit and E2E tests are passing so I'm quite confident it works as intended.
@fregante @silverwind in case you want to give it a try yourself, please do so. Thanks
Finally fixed! This will be part of the upcoming version, which is scheduled for early next week.
Just in time for Chinese New Year!
Nice work and explanation. I think it's a case of time complexity reduction from O(n log n) to O(n), so it will scale much better for big datasets.
I consistently get this message when visiting this large file:
https://github.com/Microsoft/TypeScript/blob/master/lib/lib.dom.d.ts
Admittedly this is more of an edge case, given the length of the file, but freezing the browser for a few seconds is never good.
Perhaps the operation can be done in the background and/or split up in multiple batches to avoid long blocking.