Open martincerven opened 2 months ago
@Patrick-Erichsen indexing? Can you provide more info? It seems that Chromium was downloaded with a mere extension update... really?
Hey @martincerven , appreciate the feedback. This is for the documentation service. We just added a note here about why this is needed: https://github.com/continuedev/continue/blob/dev/docs/docs/features/talk-to-your-docs.md#how-it-works
Docs crawling happens entirely on a user's local machine, so to handle sites with JavaScript enabled we decided to pull down Chromium on install. Without this the majority of docs sites can't be crawled.
Our aim with this is to be more privacy preserving by allowing users to perform indexing locally rather than through our own servers, but curious to know if this is still behavior you'd prefer to disable.
This is terrifying.
An extension should never silently download and execute binary files from the internet.
And definitely not without getting permission from the user first. That is very sneaky behavior, and it raises the question of whether the code does anything else unexpected or unwanted.
> This is for the documentation service. We just added a note here about why this is needed: https://github.com/continuedev/continue/blob/dev/docs/docs/features/talk-to-your-docs.md#how-it-works
As of now, the documentation page still does not list the information about chromium download.
There is no information about the origin of the chromium binary (who built it?).
On a NixOS machine with working "chromium" (and "chrome") accessible in PATH, the extension (JetBrains variant) silently downloaded chromium from somewhere, executed it, and it failed with:
Error: Failed to launch the browser process!
/home/<username>/.continue/.utils/.chromium-browser-snapshots/chromium/linux-1350578/chrome-linux/chrome: error while loading shared libraries: libglib-2.0.so.0: cannot open shared object file: No such file or directory
From the sources it looks like it uses binaries built by Google, and it looks like the download at least uses "https" (no idea if there is any verification of signatures or at least checksums).
Given the sneaky nature of the silent installation, it would make sense to question and verify whether the installed extension is actually a clean build of the source from GitHub (without any malicious changes). Does it download a clean or a backdoored Chromium binary?
(It looks like the contents of the continue-binary file in the installed JetBrains extension match the GitHub code, at least in how PCR_CONFIG is configured: it sets only downloadPath (no hosts set), and import_puppeteer_chromium_resolver/require_lib13 falls back to https://storage.googleapis.com. Of course, that is no guarantee that there are no malicious changes further down in the code.)
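To make the checksum question above concrete, here is a minimal sketch of what verifying a downloaded archive could look like before anything in it is executed. It assumes a trusted SHA-256 digest is available from somewhere other than the download itself (for example, pinned in the extension's source); nothing in this thread establishes that such digests exist for the Chromium snapshots. The function names are hypothetical and are not taken from the Continue codebase.

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// Hex-encoded SHA-256 digest of a byte buffer.
function sha256Hex(data: Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// Compare the digest of `data` with a pinned digest (assumed to come from a
// trusted source, not from the same server as the download).
function matchesChecksum(data: Uint8Array, expectedSha256Hex: string): boolean {
  return sha256Hex(data) === expectedSha256Hex.trim().toLowerCase();
}

// Hypothetical guard: throw instead of letting a mismatching file be unpacked or run.
function verifyDownloadedFile(filePath: string, expectedSha256Hex: string): void {
  if (!matchesChecksum(readFileSync(filePath), expectedSha256Hex)) {
    throw new Error(`Checksum mismatch for ${filePath}; refusing to execute it`);
  }
}
```

A checksum only shows that the file is the one the publisher of the digest intended; it says nothing about who built the binary, which is the separate question raised above.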
As there is a funded company behind this plugin (and not just a pseudonymous developer as was the case in xz utils), it is likely not developed as a backdoor distribution mechanism, but the "silently download binary from internet and execute it" behavior looks terrifyingly close to one.
> Docs crawling happens entirely on a user's local machine, so to handle sites with JavaScript enabled we decided to pull down Chromium on install. Without this the majority of docs sites can't be crawled.
It is possible that some sites cannot be crawled without a Chromium browser. It is impossible that the majority of sites cannot be crawled without the extension downloading a Chromium browser.
Need chrome or chromium browser? Fine.
If it is possible to use normal installation of a browser, just check whether it is installed, and if not, ask the user to install it.
If a specific version of Chromium is really required, then download it only after the user has added something like "allowChromiumDownload": true, to the config file. If the line is not there, it might be a good idea to explain what is going on and present the user with the URL of the required Chromium binary. Allow them to download it manually and save it in a specific directory as a fallback. That might also be useful for indexing internal documentation in an air-gapped network.
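The flow proposed above could be sketched roughly as follows. This is only an illustration of the suggestion, not how the extension works: allowChromiumDownload is the key proposed in this comment, while chromiumPath, the list of binary names, and every function name are assumptions made up for the example.

```typescript
import { accessSync, constants, statSync } from "node:fs";
import { delimiter, join } from "node:path";

// Hypothetical config shape; neither key is confirmed to exist in Continue.
interface CrawlerConfig {
  allowChromiumDownload?: boolean; // the opt-in flag proposed in the comment
  chromiumPath?: string; // assumed: a manually installed binary
}

type BrowserPlan =
  | { kind: "use-existing"; executablePath: string }
  | { kind: "download" }
  | { kind: "ask-user"; message: string };

// Assumed binary names; real installations may use others.
const CANDIDATE_NAMES = ["chromium", "chromium-browser", "google-chrome", "chrome"];

// True if `p` is a regular file the current user may execute.
// (On Windows, X_OK only checks that the file exists.)
function isExecutableFile(p: string): boolean {
  try {
    accessSync(p, constants.X_OK);
    return statSync(p).isFile();
  } catch {
    return false;
  }
}

// Decide what to do without ever downloading silently.
function planBrowser(
  config: CrawlerConfig,
  pathVar: string,
  isExecutable: (p: string) => boolean = isExecutableFile,
): BrowserPlan {
  // 1. A path the user configured explicitly wins.
  if (config.chromiumPath && isExecutable(config.chromiumPath)) {
    return { kind: "use-existing", executablePath: config.chromiumPath };
  }
  // 2. Otherwise look for an installed browser on PATH.
  for (const dir of pathVar.split(delimiter)) {
    if (!dir) continue;
    for (const name of CANDIDATE_NAMES) {
      const candidate = join(dir, name);
      if (isExecutable(candidate)) {
        return { kind: "use-existing", executablePath: candidate };
      }
    }
  }
  // 3. Download only when the user has opted in.
  if (config.allowChromiumDownload === true) {
    return { kind: "download" };
  }
  // 4. Otherwise explain and let the user decide.
  return {
    kind: "ask-user",
    message:
      'No Chromium found. Install one, or set "allowChromiumDownload": true in the config file.',
  };
}
```

A caller would pass process.env.PATH ?? "" as pathVar. Passing the executable check as a parameter keeps the decision logic testable without touching the real filesystem.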
> Our aim with this is to be more privacy preserving by allowing users to perform indexing locally rather than through our own servers, but curious to know if this is still behavior you'd prefer to disable.
Using a local Chrome/Chromium could be a reasonable idea (e.g. it can index internal documentation sites, etc.), assuming it does not use the user's actual Chromium profile, Chromium sandboxing is enabled, and the browser is kept updated.
Yeah, it's very similar to xz and also CrowdStrike, where they pushed an update to prod and it crashed 10% of Windows machines.
Here it was also just an update. It contrasts sharply with, for example, llama.cpp, where they reimplement functionality so as not to depend even on other FOSS libraries.
So for me questions are:
For me, the point of using an open source extension is that anything can be checked by the community. Sneakily downloading some random binary from god knows where runs directly against this.
@Patrick-Erichsen can you comment on these points?
Right now it seems that, instead of Chromium.app, the extension could just as well download Malware.app without any user consent or anything else, which sets a very dangerous precedent, all the more so for a free and open source VS Code extension.
Oh, thank you @martincerven for bringing that up!
It's also very concerning for disk-space-savvy individuals: 541M is accounted for by /home/huge/.continue/.utils/.chromium-browser-snapshots, which would be about 5% of my workspace backup.
@martincerven : could you please tidy up the OP a bit? Maybe adding what commit or which version was the last safe one. Edit: This went in most likely with this, which happened 2 weeks ago. I'll try to look further to check whether the extension version 8.5 is clean of this...
Edit: removing it from CLI did not break the basic functionality for me, so I'd advise savvy users to do that for now.
Skipping the paranoia (which everyone should have), it would be nice if users had the option of managing the chromium installation themselves and simply adding a config with the path to it. This would also let users update (or pin?) their chromium binaries, and possibly using a custom-compiled chromium (or ungoogled-chromium?).
@Patrick-Erichsen this is not an acceptable implementation. User privacy and choice in open source products are not an option or a feature. They are the basics.
I'd consider this feature only If all following points will be implemented:
Small guidance on avoiding the bloating util for now: Download continue-linux-arm64-0.9.197.vsix or continue-linux-arm64-0.8.46.vsix from GH release page and install it manually:
Props to @sestinj for at least clearly advertising in the v8.47 release notes that a headless browser is to be used.
To everyone arguing about explicit opt-in, this is the same level/type of dependency as everything from the continue package.json, I doubt that you really mean that all of the dependencies have to be opt-in.
It's puzzling to see security/privacy concerns too, as the installation above happens in an extension which was already allowed to do everything it needs on the user's machine, so any malicious intent would already have had a chance to be executed.
With that, it's a completely reasonable ask to allow configuring the type of crawling that is performed (plain/rich), to try reusing already installed browser(s), and to optimise downloads to use lighter Chromium versions when the download is necessary, or to use VS Code's Web Views. I'm sure the maintainers will get there once this feature has enough use. It's not completely reasonable, however, to see such an acute backlash, as all of the concerns (third-party code execution, disk usage bloat) are pretty much a given when installing this or any other kind of extension for VS Code.
@av some dependencies can be opt-in when an external, pre-installed application is used.
Some dependencies like Chromium are OK if you want to get something done fast, or if the only browser you know is Chromium-based. Also, using an existing browser instead of one pulled in as a dependency provides some important cookies and gives the user more control.
Also, a preinstalled browser is usually managed in companies, which would require far more settings than the author envisioned for this project, and more hassle for a user to set them all.
Also, they should take care to run the browser inside a sandboxed environment and make sure it is updated to the latest stable version. There are many exploits out in the wild.
I'm not a hacker, guys (I'm just a peaceful animal), but as I'm using Ubuntu, I was a little bit concerned about some things:
> If you want to run developer builds of Chromium/Chrome on Ubuntu 23.10+ (or possibly other Linux distros in the future), you'll need to either globally or selectively disable an Ubuntu security feature.
But if you do this, they say:
> For a while, user namespaces have been available to unprivileged (e.g. non-root) users on most Linux distros, but they exposed a lot of extra kernel attack surface.
> In a report from Google, 44% of the exploits they saw required unprivileged user namespaces as part of their exploit chain.
I prefer not to turn off an Ubuntu security feature, so I won't use this for now. Forgive me if I said anything wrong, I just tried to help!
Thanks to everyone who shared their feedback in this thread. We heard you loud and clear and have taken steps to address this both immediately and in the future.
As a principle, we will not dynamically download executables without user visibility. PR #2192 makes the change so that we fall in line with this principle for Chromium (it is entirely opt-in):
Set useChromiumForDocsCrawling in your config.json in order to define the behavior.
These updates are now available in VS Code pre-release v0.9.207, will be released later today in a JetBrains EAP, and as soon as these pre-releases have undergone the same initial testing we do each time, they will become main releases.
There were also a few points in this thread worth addressing:
Why can't we use the Chromium that is already installed for Google Chrome or otherwise? Puppeteer, the package used to control the headless browser, requires a specific chromium_revision for each version of the library, so we can’t easily allow users to manage the download/installation, or use existing installations
It's not in the docs!! We've added the reference here where we believe it is most likely to be found by folks using the docs feature: https://docs.continue.dev/features/talk-to-your-docs#crawling-dynamically-generated-sites-with-usechromiumfordocscrawling
Do we actually need a headless browser? We've been consistently testing against a large list of very common docs sites, many directly requested by users, to check whether we can successfully crawl them. We'd tried a pretty exhaustive list of non-headless browser tools before coming to the conclusion that one is necessary to get even passable success rates. If anyone proves this wrong, we're open to hearing solutions.
Hopefully it is understood by now that Continue takes great effort to secure your code, to the point of operating as a local-first application. In considering the trade-offs between hosting our own web crawling servers, to which the extension would have to send requests, vs. following the local-first pattern, we took this lens, but more than anything we value feedback. So again, thanks all for being swift to call us out, and thanks @Patrick-Erichsen for being just as swift in taking the necessary action.
I'll hold off on closing the issue for a minute so as not to be discouraging of further discussion!
@sestinj thank you for stepping up and answering our questions. My concern remains about mandatory settings and add-ons, which a company manages for all browsers.
Additionally, it has no user-managed settings (including cookies) and no plugins, such as ones that block ads and/or improve privacy in any way. I don't like the idea of being tracked via an application.
Another way around the issue would be to provide a separate downloader program that downloads the necessary raw data, along with a description of the output format the application uses. The latter would make it possible to create alternative downloader applications, if any of us were willing to take that on.
Thanks @sestinj and @Patrick-Erichsen for quick action, I was being downvoted to hell for bringing this up, but I felt it was a security issue, although I couldn't put my finger on exactly what irked me.
Now, I know there are a few security points, some independent of Continue:
Lastly,
> Hopefully it is understood by now that Continue takes great effort to secure your code, to the point of operating as a local-first application. In considering the trade-offs between hosting our own web crawling servers, to which the extension would have to send requests, vs. following the local-first pattern, we took this lens, but more than anything we value feedback.
I'm very happy you took the local-first approach, even as we voice our concerns here. I honestly can't see how this crawling works internally, but here is one last imaginary scenario:
Anyway, thanks @sestinj for addressing this issue, it will ultimately make your product better and more secure.
Martin
https://discussions.apple.com/thread/8582300?sortBy=rank to remove the notification, rm -rf .continue, stop using continue extension, report this app until they actually improve
Appreciate the further thoughts here! We've thought about this pretty deeply, trying to take into account all of the feedback received and where we want to go with the product. Without committing to a particular direction, we are tentatively looking into building out an indexing server.
Though things are much better with the headless browser being entirely opt-in, I still wanted to give an update so you know we haven't simply forgotten about this : )
I will make sure to update here as soon as we have more info!
Using electron was not enough, now every extension of every electron app will install its own chromium! Now I have 3 additional chromiums in my system, thanks!
> Why can't we use the Chromium that is already installed for Google Chrome or otherwise? Puppeteer, the package used to control the headless browser, requires a specific chromium_revision for each version of the library, so we can’t easily allow users to manage the download/installation, or use existing installations
I am pretty sure that is not true, or at least it was the other way a few months ago when I worked with it.
You just need to set PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true to install it without chromium and then you can run any chromium binary if you have access to it, some features may be less compatible in different versions, but the core functionality remains the same.
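As a rough sketch of the approach described in this comment: Puppeteer's launch() accepts options such as executablePath, headless, and userDataDir, so launch options pointing at a user-managed binary could be assembled like this. CRAWLER_CHROMIUM_PATH is a hypothetical environment variable invented for the example, and the sketch does not settle whether a particular Puppeteer version will accept a particular Chromium revision.

```typescript
import { mkdtempSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Subset of Puppeteer launch options used in this sketch.
interface LaunchOptionsSketch {
  executablePath: string; // user-managed Chromium/Chrome binary
  headless: boolean;
  userDataDir: string; // throwaway profile, so the user's real profile is never touched
}

// Build launch options from the environment.
// CRAWLER_CHROMIUM_PATH is a hypothetical variable, not one Continue or Puppeteer defines.
function launchOptionsFromEnv(env: Record<string, string | undefined>): LaunchOptionsSketch {
  const executablePath = env["CRAWLER_CHROMIUM_PATH"];
  if (!executablePath) {
    throw new Error("CRAWLER_CHROMIUM_PATH is not set; no browser will be downloaded");
  }
  return {
    executablePath,
    headless: true,
    userDataDir: mkdtempSync(join(tmpdir(), "docs-crawler-")),
  };
}

// Usage, assuming the puppeteer-core package (which ships without a browser):
//   import puppeteer from "puppeteer-core";
//   const browser = await puppeteer.launch(launchOptionsFromEnv(process.env));
```

The temporary profile directory addresses the earlier point in the thread about not reusing the user's actual browser profile; cleaning it up after the crawl is left out of the sketch.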
@remixer-dec thanks for sharing that screenshot, I believe we gave that a try and ran into issues with Puppeteer complaining about an incompatible Chromium revision though. Will plan to try it out again though when we circle back to some work we have planned around docs service in the near future 👍
Description
There is Chromium.app in ~/.continue/.utils/.chromium-browser-snapshots/chromium/ installed without any user consent at all.