Why does this extension need a full-blown Chromium.app?

martincerven commented 2 months ago

Before submitting your bug report

[ ] I believe this is a bug. I'll try to join the Continue Discord for questions
[ ] I'm not able to find an open issue that reports the same bug
[ ] I've seen the troubleshooting guide on the Continue Docs

Relevant environment info

- OS:
- Continue:
- IDE:
- Model:
- config.json:

Description

There is a Chromium.app in ~/.continue/.utils/.chromium-browser-snapshots/chromium/ that was installed without any user consent at all.

To reproduce

No response

Log output

No response

martincerven commented 2 months ago

@Patrick-Erichsen, is this for indexing? Can you provide more info? It seems that Chromium was downloaded with a mere extension update... really?

Patrick-Erichsen commented 2 months ago

Hey @martincerven, appreciate the feedback. This is for the documentation service. We just added a note here about why this is needed: https://github.com/continuedev/continue/blob/dev/docs/docs/features/talk-to-your-docs.md#how-it-works

Docs crawling happens entirely on a user's local machine, so to handle sites that rely on JavaScript we decided to pull down Chromium on install. Without this, the majority of docs sites can't be crawled.

Our aim with this is to be more privacy preserving by allowing users to perform indexing locally rather than through our own servers, but curious to know if this is still behavior you'd prefer to disable.

otopetrik commented 1 month ago

This is terrifying.

An extension should never just silently download and execute some binary files from the internet.

And definitely not without getting permission from the user first. That is very sneaky behavior, and it raises the question of whether the code does anything else unexpected or unwanted.

This is for the documentation service. We just added a note here about why this is needed: https://github.com/continuedev/continue/blob/dev/docs/docs/features/talk-to-your-docs.md#how-it-works

As of now, the documentation page still does not mention the Chromium download.

There is no information about the origin of the chromium binary (who built it?).

On a NixOS machine with working "chromium" (and "chrome") accessible in PATH, the extension (JetBrains variant) silently downloaded chromium from somewhere, executed it, and it failed with:

Error: Failed to launch the browser process!
/home/<username>/.continue/.utils/.chromium-browser-snapshots/chromium/linux-1350578/chrome-linux/chrome: error while loading shared libraries: libglib-2.0.so.0: cannot open shared object file: No such file or directory

From the sources, it looks like it uses binaries built by Google, and the download at least uses "https" (no idea whether there is any verification of signatures, or at least checksums).

Given the sneaky nature of the silent installation, it would make sense to question and verify whether the installed extension is actually a clean build of the source from GitHub (without any malicious changes). Does it download a clean or a backdoored Chromium binary?

(It looks like the contents of the continue-binary file in the installed JetBrains extension match the GitHub code, at least where PCR_CONFIG is configured: it sets only downloadPath (no hosts), and import_puppeteer_chromium_resolver/require_lib13 falls back to https://storage.googleapis.com. Of course, that is no guarantee that there are no malicious changes further down in the code.)
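For reference, the checksum verification asked about above is cheap to implement. A minimal sketch, assuming a Node.js runtime (`createHash` from `node:crypto` is a real API) and a SHA-256 digest pinned in the extension's own reviewable source; `sha256Hex` and `verifyPinnedDigest` are hypothetical names, not code from Continue or the resolver library:

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 of raw bytes (e.g. a downloaded Chromium archive).
function sha256Hex(data: Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// Compare against a digest pinned in source, BEFORE unpacking or executing anything.
// The pinned digest is hypothetical: someone has to publish it per revision.
function verifyPinnedDigest(data: Uint8Array, expectedHex: string): boolean {
  return sha256Hex(data) === expectedHex.trim().toLowerCase();
}
```

Pinning the digest in source means a changed download host or a tampered mirror fails closed, though the digest has to be updated whenever the pinned revision changes.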

As there is a funded company behind this plugin (and not just a pseudonymous developer as was the case in xz utils), it is likely not developed as a backdoor distribution mechanism, but the "silently download binary from internet and execute it" behavior looks terrifyingly close to one.

Docs crawling happens entirely on a user's local machine, so to handle sites that rely on JavaScript we decided to pull down Chromium on install. Without this, the majority of docs sites can't be crawled.

It is possible that some sites cannot be crawled without a Chromium browser. It is impossible that the majority of sites cannot be crawled without the extension downloading a Chromium browser.

Need a Chrome or Chromium browser? Fine. If it is possible to use a normal installation of a browser, just check whether it is installed, and if not, ask the user to install it. If a specific version of Chromium is really required, then download it only after the user has added something like "allowChromiumDownload": true to the config file. If that line is not there, it might be a good idea to explain what is going on and present the user with the URL of the required Chromium binary. Allow them to download it manually and save it in a specific directory as a fallback. That might also be useful for indexing internal documentation in an air-gapped network.
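The opt-in flow proposed in this comment can be sketched as a single decision function. This is a hypothetical sketch: `allowChromiumDownload` is the key the commenter proposes (not an existing Continue setting), `planChromium` and the environment fields are invented names, and it assumes an already installed browser is acceptable whenever one is found:

```typescript
// Hypothetical config key proposed in the comment; not an existing Continue setting.
interface CrawlerConfig {
  allowChromiumDownload?: boolean;
}

// Hypothetical facts the extension would gather before deciding.
interface BrowserEnvironment {
  installedBrowserPath?: string; // chrome/chromium found on PATH, if any
  manualBinaryPath?: string; // binary the user saved in the fallback directory, if any
  requiredBinaryUrl: string; // URL of the specific Chromium build, if one is required
  manualDir: string; // fallback directory for a manually downloaded binary
}

type ChromiumPlan =
  | { kind: "useExisting"; path: string }
  | { kind: "download"; url: string }
  | { kind: "askUser"; message: string };

function planChromium(config: CrawlerConfig, env: BrowserEnvironment): ChromiumPlan {
  // 1. Prefer a browser the user already installed and keeps updated.
  if (env.installedBrowserPath) {
    return { kind: "useExisting", path: env.installedBrowserPath };
  }
  // 2. Next, a binary the user downloaded manually (e.g. in an air-gapped network).
  if (env.manualBinaryPath) {
    return { kind: "useExisting", path: env.manualBinaryPath };
  }
  // 3. Download only after an explicit opt-in.
  if (config.allowChromiumDownload === true) {
    return { kind: "download", url: env.requiredBinaryUrl };
  }
  // 4. Otherwise, explain what is needed and leave the decision to the user.
  return {
    kind: "askUser",
    message:
      "Crawling this site needs Chromium. Install a browser, set " +
      '"allowChromiumDownload": true, or save ' +
      env.requiredBinaryUrl +
      " into " +
      env.manualDir +
      ".",
  };
}
```

In this shape the opt-in flag only matters when no usable browser is found, so nothing is downloaded by default.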

Our aim with this is to be more privacy preserving by allowing users to perform indexing locally rather than through our own servers, but curious to know if this is still behavior you'd prefer to disable.

Using a local Chrome/Chromium could be a reasonable idea (e.g. it can index internal documentation sites), assuming it does not use the user's actual Chromium profile, Chromium sandboxing is enabled, and the browser is kept updated.
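Those three conditions map fairly directly onto browser launch options. A minimal sketch, assuming Puppeteer-style option names (`executablePath`, `userDataDir`, `headless` and `args` are options Puppeteer's `launch()` accepts); the `LaunchOptions` interface is declared locally so the sketch runs without Puppeteer installed, and `buildLaunchOptions` is a hypothetical helper:

```typescript
// Local stand-in for the subset of Puppeteer's launch() options used here,
// so this sketch runs without Puppeteer installed.
interface LaunchOptions {
  executablePath: string;
  userDataDir: string;
  headless: boolean;
  args: string[];
}

// Chromium flags that weaken or disable its sandbox.
const SANDBOX_DISABLING_FLAGS = ["--no-sandbox", "--disable-setuid-sandbox"];

function buildLaunchOptions(
  localBrowserPath: string, // the user's own, system-updated Chrome/Chromium
  throwawayProfileDir: string, // a fresh directory, never the user's real profile
  extraArgs: string[] = [],
): LaunchOptions {
  const unsafe = extraArgs.filter((arg) => SANDBOX_DISABLING_FLAGS.indexOf(arg) !== -1);
  if (unsafe.length > 0) {
    throw new Error("refusing to disable the Chromium sandbox: " + unsafe.join(", "));
  }
  return {
    executablePath: localBrowserPath,
    userDataDir: throwawayProfileDir,
    headless: true,
    args: extraArgs,
  };
}
```

With real Puppeteer, the returned object could be passed to `puppeteer.launch()`. Keeping the profile directory separate means the crawler never sees the user's cookies or history, while updates stay with the system browser.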

martincerven commented 1 month ago

Yeah, it's very similar to xz, and also to CrowdStrike, where they pushed an update to prod and it crashed millions of Windows machines.

Here, too, it was just an update. It contrasts sharply with, for example, llama.cpp, where they prefer to reimplement functionality rather than depend even on other FOSS libraries.

So for me, the questions are:

- Is it really needed?
- Where does the Chromium come from? Is it built from source? By Google? Downloaded by npm?
- Can you use the user's browser installation?
- How can you prevent other malicious code from running in that Chromium? I actually didn't know this was possible at all.

For me, the point of using an open-source extension is that anything can be checked by the community; sneakily downloading some random binary from god knows where runs directly counter to this.

@Patrick-Erichsen can you comment on these points?

Right now it seems that, instead of Chromium.app, you could just as well download Malware.app (or anything, really) without any user consent, which is a very dangerous precedent, all the more so for a free and open-source VS Code extension.

Huge commented 1 month ago

Oh, thank you @martincerven for bringing that up! It's also very concerning for disk-space-savvy individuals: 541M is taken up by /home/huge/.continue/.utils/.chromium-browser-snapshots, which would be about 5% of my workspace backup.

@martincerven: could you please tidy up the OP a bit? Maybe add which commit or version was the last safe one. Edit: this most likely went in with this, which happened 2 weeks ago. I'll try to look further to check whether extension version 8.5 is free of this...

Edit: removing it from the command line did not break basic functionality for me, so I'd advise savvy users to do that for now.

KMouratidis commented 1 month ago

Skipping the paranoia (which everyone should have), it would be nice if users had the option of managing the Chromium installation themselves and simply adding a config entry with the path to it. This would also let users update (or pin?) their Chromium binaries, and possibly use a custom-compiled Chromium (or ungoogled-chromium?).

eirnym commented 1 month ago

@Patrick-Erichsen this is not an acceptable implementation. User privacy and choice in open-source products are not an option or a feature. They're the basics.

I'd consider this feature only if all of the following points are implemented:

- It is an explicit opt-in feature.
- Only the user is responsible for downloading and installing a browser engine of any kind.
- Only the user is responsible for the URLs accessed by the tool.
- Consider an option for non-JS documentation fetching, so that no browser is used.
- The user is given a choice of which browser to use; there are plenty of them.
- Please also remember Firefox-only users. Firefox is a fully capable browser for downloading the required data.

Huge commented 1 month ago

Some quick guidance on avoiding the bloating utility for now: download continue-linux-arm64-0.9.197.vsix or continue-linux-arm64-0.8.46.vsix from the GitHub releases page and install it manually.

Props to @sestinj for at least clearly advertising, in the v8.47 release notes, that the headless browser is used.

av commented 1 month ago

To everyone arguing for explicit opt-in: this is the same level/type of dependency as everything in Continue's package.json, and I doubt that you really mean all of the dependencies have to be opt-in.

It's puzzling to see security/privacy concerns, too, as the installation above happens in an extension that was already allowed to do everything it needs on the user's machine, so any malicious intent would already have had a chance to be acted on.

With that, it's a completely reasonable ask to allow configuring the type of crawling that is performed (plain/rich), to try reusing already installed browser(s), and to optimise downloads to use lighter Chromium builds when a download is necessary, or to use VS Code's Web Views. I'm sure the maintainers will get there once this feature sees enough use. It's not completely reasonable, however, to see such an acute backlash, as all of the concerns (third-party code execution, disk usage bloat) are pretty much a given when installing this or any other kind of extension for VS Code.

eirnym commented 1 month ago

@av some dependencies can be opt-in when an external, pre-installed application is used.

A dependency like Chromium is OK if you want to do something fast, or if the only browser you know is Chromium-based. Also, using an existing browser instead of one pulled in as a dependency provides important cookies and gives the user more control.

Also, in companies a preinstalled browser is usually centrally managed, which would require far more settings than the author envisioned for this project, and more hassle for the user to set them all.

animaldomestico commented 1 month ago

Also, they should take care to run the browser inside a sandbox and make sure it is updated to the latest stable version. There are many exploits out there in the wild.

I'm not a hacker, guys (I'm just a peaceful animal), but as I'm using Ubuntu, I was a little concerned about some things:

If you want to run developer builds of Chromium/Chrome on Ubuntu 23.10+ (or possibly other Linux distros in the future), you'll need to either globally or selectively disable an Ubuntu security feature.

But if you do this, they say:

For a while, user namespaces have been available to unprivileged (e.g. non-root) users on most Linux distros, but they exposed a lot of extra kernel attack surface.

One explanation found here:

In a report from Google, 44% of the exploits they saw required unprivileged user namespaces as part of their exploit chain.

I prefer not to turn off an Ubuntu security feature, so I won't use this for now. Forgive me if I said anything wrong; I just wanted to help!

sestinj commented 1 month ago

Thanks to everyone who shared their feedback in this thread. We heard you loud and clear and have taken steps to address this both immediately and in the future.

As a principle, we will not dynamically download executables without user visibility. PR #2192 makes the change so that we fall in line with this principle for Chromium (it is entirely opt-in):

- We first try to index the requested site without a headless browser.
- If that fails, we ask for user permission to try with Chromium.
- At any time, you can set useChromiumForDocsCrawling in your config.json to control this behavior.
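The flow in this list can be sketched as a pure decision function run after the plain (no headless browser) attempt. This is a hedged sketch: `useChromiumForDocsCrawling` is the real setting named above, but the tri-state semantics assumed here (true: use Chromium, false: never, unset: ask) are an assumption, and `DocsConfig`, `CrawlStep` and `nextCrawlStep` are hypothetical names:

```typescript
// `useChromiumForDocsCrawling` is the setting named above; treating "unset" as
// "ask the user" is an assumption made for this sketch.
interface DocsConfig {
  useChromiumForDocsCrawling?: boolean;
}

type CrawlStep =
  | "done" // plain crawl worked; no browser needed
  | "crawlWithChromium" // user already opted in
  | "askPermissionForChromium" // no preference recorded yet
  | "reportFailure"; // user opted out; never launch Chromium

// Decide the next step after the plain (no headless browser) attempt.
function nextCrawlStep(config: DocsConfig, plainCrawlSucceeded: boolean): CrawlStep {
  if (plainCrawlSucceeded) return "done";
  switch (config.useChromiumForDocsCrawling) {
    case true:
      return "crawlWithChromium";
    case false:
      return "reportFailure";
    default:
      return "askPermissionForChromium";
  }
}
```

Keeping the decision separate from the actual fetching makes the "never launch without consent" rule easy to test in isolation.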

These updates are now available in VS Code pre-release v0.9.207 and will be released later today in a JetBrains EAP. As soon as these pre-releases have undergone the same initial testing we do each time, they will become main releases.

There were also a few points in this thread worth addressing:

- Why can't we use the Chromium that is already installed for Google Chrome or otherwise? Puppeteer, the package used to control the headless browser, requires a specific chromium_revision for each version of the library, so we can’t easily allow users to manage the download/installation, or use existing installations.
- It's not in the docs!! We've added the reference here, where we believe it is most likely to be found by folks using the docs feature: https://docs.continue.dev/features/talk-to-your-docs#crawling-dynamically-generated-sites-with-usechromiumfordocscrawling
- Do we actually need a headless browser? We've been consistently testing against a large list of very common docs sites, many directly requested by users, to check whether we can successfully crawl them. We tried a pretty exhaustive list of non-headless-browser tools before concluding that one is necessary to get even passable success rates. If anyone can prove this wrong, we're open to hearing solutions.
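Deciding when a plain crawl has "failed" is the crux of this approach. As a hypothetical illustration (not the check Continue actually uses), a crawler could flag pages whose fetched HTML is essentially an empty single-page-app shell; `looksJsRendered`, the 200-character threshold and the mount-point IDs are assumptions, and regex-based tag stripping is only good enough for a rough signal:

```typescript
// Rough signal that a fetched page probably needs JavaScript to render its content.
// The threshold and patterns are illustrative, not tuned against real docs sites.
function looksJsRendered(html: string, minTextChars = 200): boolean {
  // Approximate the visible text: drop script/style bodies, then all remaining tags.
  const visibleText = html
    .replace(/<script[\s\S]*?<\/script>/gi, " ")
    .replace(/<style[\s\S]*?<\/style>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();

  // Typical single-page-app shell: an empty mount point such as <div id="root"></div>.
  const emptyMountPoint = /<div\s+id=["'](root|app|__next)["'][^>]*>\s*<\/div>/i.test(html);

  return emptyMountPoint || visibleText.length < minTextChars;
}
```

Only pages that trip this check would need the headless browser; everything else could be indexed from the raw HTML.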

Hopefully it is understood by now that Continue takes great effort to secure your code, to the point of operating as a local-first application. In considering the trade-offs between hosting our own web crawling servers, to which the extension would have to send requests, vs. following the local-first pattern, we took this lens, but more than anything we value feedback. So again, thanks all for being swift to call us out, and thanks @Patrick-Erichsen for being just as swift in taking the necessary action.

I'll hold off on closing the issue for a minute so as not to be discouraging of further discussion!

eirnym commented 1 month ago

@sestinj, thank you for stepping up and answering our questions. I'm still concerned about the mandatory settings and add-ons that companies manage across all browsers.

Additionally, the bundled browser has none of the user's own settings (including cookies) or plugins, such as those that block ads or improve privacy in any way. I don't like the idea of being tracked via an application.

Another way around the issue would be to provide a separate downloader program that downloads the necessary raw data, along with the output format the application uses. The latter would make it possible to create alternative downloader applications, if any of us were willing to take that on.
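For such a split to work, the interchange format has to be pinned down. As a purely hypothetical illustration (Continue has not specified any such format), an external downloader could emit JSON Lines with one page per line, and the extension would only validate and read it; `CrawledPage`, its fields and `parseCrawlOutput` are invented for this sketch:

```typescript
// Hypothetical record an external downloader would emit, one JSON object per line.
interface CrawledPage {
  url: string;
  title: string;
  markdown: string; // page content already converted to text/markdown
  fetchedAt: string; // ISO-8601 timestamp
}

function isCrawledPage(value: unknown): value is CrawledPage {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.url === "string" &&
    typeof v.title === "string" &&
    typeof v.markdown === "string" &&
    typeof v.fetchedAt === "string"
  );
}

// Parse JSON Lines output: blank lines are skipped; malformed or incomplete
// lines are reported by 1-based line number instead of aborting the import.
function parseCrawlOutput(jsonl: string): { pages: CrawledPage[]; errors: number[] } {
  const pages: CrawledPage[] = [];
  const errors: number[] = [];
  jsonl.split("\n").forEach((line, i) => {
    if (line.trim() === "") return;
    try {
      const parsed: unknown = JSON.parse(line);
      if (isCrawledPage(parsed)) pages.push(parsed);
      else errors.push(i + 1);
    } catch (e) {
      errors.push(i + 1);
    }
  });
  return { pages, errors };
}
```

Because the extension would only consume data, any downloader (curl-based, Firefox-driven, or a managed corporate browser) could produce it.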

martincerven commented 1 month ago

Thanks @sestinj and @Patrick-Erichsen for the quick action. I was being downvoted to hell for bringing this up, but I felt it was a security issue, although I couldn't put my finger on exactly what irked me.

Now I know there are a few security points, some independent of Continue:

- VS Code doesn't have a good permission system for extensions.
- Is Puppeteer run with the sandbox enabled?
- In the GitHub library you use there is args: ["--no-sandbox"]; isn't that a huge security no-no? 🚩 Even the Puppeteer docs say it's a [huge security risk](https://pptr.dev/troubleshooting/#setting-up-chrome-linux-sandbox).
- Are you sure you're actually downloading headless Chromium? On macOS I got the whole Chromium.app, which I'd say is different from headless. Again, there is a flag for that: headless: false.
- Are you sure you are downloading the Chromium binary from trusted sources? The only mention of the source is here. I don't know how you would even check that, but I wouldn't put my company's trust on the line based on some random 40-star resolver library (that's harsh, and I know the author probably doesn't mean harm, but still, he could change the URL and, bang, you're downloading malicious browsers to millions of your customers).
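One cheap safeguard against a resolver silently changing its download location is to pin the allowed origin before fetching anything. A minimal sketch, assuming the WHATWG `URL` class (a global in Node.js and browsers); `isTrustedDownloadUrl` and the allowlist are hypothetical, with storage.googleapis.com taken from the fallback host mentioned earlier in the thread. An allowlist complements, but does not replace, checksum or signature verification of the downloaded archive:

```typescript
// Hosts a browser binary may be downloaded from. storage.googleapis.com is the
// fallback host mentioned earlier in the thread; the list itself is hypothetical.
const TRUSTED_HOSTS = ["storage.googleapis.com"];

function isTrustedDownloadUrl(raw: string, trustedHosts: string[] = TRUSTED_HOSTS): boolean {
  let url: URL;
  try {
    url = new URL(raw); // WHATWG URL parser; throws on malformed input
  } catch (e) {
    return false;
  }
  // HTTPS only, default port, no embedded credentials, and an exact hostname
  // match, so "storage.googleapis.com.evil.example" is rejected.
  return (
    url.protocol === "https:" &&
    url.port === "" &&
    url.username === "" &&
    url.password === "" &&
    trustedHosts.indexOf(url.hostname) !== -1
  );
}
```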

Lastly,

Hopefully it is understood by now that Continue takes great effort to secure your code, to the point of operating as a local-first application. In considering the trade-offs between hosting our own web crawling servers, to which the extension would have to send requests, vs. following the local-first pattern, we took this lens, but more than anything we value feedback.

I'm very happy you took the local-first approach, even when we voiced our concerns here. I honestly don't have insight into how this crawling works, but here is one last imaginary scenario:

If someone works on proprietary code for a new rocket at SpaceX, wouldn't it crawl proprietary docs and private repositories, which would then be sent (if they're using ClosedAI or ChineseAI) to the LLM provider as part of the prompt context? Of course, this falls on the user for not configuring the settings, but still...

Anyway, thanks @sestinj for addressing this issue; it will ultimately make your product better and more secure.

Martin

itpofy2024o commented 1 month ago

See https://discussions.apple.com/thread/8582300?sortBy=rank to remove the notification; then rm -rf .continue, stop using the Continue extension, and report this app until they actually improve it.

sestinj commented 1 month ago

Appreciate the further thoughts here! We've thought about this pretty deeply, trying to take into account all of the feedback received and where we want to go with the product. Without committing to a particular direction, we are tentatively looking into building out an indexing server.

Though things are much better with the headless browser being entirely opt-in, I still wanted to give an update so you know we haven't simply forgotten about this : )

I will make sure to update here as soon as we have more info!

remixer-dec commented 3 weeks ago

Using Electron was not enough; now every extension of every Electron app will install its own Chromium! Now I have 3 additional Chromiums on my system, thanks!

remixer-dec commented 3 weeks ago

Why can't we use the Chromium that is already installed for Google Chrome or otherwise? Puppeteer, the package used to control the headless browser, requires a specific chromium_revision for each version of the library, so we can’t easily allow users to manage the download/installation, or use existing installations

I am pretty sure that is not true, or at least it was the other way a few months ago when I worked with it.

You just need to set PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true to install it without Chromium, and then you can run any Chromium binary you have access to. Some features may be less compatible across versions, but the core functionality remains the same.
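The disagreement here is about how strictly a local binary must match the revision Puppeteer was built against. A middle ground is to accept any local Chromium but warn when its major version drifts from the expected one. This sketch is an assumption, not Puppeteer behavior: it parses Chromium's four-part MAJOR.MINOR.BUILD.PATCH version string, and `checkChromiumVersion`, `expectedMajor` and `maxDrift` are hypothetical; how to obtain the expected version for a given Puppeteer release is left out:

```typescript
type VersionCheck =
  | { ok: true; warning?: string }
  | { ok: false; reason: string };

// Compare a local Chromium version string (e.g. "127.0.6533.88") with the major
// version the automation library was built against. Small drift only warns.
function checkChromiumVersion(installed: string, expectedMajor: number, maxDrift = 2): VersionCheck {
  // Chromium versions have four numeric parts: MAJOR.MINOR.BUILD.PATCH.
  const match = /^(\d+)\.\d+\.\d+\.\d+$/.exec(installed.trim());
  if (match === null) {
    return { ok: false, reason: 'unrecognised version string: "' + installed + '"' };
  }
  const major = parseInt(match[1], 10);
  const drift = Math.abs(major - expectedMajor);
  if (drift === 0) {
    return { ok: true };
  }
  if (drift <= maxDrift) {
    return {
      ok: true,
      warning: "Chromium " + major + " differs from expected " + expectedMajor + "; some features may misbehave",
    };
  }
  return { ok: false, reason: "Chromium " + major + " is too far from expected " + expectedMajor };
}
```

A warning rather than a hard failure matches the observation that core functionality usually survives small version differences, while still refusing binaries that are far out of range.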

Patrick-Erichsen commented 2 weeks ago

@remixer-dec, thanks for sharing that screenshot. I believe we gave that a try and ran into issues with Puppeteer complaining about an incompatible Chromium revision. We'll plan to try it out again when we circle back to some work we have planned around the docs service in the near future 👍
