[GHSA-45pg-36p6-83v9] Langchain SQL Injection vulnerability

liadlevy commented 1 week ago

Updates: CVSS v3, CVSS v4, Severity

Comments: I am the security researcher who found the vulnerability. The NVD CVSS score is 9.8 (CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H). This vulnerability is easy to exploit, exists in the default configuration, and results in full database exposure. The affected database is commonly used in scenarios where this functionality's interface passes user input to the package. Credit would be nice ;) Thanks.
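For reference, the 9.8 can be reproduced from the CVSS v3.1 base-score equations using the specification's metric weights (AV:N = 0.85, AC:L = 0.77, PR:N = 0.85, UI:N = 0.85, C/I/A:H = 0.56, Scope Unchanged). Intermediate values are rounded to four decimals here:

```latex
\begin{aligned}
\mathrm{ISS} &= 1 - (1-0.56)^3 = 0.914816\\
\mathrm{Impact} &= 6.42 \times 0.914816 \approx 5.8731\\
\mathrm{Exploitability} &= 8.22 \times 0.85 \times 0.77 \times 0.85 \times 0.85 \approx 3.8870\\
\mathrm{Base} &= \mathrm{Roundup}\big(\min(5.8731 + 3.8870,\ 10)\big) = \mathrm{Roundup}(9.7601) = 9.8
\end{aligned}
```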

darakian commented 1 week ago

Hey there. It looks like the CVSS 3.1 score we have comes from the huntr.dev report, which I'm inclined to believe over the NVD score. Reading the huntr thread, it seems like LangChain does document that access control is the responsibility of the API user. Can you expand on why you disagree?

Ref: https://huntr.com/bounties/8f4ad910-7fdc-4089-8f0a-b5df5f32e7c5

liadlevy commented 1 week ago

As noted in the Huntr thread, I reported the issue with the same severity that NVD ultimately assigned. I had an extensive discussion there about it, and they acknowledged that the vulnerability has a significant impact.

LangChain includes an opt-in flag for dangerous functionality called "allow_dangerous_requests", but although it is implemented in numerous chains, it was not applied consistently across all text2query chains/tools. After my disclosure, they mitigated these vulnerabilities and added "allow_dangerous_requests" to the affected chains/tools.

For example, one of the chains mitigated as a result of my report can be seen here (LangChain GitHub): https://github.com/langchain-ai/langchain/blame/abaea28417adb63dae0cfad3c60dff5297e3ce0d/libs/community/langchain_community/chains/graph_qa/neptune_sparql.py#L116
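The "allow_dangerous_requests" approach is a default-deny opt-in: the component refuses to run until the caller explicitly accepts the risk. Below is a minimal sketch of that pattern. It assumes a hypothetical class with injected `generate_query` and `execute_query` callables, and it is not LangChain's actual code or API.

```python
from typing import Any, Callable


class HypotheticalTextToQueryChain:
    """Illustrative text-to-query chain with a default-deny opt-in flag.

    Hypothetical class, not LangChain's implementation: it only shows the
    pattern of refusing to run until the caller explicitly opts in.
    """

    def __init__(
        self,
        generate_query: Callable[[str], str],
        execute_query: Callable[[str], Any],
        allow_dangerous_requests: bool = False,
    ) -> None:
        # Secure by default: construction fails unless the risk is acknowledged
        if not allow_dangerous_requests:
            raise ValueError(
                "This chain executes model-generated queries against the database. "
                "Pass allow_dangerous_requests=True only after restricting database "
                "permissions to what the application actually needs."
            )
        self._generate_query = generate_query
        self._execute_query = execute_query

    def run(self, question: str) -> Any:
        # Single step: the generated query goes straight to the database
        query = self._generate_query(question)
        return self._execute_query(query)
```

With this design, forgetting the flag is a loud error at construction time instead of a silent exposure at request time.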

Following the "Secure By Design" and "Secure By Default" principles outlined in CISA’s guidelines, all code should be secure by default. As I further highlighted in the Huntr thread, neither Neo4j nor LangChain's official blogs or documentation reference the security implications of this vulnerability. The only mentions are in general guidelines or as a source code comment.

From a technical perspective, the CVSS score assigned to this vulnerability is incorrect. This is not a local vulnerability but a network vulnerability. An attacker does not need internal access to exploit it, and the impact on integrity and availability is High, not Low as they indicated. The vulnerability enables full database compromise and could even affect multi-tenant setups. Moreover, the attack complexity is extremely low: a single line of text is enough to exploit it.
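To see how the disputed metrics (attack vector, attack complexity, integrity, availability) move the score, here is a minimal sketch of the CVSS v3.1 base-score calculation, using the weights and Roundup function from the FIRST CVSS v3.1 specification. The huntr vector is not quoted in this thread, so the third example uses a hypothetical vector with AV:L, AC:H, I:L and A:L (PR:N and UI:N are assumptions) purely to show the direction of the change. It is not the actual huntr score. The function names are illustrative.

```python
import math

# CVSS v3.1 base-metric weights, as published in the FIRST CVSS v3.1 specification
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
# Privileges Required weights depend on Scope (U = Unchanged, C = Changed)
PR = {"U": {"N": 0.85, "L": 0.62, "H": 0.27}, "C": {"N": 0.85, "L": 0.68, "H": 0.5}}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal value >= input, tolerant of float noise."""
    int_input = round(value * 100_000)
    if int_input % 10_000 == 0:
        return int_input / 100_000.0
    return (math.floor(int_input / 10_000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute a CVSS v3.1 base score from a vector such as 'CVSS:3.1/AV:N/...'."""
    # Parse "KEY:VALUE" pairs, ignoring the optional "CVSS:3.1" version prefix
    m = dict(part.split(":") for part in vector.split("/") if not part.startswith("CVSS:"))
    scope = m["S"]

    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * PR[scope][m["PR"]] * UI[m["UI"]]

    if impact <= 0:
        return 0.0
    if scope == "U":
        return roundup(min(impact + exploitability, 10))
    return roundup(min(1.08 * (impact + exploitability), 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # NVD vector
    print(base_score("CVSS:3.1/AV:L/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # AV:N -> AV:L only
    print(base_score("CVSS:3.1/AV:L/AC:H/PR:N/UI:N/S:U/C:H/I:L/A:L"))  # hypothetical vector
```

With these weights the NVD vector gives 9.8. Changing only AV:N to AV:L gives 8.4, and the hypothetical local, high-complexity, low-integrity/availability vector gives 6.2.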

Any SQL injection can be mitigated by using a least-privileged database user, but that still won't fully remediate this vulnerability, because the user can still READ unintended database data directly. Ultimately, I think this is a classic example of a vendor "deciding" a severity score based on business objectives rather than a purely technical assessment: the CVSS score they reported is wrong whether or not a security note exists. By the same logic, Apache could have claimed during Log4Shell that users should not log "user input" and that doing so was the user's responsibility, but that is not how application security and vulnerabilities work. This is a joint effort, and everyone should take full "ownership", because threat actors are just waiting to exploit the weakness.
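The least-privilege point can be illustrated with SQLite standing in for the real database (an assumption; the thread concerns Neo4j/Cypher and SQL backends generally). A read-only connection stops a destructive query but still returns data from tables the feature was never meant to expose. The function name, tables, and values below are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path


def read_only_demo() -> tuple[bool, list]:
    """Show that a read-only connection blocks writes but still exposes readable tables.

    SQLite's read-only URI mode stands in for a least-privileged database user;
    the table names and data are made up for illustration.
    """
    with tempfile.TemporaryDirectory() as tmp:
        db_path = Path(tmp) / "app.db"

        # Set up one table the feature is meant to query and one it should never expose
        setup = sqlite3.connect(db_path)
        setup.execute("CREATE TABLE products (name TEXT)")
        setup.execute("CREATE TABLE users (email TEXT, password_hash TEXT)")
        setup.execute("INSERT INTO products VALUES ('widget')")
        setup.execute("INSERT INTO users VALUES ('alice@example.com', 'x1y2z3')")
        setup.commit()
        setup.close()

        # Least privilege: reopen the same database in read-only mode
        ro = sqlite3.connect(db_path.as_uri() + "?mode=ro", uri=True)
        try:
            try:
                ro.execute("DELETE FROM products")  # destructive query is refused
                write_blocked = False
            except sqlite3.OperationalError:
                write_blocked = True
            # ...but an unintended read of another table still succeeds
            leaked = ro.execute("SELECT email, password_hash FROM users").fetchall()
        finally:
            ro.close()
    return write_blocked, leaked
```

Calling `read_only_demo()` should report the DELETE as blocked while still returning the row from `users`, which is the residual exposure described above.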

CISA reference for acceptance of the NVD score: https://www.cisa.gov/news-events/bulletins/sb24-309

Anyway, I would like to hear your thoughts on this.

Liad

(Sent by email in reply to Jon's comment of Wed, Nov 20, 2024, 22:39: https://github.com/github/advisory-database/pull/5025#issuecomment-2489500687)

darakian commented 1 week ago

> LangChain includes an opt-in flag for dangerous functionality called "allow_dangerous_requests", but although it is implemented in numerous chains, it was not applied consistently across all text2query chains/tools. After my disclosure, they mitigated these vulnerabilities and added "allow_dangerous_requests" to the affected chains/tools.

To me, that reads like it should be considered a vuln in the chain that uses the function. It seems somewhat analogous to a database exposing full SQL capability. One would not say that MySQL or PostgreSQL or whatever is vulnerable to SQL injection, but rather that some application using it could be, depending on the context of the application.

> From a technical perspective, the CVSS score assigned to this vulnerability is incorrect. This is not a local vulnerability but a network vulnerability

Is that the case? Does LangChain accept prompts from the network by default?

liadlevy commented 1 week ago

I’m not sure I follow your analogy. Databases that expose SQL queries are explicitly designed to run SQL, whereas LangChain is not intended as a SQL engine for running queries, unlike packages such as alembic or sqlalchemy. Instead, it takes user text input and generates SQL (or Cypher) queries under the hood, since LangChain's stated purpose is to "Build context-aware reasoning applications".

If you compare this to the alternative LlamaIndex, you’ll notice they avoid this vulnerability by separating the logic for generating a Cypher query from executing it.

Here’s the difference:

LlamaIndex: User input (text) → Cypher query generation → Explicit action to execute the query on the database (2-step process) → System response (text).
LangChain: User input (text) → Cypher query automatically executed on the database (single-step process) → System response (text).

This distinction is critical because LangChain’s approach combines query generation and execution into a single action, increasing the risk of vulnerabilities, and there is no way to validate a query before it runs on the database.

Is this a network vulnerability? LangChain doesn’t accept prompts directly from the network by default, but it is almost always integrated into APIs or applications that do (this is how RAG applications work; without that, they would be useless). This makes it effectively a network vulnerability, because the system becomes exposed through typical usage. A "Local" vulnerability means the attacker already has access to the machine or component, as is usually seen in privilege escalation (PE) or local file inclusion (LFI) vulnerabilities. For example, Log4j itself does not expose a network interface, but it became a network vulnerability due to its handling of untrusted input, such as logging user-provided data that could trigger malicious behavior like JNDI lookups.

Furthermore, look at the integrity, availability, and complexity metrics, which are also reported incorrectly. How complex is it to write "Delete all my nodes in database"? After that action the database can't serve anything, because everything was deleted by the attacker (availability).
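The two-step flow described above, with an explicit check between query generation and execution, can be sketched as follows. This is a minimal illustration under stated assumptions: `generate`, `validate`, and `execute` are hypothetical callables standing in for an LLM query generator, a policy check, and a database client. The keyword deny-list is a placeholder; it is neither LlamaIndex's nor LangChain's API, and it is not a sufficient defense on its own.

```python
import re
from typing import Any, Callable

# Hypothetical, deliberately naive deny-list of Cypher clauses that modify data.
# Keyword matching is NOT a complete defense; it only marks where a check can sit.
WRITE_CLAUSES = re.compile(r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP)\b", re.IGNORECASE)


def read_only_validator(query: str) -> bool:
    """Return True if the generated query contains none of the write clauses above."""
    return WRITE_CLAUSES.search(query) is None


def generate_then_execute(
    question: str,
    generate: Callable[[str], str],
    validate: Callable[[str], bool],
    execute: Callable[[str], Any],
) -> Any:
    """Two-step flow: generate a query, check it, and only then execute it."""
    query = generate(question)   # step 1: text -> query (no database access yet)
    if not validate(query):      # explicit checkpoint between generation and execution
        raise PermissionError(f"Rejected generated query: {query!r}")
    return execute(query)        # step 2: run only queries that passed the check
```

With a stub generator, a prompt like "Delete all my nodes in database" that produces `MATCH (n) DETACH DELETE n` is rejected before it reaches the database, while a read query passes through. In a real deployment the checkpoint could instead be a human approval step or a read-only database role.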

I hope my detailed answer gives a better sense of the vulnerability and its impact.

Liad

(Sent by email in reply to Jon's comment of Thu, Nov 21, 2024, 00:34: https://github.com/github/advisory-database/pull/5025#issuecomment-2489671997)

darakian commented 1 week ago

> I’m not sure I follow your analogy.

The analogy is that an api from one project is created to be used in an open ended way by another project.

> LangChain doesn’t accept prompts directly from the network by default, but it is almost always integrated into APIs or applications that do.

Would that not be a problem with the use of LangChain as opposed to LangChain itself?

liadlevy commented 1 week ago

I am not sure why you keep focusing only on a specific sentence without the context of the paragraph.

LangChain, and specifically GraphCypherQAChain, is designed to take user input. This is how it works in any blog post you open from the official LangChain or Neo4j docs: they explain exactly how to use this functionality, and in most or all cases the input passed is user input. Is Log4Shell a network vulnerability or a local one?

Second, please again address all the other parameters in the CVSS score.

The problem lies within LangChain itself, since it falls short in implementing security measures at the API level, unlike the other Text2Action chains it provides. LangChain is not intended to execute raw SQL queries against a database; that is not its purpose. Instead, it takes user input as text and returns a response for "context-aware reasoning applications," as stated in the first line of the GitHub project. It is their responsibility to provide a secure way to use this functionality.

A comparable example is the Log4Shell vulnerability. The purpose of Log4j is to serve as a logging framework, not a JNDI and LDAP tool. However, Log4j used these functionalities to enhance its logging system, inadvertently creating a vulnerability, and had these features enabled by default. Similarly, LangChain's purpose is to return "smarter" answers to questions using "Cypher" as a tool to improve the quality of the responses. Unfortunately, this functionality is also enabled by default, posing significant risks.

Thanks,

Liad

(Sent by email in reply to Jon's comment of Thu, Nov 21, 2024, 01:41: https://github.com/github/advisory-database/pull/5025#issuecomment-2489762250)

darakian commented 1 week ago

I'm not seeing evidence that convinces me to change the CVSS, so I am not accepting this PR. We believe huntr did an appropriate job scoring this vulnerability, and we would advise you to reach out to applications using LangChain inappropriately in order to help secure them.

liadlevy commented 1 week ago

I'm really not sure what evidence you are looking for. It is what it is. LangChain has a business objective to keep the CVSS score as low as possible.

You claim you “trust” huntr to have scored this correctly, but in fact the availability, integrity, and complexity metrics are not debatable.

Is this complex to exploit? No, so how is the complexity High? Can this vulnerability delete or modify data in the database? Yes, so how is it Low?

Furthermore, you did not address any of my questions, and you don’t explain why you think differently.

I did not set the NVD score; NVD scored it as 9.8, not me.

Anyway, I may make this thread public to hear others' thoughts about it.

Liad

(Sent by email in reply to Jon's comment of Thu, Nov 21, 2024, 20:16: https://github.com/github/advisory-database/pull/5025#issuecomment-2491949782)
