OWASP / Top10

Official OWASP Top 10 Document Repository

RC2 - A4 XML External Entities (XXE) #135

Closed ossie-git closed 6 years ago

ossie-git commented 6 years ago

"are in use. Co" -> Co?

I would add that XXE attacks can be due to documents that are uploaded and later processed by the application programmatically (docx, pptx, etc.) and that this is a potential attack vector

This section, as mentioned, looks like it needs some additional content. Might update some suggestions if I find the time.

Perhaps mentioning SSRF here would be useful.

PS. Not sure how this issue made it to #4 in the Top 10, considering everyone's moving to JSON.
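To make the uploaded-document vector concrete: docx and pptx files are zip containers whose parts are XML, so an entity declaration can hide inside any part. Below is a minimal sketch, not a complete defense, that flags parts declaring a DOCTYPE before anything else processes them. The names `suspicious_xml_parts` and `build_container` are hypothetical helpers invented for this illustration, and the choice to scan only `.xml` and `.rels` members is an assumption.

```python
import io
import zipfile
from xml.parsers import expat


class DoctypeFound(Exception):
    """Raised by the expat handler as soon as a DOCTYPE declaration starts."""


def _on_doctype(name, system_id, public_id, has_internal_subset):
    # Abort the parse before any entity declaration is processed.
    raise DoctypeFound(name)


def suspicious_xml_parts(container_bytes):
    """Return names of XML parts that declare a DOCTYPE or are malformed.

    Hypothetical helper: assumes the upload is a zip container (docx, pptx)
    and that only .xml and .rels members need checking.
    """
    flagged = []
    with zipfile.ZipFile(io.BytesIO(container_bytes)) as archive:
        for name in archive.namelist():
            if not name.endswith((".xml", ".rels")):
                continue
            parser = expat.ParserCreate()
            parser.StartDoctypeDeclHandler = _on_doctype
            try:
                parser.Parse(archive.read(name), True)
            except (DoctypeFound, expat.ExpatError):
                flagged.append(name)
    return flagged


def build_container(parts):
    """Build an in-memory zip from {name: bytes}; used only for the demo."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in parts.items():
            archive.writestr(name, data)
    return buffer.getvalue()
```

An application would reject the upload when the returned list is non-empty, since legitimate office documents normally have no need for a DOCTYPE.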

jmanico commented 6 years ago

Many apps and products have older XML service endpoints that are not configured properly. XXE was one of the top successful attack methods in real world attacks in 2016 per sync.io, which is why this was added. Good call, IMO.

This is also hard to defend against. A WAF is only a partial defense, and devs need to meticulously configure XML parsers...

Aloha, Jim
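As an illustration of what "configuring the parser" can look like, here is a minimal sketch using the SAX parser from the Python standard library with external general and parameter entities explicitly switched off. `TextCollector`, `make_hardened_parser` and `extract_text` are hypothetical names for this example. Other languages and parsers need their own equivalent settings, and this sketch does not address other XML issues such as entity expansion.

```python
import io
import xml.sax
from xml.sax.handler import (
    ContentHandler,
    feature_external_ges,
    feature_external_pes,
)


class TextCollector(ContentHandler):
    """Collects character data so the demo has something to return."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def make_hardened_parser():
    """Create a SAX parser that does not resolve external entities."""
    parser = xml.sax.make_parser()
    # Explicitly disable external general and parameter entities instead of
    # relying on version-dependent defaults.
    parser.setFeature(feature_external_ges, False)
    parser.setFeature(feature_external_pes, False)
    return parser


def extract_text(data):
    """Parse XML bytes with the hardened parser and return its text content."""
    handler = TextCollector()
    parser = make_hardened_parser()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return "".join(handler.chunks)
```

Setting the features explicitly, even where they already default to off, keeps the intent visible in code review.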

drwetter commented 6 years ago

I am with @ossie-git here. #4 feels way too high to me.

Is there data available on how much is JSON / XML / other?

Neil-Smithline commented 6 years ago

We don't have data on what data formats are used @drwetter, but we do have data on the prevalence of XXE attacks. Here's a summary, and we'll publish more details once we ship RC2. XXE was one of the most commonly found problems.

drwetter commented 6 years ago

Hi Neil,

thank you.

Forgive my persistence, but the table contains a summary, not the details behind it.

So I am looking forward to the publication of the details that explain it.

Cheers, Dirk

vanderaj commented 6 years ago

Please don't be surprised if this issue gets pushed post RC2.

ossie-git commented 6 years ago

@jmanico I agree it's an issue and it's out there, but I see the OWASP Top 10 more as a guide for companies that are starting to build their app sec programs and will apply it to future applications (or to major components of existing applications under development), not as a standard that companies will use to re-assess / re-implement parts of older applications, as doing so is very expensive. New applications will typically expose REST APIs, even if they talk to older XML service endpoints in the backend.

drwetter commented 6 years ago

"Is there data available on how much is JSON / XML / other?"

"We don't have data on what data formats are used @drwetter, but we do have data on the prevalence of XXE attacks. Here's a summary, and we'll publish more details once we ship RC2. XXE was one of the most commonly found problems."

Neil, forgive me, but if we don't have data on the usage of JSON / XML / HTML / etc. out there, what is the data we collected worth?

My suspicion is that the data collection is not a representative sample of the protocols out there but for some reason backend web services are way over-represented.

I came back from a developer conference and everybody I talked to felt #4 is rather exaggerated.

drwetter commented 6 years ago

Please note that if we don't know the distribution of JSON / XML / HTML / etc the prevalence factor is probably not correct.

drwetter commented 6 years ago

To add a bit: my background is in science (I did my PhD in solid-state physics).

From a scientific perspective, we are doing it wrong here. If one doesn't know for sure what the basis of the collected data is -- I am referring to "We don't have data on what data formats are used" -- one should say "Ok, toss that. The experiment needs to be redone". Period.

I sense -- attention: sarcasm -- we don't wanna do that. But what we can do [1] is try to get data about the prevalence of XML / JSON / HTML out there in the interweb and weight our results accordingly. I did some quick research, aka googling ;-), but I wasn't able to find anything. Any help would be appreciated.

The fallback otherwise would be an educated guess of the prevalence.

[1] To me it is mandatory that we find this out next time, in 2020.
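The weighting idea can be sketched as simple arithmetic. The function below is a hypothetical illustration, not the project's method. It assumes that XXE can only affect applications that process XML, and that the XXE rate among XML-processing applications is the same in the sample as in the wider population. All numbers in the usage note are made up for illustration and are not measurements.

```python
def reweighted_prevalence(observed_rate, sample_xml_share, population_xml_share):
    """Rescale an observed XXE rate to a population with a different XML share.

    observed_rate:        fraction of sampled applications with an XXE finding
    sample_xml_share:     fraction of sampled applications that process XML
    population_xml_share: estimated fraction of all applications that process XML

    Assumption: the rate among XML-processing applications is identical in
    the sample and in the population, so only the XML share changes.
    """
    for value in (observed_rate, sample_xml_share, population_xml_share):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must be fractions between 0 and 1")
    if sample_xml_share == 0.0:
        raise ValueError("sample contains no XML-processing applications")
    rate_among_xml_apps = observed_rate / sample_xml_share
    return rate_among_xml_apps * population_xml_share
```

For example, with a purely hypothetical observed rate of 0.10 in a sample where half the applications process XML, and an educated guess that a quarter of all applications do, `reweighted_prevalence(0.10, 0.5, 0.25)` gives about 0.05, i.e. half the observed rate.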

PeterMosmans commented 6 years ago

Hi @drwetter I totally agree with you on this one -

"My suspicion is that the data collection is not a representative sample of the protocols out there but for some reason backend web services are way over-represented.".

From my humble personal perspective as somebody writing, reviewing and reading lots of pentest reports, and (also) speaking with devs and fellow pentesters about this, XXE feels really overrepresented. Vulnerabilities like open redirects or CSRF are much more prevalent. I think that developers, who use the top 10 for guidance, would be better off learning more about (securing) those vulnerabilities instead. As @ossie-git stated:

"I see the OWASP Top 10 more as a guide for companies starting to build their app sec programs and who will apply it to future applications being developed"

Cheers,

Peter

gilzow commented 6 years ago

As a developer who spends 75% of my time developing and 25% as a security analyst, I find that XXE most definitely feels overrepresented. I can only remember one instance of our (security) team discovering an XXE vulnerability over the last year, while CSRF is found consistently. Open redirects aren't as prevalent as they used to be, but they are more prevalent than XXE.

vanderaj commented 6 years ago

This has been resolved with the latest commit.

vanderaj commented 6 years ago

In terms of data, we have sufficient evidence from SAST vendors that XXE is somewhat likely, but it's really the impact of XXE that drives the #4 spot.

drwetter commented 6 years ago

Hi Andrew,

Can you comment on my arguments, please? Again, it seems absurd to me -- and not only to me -- to trust the data when the overall prevalence of XML and the conditions of the data collection aren't clear.

Thx, Dirk

PeterMosmans commented 6 years ago

@vanderaj I really appreciate all the efforts, thanks for that! I know I'm rather late in the process - however, as XXE ending up on spot 4 is such a big deviation from the previous release candidate, it probably warrants a close(r) look.

Question: is there a way that 'we' can support you in substantiating the claim that XXE deserves the number 4 spot? Is there anything 'we' can do to look deeper into this issue? @drwetter, you mentioned researching the distribution of JSON / XML / ..., for instance.

Data from SAST vendors (...) is one thing, but I think that @danielmiessler raises some fair points about the quality of data in his article at https://danielmiessler.com/blog/owasp-top-10-lists-are-art-not-science/

Again, thanks for your work and consideration. The OWASP top 10 is an extremely visible and important project, that's probably why so many people are "so passionate" about it ;)

Cheers,

Peter

drwetter commented 6 years ago

"In terms of data, we have sufficient evidence from SAST vendors that XXE is somewhat likely, but it's really the impact of XXE that drives the #4 spot."

I also doubt that sites that can afford SAST scanners have much in common with the real world out there.

There's a spot-on -- unfortunately German -- proverb: "Traue keiner Statistik, die du nicht selbst gefälscht hast." --> Don't trust statistics which you didn't make up yourself. It means, in a cynical way, that you shouldn't trust data unless you have collected them yourself.

As Daniel Miessler wrote, it's not science, not even close.

jmanico commented 6 years ago

Are there other data points or scientific processes we should be looking at? How can we increase “science” per your perspective, Dr. Wetter?

While I agree we can do more; I feel the improvement from past years is significant.

drwetter commented 6 years ago

Jim! :-)

"Are there other data points or scientific processes we should be looking at? How can we increase “science” per your perspective, Dr. Wetter?"

Good point. My suggestion was to make an educated guess by taking the prevalence of content types on the internet into account (JSON / HTML / XML) and weighting XXE accordingly.

But I really wouldn't overdo it with science here; IMO it's not the right approach. As Peter mentioned, "The OWASP top 10 is an extremely visible and important project", and I would keep that in mind as the goal.

That would be my suggestion for 2017.

For 2020+: That depends on what the goal is: a visibility document or science. So either continue with the data collection and apply educated guesses and a common sense of what is out there, or sit down before collecting any data and discuss how much this applies to the real world.

"While I agree we can do more; I feel the improvement from past years is significant."

Despite any technical differences I mentioned -- this is just my nature :-) -- I agree that this project runs way better, also as far as the O is concerned. Thanks to all leads!

ossie-git commented 6 years ago

I think one data point that could be added for the 2020 timeline is prevalence vs. modernity, i.e. when the application was developed. For example, applications newly developed in 2019 are unlikely to have issues common in older applications (XXE; XSS, since UI frameworks like Angular and wider adoption of CSP headers will substantially lower its prevalence; deserialization, which will be less common in new applications; etc.), while these issues may still be prevalent in older applications.

It would still be the Top 10 but with 2 lists in the document (mostly the same but with small changes and perhaps 2 - 3 items that are different): one for someone wanting to start building new applications securely and one for securing older applications.

sslHello commented 6 years ago

Hi, I'd like to add my 2 cents to the discussion above. Sorry, but is it possible that you've overlooked that the risk factors 'technical impact (severe=3)' and 'detectability (easy=3)' are the main reasons for the high position in the Top 10? We specifically extended the data call to get more data from more enterprises. The quoted blog post at danielmiessler.com is about RC1, from June 13, 2017. I am sorry, but I can't see any relation to this discussion.

As far as I can see, XXE did not get any 'special treatment'; Brian analyzed the data as they were. It got a common (=2) prevalence, even though only a subset of all tested applications use XML. Furthermore, some of the data show 0%, which could also include false negatives where XXE had not been tested. Yes, we have also discussed that we'd have liked to structure the data call differently, to get even more information from the donated data. But after RC1 it was too late for this, and I am sure we have already reached a very reasonable result :wink:

So I do hope that it's acceptable for you to have XXE at position #4 in this version. I hope this version of the Top 10 will raise awareness of this vulnerability and that we will be able to say goodbye to XXE in the next version, as we've done with CSRF this time.

Cheers, Torsten
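To show how the factors mentioned above combine, here is a minimal sketch. It assumes the scoring scheme is the mean of the three likelihood factors (exploitability, prevalence, detectability, each rated 1 to 3) multiplied by the technical impact (also 1 to 3). The impact, detectability and prevalence values come from this thread. The exploitability value of 2 is not stated here and is an assumption for illustration, as is the function name `risk_score`.

```python
def risk_score(exploitability, prevalence, detectability, technical_impact):
    """Mean of the three likelihood factors multiplied by technical impact.

    Assumed scheme for illustration; every factor is an integer from 1 to 3.
    """
    factors = (exploitability, prevalence, detectability, technical_impact)
    if any(factor not in (1, 2, 3) for factor in factors):
        raise ValueError("each factor must be 1, 2 or 3")
    likelihood_sum = exploitability + prevalence + detectability
    # Multiply before dividing so exact results such as 21 / 3 stay exact.
    return likelihood_sum * technical_impact / 3


# Values from the thread: prevalence=2, detectability=3, technical_impact=3.
# exploitability=2 is an assumption, not taken from the thread.
xxe_score = risk_score(exploitability=2, prevalence=2, detectability=3,
                       technical_impact=3)

# Sensitivity check: the same factors with prevalence lowered to 1.
xxe_score_low_prevalence = risk_score(exploitability=2, prevalence=1,
                                      detectability=3, technical_impact=3)
```

Under these assumptions the score is 7.0, and lowering prevalence to 1 only brings it down to 6.0, which illustrates why impact and detectability weigh so heavily on the ranking.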