frictionlessdata / datapackage

Data Package is a standard consisting of a set of simple yet extensible specifications to describe datasets, data files and tabular data. It is a data definition language (DDL) and data API that facilitates findability, accessibility, interoperability, and reusability (FAIR) of data.
https://datapackage.org
The Unlicense

Probable security issue through use of URLs in "url-or-path" #650

Closed iSnow closed 9 months ago

iSnow commented 4 years ago

Similar to the attack vector described for paths here, I believe the use of URLs in Data Resources can lead to exfiltration of data about a network.

Setup

Consider the case of an API service that allows users to upload DataPackages and browse the contained resources' contents. For that, the service must download the contents from the URLs specified by the packages' Resources.

Attack scenario

The attacker would upload a DataPackage containing a resource with URLs pointing to guessed IP addresses in the non-routable (private) ranges, e.g.

{
  "path": "http://192.168.0.1/index.html",
  "mediatype": "text/html"
}

and ask the service for the content of the Resource. The service would query 192.168.0.1 to fetch the index HTML, and if that server is running an HTTP daemon, the API service would transfer the HTML back to the attacker. If the IP address is not used by a server, an error would be returned.

By slowly iterating over the whole 10.x.x.x, 192.168.x.x and other ranges that are typically used for LANs, the attacker could map out the LAN and glean additional information about the servers by analyzing the exfiltrated HTML.

I am not sure about the real-world impact of this. On the one hand, a service would typically run on a hosted server outside the company's LAN, or even in a container on a hosted server, which would blunt the mapping attack somewhat. On the other hand, it would be a plausible attack for rogue employees or visitors at a data-science company that hosts data pipelines for its data scientists and has a segmented network as part of a defense in depth. That segmentation would be weakened by such an attack, as every employee with HTTP access to such a service could map out the segment the service resides in.

Attack mitigation

I don't think complete mitigation is possible. I believe the use of uncontrolled URLs is a fundamental weakness that allows all kinds of attacks (e.g. a DataPackage with thousands of URLs linking to very big Resource payloads would amount to an effective denial-of-service attack against either the API service or even the site hosting the payload files).

Ideally, a same-origin restriction should apply to Resource URLs to ensure that site evilsite.com cannot DoS site filehoster.com by uploading packages to apiservice.com. This still would not prevent user-uploaded DataPackages from exfiltrating data.
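A same-origin check of this kind could be sketched in Python roughly as follows. This is an illustration only: it assumes the descriptor itself was fetched from a known URL, and the names `origin` and `same_origin` are hypothetical, not part of any Data Package library.

```python
from urllib.parse import urlsplit

# Default ports, so that https://host and https://host:443 compare as equal.
_DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str):
    """Return the (scheme, host, port) triple that identifies a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or _DEFAULT_PORTS.get(scheme)
    return (scheme, parts.hostname, port)


def same_origin(descriptor_url: str, resource_url: str) -> bool:
    """True if the resource lives on the same origin as its descriptor (hypothetical helper)."""
    return origin(descriptor_url) == origin(resource_url)
```

For a package that was uploaded rather than fetched there is no descriptor URL to compare against, so a service would have to pick a different reference origin or refuse URLs altogether.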

Some incomplete protection could be achieved by blocking all Resource resolution for URLs pointing to IP addresses in the non-routable blocks.
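As a minimal sketch of such a filter, assuming Python and its standard `ipaddress` module (the function name `url_targets_internal_ip` is made up for illustration):

```python
import ipaddress
from urllib.parse import urlsplit


def url_targets_internal_ip(url: str) -> bool:
    """Return True if the URL's host is a literal IP address in a non-public range."""
    host = urlsplit(url).hostname
    if host is None:
        return False  # no host part: not a URL, so the path rules apply instead
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # a hostname, not a literal IP; this check cannot judge it
    return (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_reserved
        or addr.is_multicast
        or addr.is_unspecified
    )
```

The protection is incomplete by design: a hostname that resolves to an internal address passes this check, so a stricter service would also have to inspect the resolved address at connection time.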

Action points

I welcome feedback on this; maybe I am overlooking some points. Also, a strong warning for implementors should be in the DataResource specs and an equally strong warning in the user docs.

Personally, I believe it should be part of the spec that an implementing library has a switch that enables users of that library to disallow Resource-addressing via URLs. With the switch set, only self-contained packages would be parsed.
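Such a switch might look like the following sketch. Everything here is hypothetical: `check_resource_paths`, `allow_remote` and `RemoteResourceError` are illustrative names, not the API of any existing implementation.

```python
from urllib.parse import urlsplit


class RemoteResourceError(ValueError):
    """Raised when a resource uses a URL although remote resources are disabled."""


def check_resource_paths(descriptor: dict, allow_remote: bool = False) -> None:
    """Reject http(s) resource paths unless the caller opts in via allow_remote."""
    for resource in descriptor.get("resources", []):
        paths = resource.get("path", [])
        if isinstance(paths, str):
            paths = [paths]  # "path" may be a single string or a list of strings
        for path in paths:
            if not allow_remote and urlsplit(path).scheme in ("http", "https"):
                raise RemoteResourceError(f"remote resource not allowed: {path}")
```

Defaulting `allow_remote` to `False` is a design choice of this sketch, so that a service has to opt in to fetching remote content explicitly.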

roll commented 4 years ago

Great catch, @iSnow!

lwinfree commented 4 years ago

Thanks @iSnow! Hey @rufuspollock + @pwalsh Could y'all please take a look at this & comment on next steps? Thx!

rufuspollock commented 4 years ago

@iSnow this kind of attack has similarities to that discussed and addressed in discussion of unix relative paths ...

POSIX paths (unix-style with / as separator) are supported for referencing local files, with the security restraint that they MUST be relative siblings or children of the descriptor. Absolute paths (/) and relative parent paths (../) MUST NOT be used, and implementations SHOULD NOT support these path types.
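The path rule quoted above can be illustrated with a short Python sketch. The helper name `is_safe_relative_path` is an assumption made for this example and does not come from the spec or any library.

```python
from pathlib import PurePosixPath


def is_safe_relative_path(path: str) -> bool:
    """True if path is a relative POSIX path with no parent ('..') segments."""
    if not path:
        return False  # an empty path references nothing
    p = PurePosixPath(path)
    # Reject absolute paths (/...) and any '..' segment, wherever it appears.
    return not p.is_absolute() and ".." not in p.parts
```

Under this check `data/file.csv` is accepted, while `/etc/passwd` and `../secret.csv` are rejected.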

Obviously, for fully qualified URLs there is no simple way to exclude vulnerable paths. A basic starting point would be to disallow numeric IP addresses, localhost, etc., and/or, as you suggest, to limit resources to "self-contained" URLs only.

Obviously, general points about security apply: run this code on a system with appropriate permissions. We could flag that in the specs.

I think some comments on this in the specs would definitely be valuable. If @iSnow you have suggestions feel free to make them or even open a PR.

Thanks again for flagging this.

iSnow commented 4 years ago

Thanks, @rufuspollock - I opened a PR as a first draft of a security abstract. It is not meant to be a canonical source, but I'd invite discussion of the best practices I outlined:

I hope I got the threat matrix right, but others should think this through; security topics are notoriously hard to nail down on the first try.

roll commented 9 months ago

Merged in https://github.com/frictionlessdata/specs/pull/651