Closed: Rhynorater closed this issue 6 years ago
The second idea is a good one. Let the tool or the parser decide what counts as a host for it.
The second idea looks much better. I faced a similar issue when building a tool and decided to separate IPs from hosts.
It is mostly about storing data, but maybe this will be useful. I have defined a host as equal to a hostname. So the idea is the following:
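Under that definition, a record might key on the hostname and carry resolved IPs as data. A minimal sketch (the field names here are illustrative, not taken from any spec):

```json
{
  "host": "example.acme.com",
  "ip": ["54.0.0.1", "54.0.0.2"]
}
```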
Yes, the second idea sounds good to me too. Some tools will provide IP information on hosts while others will provide domain names. All the different tools can load each other's ReconJSON files and further populate them with more data instead of creating their own.
Agree with @iad42's comments above. In a system I have been working on, I have `apex` (eg. example.com), `host` (eg. foo.example.com) which links to an `apex`, `ip` (eg. 1.2.3.4), etc as their own entities. A `host` will always have a link to an `apex`. An `ip` can have links to `host` records. `ports` are linked to an `ip`. Etc. It maps itself nicely into a graph layout, which could then be translated back to a 'flatter' JSON layout.
Expected Behavior
ReconJSON is expected to provide a data standard that accommodates all different types of recon. Recon is designed around scope, and scope can be defined in many different ways: single IPs, IP ranges, wildcard domains, and specific subdomains. As a result, we need a format that will accommodate those standards and their individual definitions of a host.
Current Behavior
The current behavior doesn't define what identifies a "unique host." As a result, we can run into issues with the different types of scope mentioned above. For example, a wildcard domain scope might say that "example.acme.com" is in scope. However, "example.acme.com" resolves to both 54.0.0.1 AND 54.0.0.2. As a result, we have two "physical" systems collapsing into one host in the ReconJSON format. This could result in conflicts if 54.0.0.1 has port 22 open while 54.0.0.2 has port 22 filtered.
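A hostname-keyed record for this situation would collapse both systems into one entry, leaving no single truthful value for the state of port 22 (here recorded as "open", which is only true for 54.0.0.1). Field names are illustrative:

```json
{
  "type": "Host",
  "host": "example.acme.com",
  "ip": ["54.0.0.1", "54.0.0.2"],
  "ports": [{"port": 22, "state": "open"}]
}
```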
However, on the flip side, if we define a host as an IP address, then we can run into issues with duplicates. Consider the above scenario, with example.acme.com resolving to 54.0.0.1 and 54.0.0.2. If we define the IP address as the unique identifier, our dataset will look like this:
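The original example here was not preserved, but a sketch of the duplication, with illustrative field names, would be two IP-keyed records carrying the same hostname:

```json
{"type": "Host", "ip": "54.0.0.1", "host": "example.acme.com"}
{"type": "Host", "ip": "54.0.0.2", "host": "example.acme.com"}
```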
Possible Solution
There are several possible solutions that I can conceive of:
We define the unique identifier for a host on the first line of the file and leave it up to the parser to resolve. Our file would then look like this:
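One way this might look, as a JSON-lines style sketch in which the first object declares which field identifies a host (the field names are illustrative, not from the spec):

```json
{"identifier": "ip"}
{"type": "Host", "ip": "54.0.0.1", "host": "example.acme.com"}
{"type": "Host", "ip": "54.0.0.2", "host": "example.acme.com"}
```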
OR
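Alternatively, the first line could declare the hostname as the identifier, in which case the resolved IPs become data on a single record. Again, an illustrative sketch:

```json
{"identifier": "hostname"}
{"type": "Host", "host": "example.acme.com", "ip": ["54.0.0.1", "54.0.0.2"]}
```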
This approach solves the issue, but makes the file more difficult to parse and merge into one's own tool. It also hurts the uniformity of the standard.
This can be left up to the user to decide based on the tool they are using. Consider the following applications:
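For instance, suppose a port scanner is handed a Host record carrying both a hostname and an IP, along these lines (field names illustrative):

```json
{"type": "Host", "host": "example.acme.com", "ip": "54.0.0.1"}
```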
What should we expect it to do? I would expect it to take the "ip" field, scan that, and return the results. However, what if it is passed this from the results of a subdomain enumeration with no DNS resolution:
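That is, a record with a hostname but no IP at all; sketched with illustrative field names:

```json
{"type": "Host", "host": "example.acme.com"}
```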
Well, I would expect it to resolve the subdomain and return something like this:
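A sketch of such output, with the port results hypothetical and the field names illustrative:

```json
{
  "type": "Host",
  "host": "example.acme.com",
  "ip": "54.0.0.1",
  "ports": [
    {"port": 22, "state": "open"},
    {"port": 443, "state": "open"}
  ]
}
```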
Or perhaps even without the subdomain (as nmap does):
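In that case the hostname is dropped and the record is keyed purely by IP; an illustrative sketch:

```json
{
  "type": "Host",
  "ip": "54.0.0.1",
  "ports": [
    {"port": 22, "state": "open"},
    {"port": 443, "state": "open"}
  ]
}
```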
In these scenarios, we know that the port scanner is focusing on the IP address, and the parser will need to reflect as much.
and if a resolve feature is turned on, we might see something like this:
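A sketch of resolve-enabled output, with hostname-keyed records. The second subdomain name is hypothetical, invented here only to show two distinct hosts; field names are illustrative:

```json
{"type": "Host", "host": "example.acme.com", "ip": "54.0.0.1"}
{"type": "Host", "host": "dev.acme.com", "ip": "54.0.0.2"}
```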
We can see in this scenario that the subdomain enumeration tool considers these two different hosts because the tool is focused on subdomain enumeration.
In this case, the user (or parser) would be responsible for merging this data into the format that is most reasonable for their use case.
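For example, a consumer that treats hostnames as the unique identifier might merge two IP-keyed records for the same hostname into one. A sketch of the merged result, with illustrative field names:

```json
{
  "type": "Host",
  "host": "example.acme.com",
  "ip": ["54.0.0.1", "54.0.0.2"]
}
```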