Cgboal / SonarSearch

A rapid API for the Project Sonar dataset
MIT License

Raw Output Endpoint #23

Open: indianajson opened this issue 3 years ago

indianajson commented 3 years ago

Thank you for your work on this project. Once I found your API I was hooked. This really is a fantastic tool.

There is one limitation in SonarSearch that keeps me reliant on my raw copy of the Sonar dataset: the inability to query the raw data and get both halves of each record (the name and the value). For example, say I want to list every S3 bucket and see how many of them are CNAME targets of (sub)domains. I can simply grep my raw dataset and, after a little while, get the full JSON records back:

{"timestamp":"1622162183","name":"10gen-gvf.s3.amazonaws.com","type":"cname","value":"s3-1-w.amazonaws.com"}
{"timestamp":"1622162106","name":"10in30.s3.amazonaws.com","type":"cname","value":"s3-us-west-2-w.amazonaws.com"}
{"timestamp":"1622162164","name":"10judzoqo6kzfffm183231.covideos.s3.amazonaws.com","type":"cname","value":"s3-1-w.amazonaws.com"}
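For reference, that grep step amounts to something like the following minimal Python sketch. It assumes the dump has already been decompressed to one JSON object per line, as in the excerpt above; `grep_records` is a hypothetical helper, not part of SonarSearch, and the final `example.com` line is made up to show a non-match.

```python
import json
from typing import Iterable, Iterator


def grep_records(lines: Iterable[str], needle: str) -> Iterator[dict]:
    """Yield parsed records from a raw FDNS dump whose line contains `needle`.

    Mirrors `grep needle file`: a cheap substring test first, then JSON
    parsing only for matching lines, so both name and value are kept.
    """
    for line in lines:
        if needle in line:
            yield json.loads(line)


sample = [
    '{"timestamp":"1622162183","name":"10gen-gvf.s3.amazonaws.com","type":"cname","value":"s3-1-w.amazonaws.com"}',
    '{"timestamp":"1622162106","name":"10in30.s3.amazonaws.com","type":"cname","value":"s3-us-west-2-w.amazonaws.com"}',
    '{"timestamp":"1622162164","name":"10judzoqo6kzfffm183231.covideos.s3.amazonaws.com","type":"cname","value":"s3-1-w.amazonaws.com"}',
    # Made-up non-matching line, for illustration only.
    '{"timestamp":"1622160000","name":"example.com","type":"a","value":"192.0.2.1"}',
]

for record in grep_records(sample, "s3.amazonaws.com"):
    print(record["name"], "->", record["value"])
```

For a gzipped dump, `gzip.open(path, "rt")` can be passed as `lines` instead of the in-memory list.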

Admittedly, there will be a lot of duplicates where the S3 bucket points to AWS, but I can filter those out myself fairly easily. From what I read in the source code, you are saving both the record name and the value, so could we get an endpoint that returns both halves of the record?
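The filtering I have in mind could look roughly like this sketch. The rule is my own assumption, not anything SonarSearch does: drop records whose name is itself under amazonaws.com (the bucket-to-AWS duplicates) and keep CNAMEs from other (sub)domains that point at an amazonaws.com host. `customer_cnames` and the `assets.example.com` record are hypothetical.

```python
from typing import Iterable


def customer_cnames(records: Iterable[dict], provider_suffix: str = ".amazonaws.com") -> list[dict]:
    """Keep CNAMEs from names outside the provider zone that point into it.

    Records such as bucket.s3.amazonaws.com -> s3-1-w.amazonaws.com are
    AWS-internal duplicates and are dropped; what remains are (sub)domains
    whose CNAME value is an amazonaws.com host.
    """
    return [
        r
        for r in records
        if r.get("type") == "cname"
        and not r["name"].endswith(provider_suffix)
        and r["value"].endswith(provider_suffix)
    ]


records = [
    {"name": "10gen-gvf.s3.amazonaws.com", "type": "cname", "value": "s3-1-w.amazonaws.com"},
    {"name": "10in30.s3.amazonaws.com", "type": "cname", "value": "s3-us-west-2-w.amazonaws.com"},
    # Hypothetical customer record, invented for illustration.
    {"name": "assets.example.com", "type": "cname", "value": "example-assets.s3.amazonaws.com"},
]

kept = customer_cnames(records)
print(len(kept), [r["name"] for r in kept])
```

Matching on the suffix alone is approximate: it keeps CNAMEs into any AWS service, not only S3.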

Perhaps something like /raw/{domain}, where querying /raw/zendesk.com would return:

{
"1125.zendesk.com":"1125.diversified-capital.com",
"15below.ssl.zendesk.com":"15below.zendesk.com",
"18emaint.zendesk.com":"18emaint.motorwerks.com",
"1d-color.zendesk.com":"1d-color.support.ec-force.com",
"1stadm.zendesk.com":"1stadm.ssl.zendesk.com"
}
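To make the proposal concrete, here is a rough sketch of how such a handler might assemble its response from stored name/value pairs. `raw_lookup` is a hypothetical name, the matching rule (the exact name, or a subdomain on a label boundary) is an assumption, and the `notzendesk.com` record is invented to show a non-match.

```python
import json
from typing import Iterable


def raw_lookup(records: Iterable[dict], domain: str) -> dict[str, str]:
    """Build a possible /raw/{domain} body: record name -> record value.

    A name matches when it equals `domain` or ends with "." + domain.
    JSON object keys must be unique, so if a name has several records
    the last one seen wins; a list of values per name would avoid that.
    """
    domain = domain.lower().rstrip(".")
    suffix = "." + domain
    result: dict[str, str] = {}
    for r in records:
        name = r["name"].lower()
        if name == domain or name.endswith(suffix):
            result[name] = r["value"]
    return result


records = [
    {"name": "1125.zendesk.com", "type": "cname", "value": "1125.diversified-capital.com"},
    {"name": "15below.ssl.zendesk.com", "type": "cname", "value": "15below.zendesk.com"},
    # Invented record that shares the string suffix but not the domain.
    {"name": "notzendesk.com", "type": "cname", "value": "example.net"},
]

print(json.dumps(raw_lookup(records, "zendesk.com"), indent=2))
```

Under this label-boundary rule a name like mta-sts.dev.wearehackerone.com would not match hackerone.com; a plain substring match would include it.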

This could also help with target-specific searches. For example, querying /raw/hackerone.com would return more useful information for research:

{
"mta-sts.dev.wearehackerone.com":"hacker0x01.github.io",
"mta-sts.forwarding.hackerone.com":"hacker0x01.github.io",
"mta-sts.hackerone.com":"hacker0x01.github.io"
}
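Once such a mapping comes back, inverting it is a quick way to see every name that shares one external target. A minimal sketch; `group_by_target` is a hypothetical helper and the input is the example response above.

```python
from collections import defaultdict


def group_by_target(mapping: dict[str, str]) -> dict[str, list[str]]:
    """Invert a name -> value mapping so each target lists the names pointing at it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, value in mapping.items():
        groups[value].append(name)
    # Sort names per target so the output is stable regardless of input order.
    return {target: sorted(names) for target, names in groups.items()}


response = {
    "mta-sts.dev.wearehackerone.com": "hacker0x01.github.io",
    "mta-sts.forwarding.hackerone.com": "hacker0x01.github.io",
    "mta-sts.hackerone.com": "hacker0x01.github.io",
}

for target, names in group_by_target(response).items():
    print(target, len(names), names)
```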
Cgboal commented 2 years ago

I'll look into this. Apologies for the delay.