achillean / shodan-python

The official Python library for Shodan
https://developer.shodan.io

Truncated search results using the API (paid account) #190

Closed nuubnuub closed 1 year ago

nuubnuub commented 1 year ago

Creating an issue out of a comment that I made on #145, as I'm not using the Free API (corporate account), and I'm consistently getting the same truncated results.

I'm aware that the search() and search_cursor() methods fail silently when the API call fails, but it has reached the point where I query the results for a single host, the reported count is 14, and I can only retrieve 5 of them. If it were something like 10,000 total results with only 8,500 retrievable, that would be a little more understandable, but lately I can't even parse 50% of the total results.

import logging
import os

from shodan import Shodan

API_KEY = os.environ['Shodan_API_Key']
api = Shodan(API_KEY)

query = 'hostname:my.query'
limit = api.count(query=query)['total']

logging.info(f'Total Results to Parse: {limit}')
print(f'Total Results to Parse: {limit}')

counter = 0
info_wanted = []

for banner in api.search_cursor(query, retries=100):  # The retry count is arbitrary; whether it's 5 or 100, it still fails.
    counter += 1
    try:
        info_d = {'ip_str': banner['ip_str'],
                  'country_name': banner['location']['country_name'],
                  'longitude': banner['location']['longitude'],
                  'latitude': banner['location']['latitude'],
                  'hostnames': banner['hostnames']}
        info_wanted.append(info_d)

        logging.info(f'Information Parsed. Parsed: {counter}, Remaining: {limit - counter}')
        print(f'Information Parsed. Parsed: {counter}, Remaining: {limit - counter}')
        # Keep track of how many results have been downloaded so we don't use up all our query credits
        if counter >= limit:
            break
    except KeyError:
        # A field is missing from this banner; fall back to .get() so the
        # partial record is kept instead of raising the same KeyError again.
        logging.info(f'Neutral Result Parsed. Parsed: {counter}, Remaining: {limit - counter}')
        print(f'Neutral Result Parsed. Parsed: {counter}, Remaining: {limit - counter}')
        location = banner.get('location') or {}
        info_d = {'ip_str': banner.get('ip_str'),
                  'country_name': location.get('country_name'),
                  'longitude': location.get('longitude'),
                  'latitude': location.get('latitude'),
                  'hostnames': banner.get('hostnames', [])}
        info_wanted.append(info_d)
        if counter >= limit:
            break

The output, showing how incomplete the search results are:

Screenshot 2023-06-21 at 11 04 36 AM

And for even larger queries:

Screenshot 2023-06-21 at 11 38 20 AM

I don't know if this will get addressed, but any solution would be welcome. The CLI is limited to 1000 returned results, so right now I'm resorting to pulling any results that can't be retrieved through the API directly from the UI.
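
One possible workaround for the silent truncation, offered as a rough sketch rather than anything the library provides: request pages explicitly and raise when a page keeps failing, so a shortfall is visible instead of hidden. The helper name fetch_all_pages and its parameters are hypothetical. It takes any callable that returns one page of matches, which with the real client might be a wrapper around api.search(query, page=n)['matches'] (assumed here, not verified against the affected backend).

```python
import time
from typing import Any, Callable, Dict, List


def fetch_all_pages(
    fetch_page: Callable[[int], List[Dict[str, Any]]],
    total: int,
    max_retries: int = 3,
    delay: float = 1.0,
) -> List[Dict[str, Any]]:
    """Collect results page by page, raising instead of silently truncating.

    fetch_page is a hypothetical callable: given a 1-based page number,
    it returns that page's list of banners.
    """
    results: List[Dict[str, Any]] = []
    page = 1
    while len(results) < total:
        matches: List[Dict[str, Any]] = []
        for attempt in range(1, max_retries + 1):
            try:
                matches = fetch_page(page)
                break
            except Exception as exc:  # with the shodan client, typically shodan.APIError
                if attempt == max_retries:
                    raise RuntimeError(
                        f"page {page} failed after {max_retries} attempts; "
                        f"collected {len(results)} of {total}"
                    ) from exc
                time.sleep(delay)
        if not matches:
            # An empty page arrived before the reported total was reached.
            break
        results.extend(matches)
        page += 1
    return results


# Hypothetical usage with the real client (not run here):
#   from shodan import Shodan
#   api = Shodan(API_KEY)
#   total = api.count(query)['total']
#   banners = fetch_all_pages(lambda n: api.search(query, page=n)['matches'], total)
#   print(f'Collected {len(banners)} of {total}')
```

Comparing len(banners) against the count afterwards shows exactly how many results are missing. Note that explicit paging may consume query credits per page, so this is best tried on a small query first.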

nuubnuub commented 1 year ago

@achillean is anyone there?

achillean commented 1 year ago

The CLI isn't limited to 1,000 results, and what you're describing is due to a recent database migration. The issue should already be significantly better now. Additionally, this isn't a technical issue with the library/CLI but rather with work we had to perform on the API. Please use support@shodan.io for support-related questions.