DomainTools / python_api

DomainTools Official Python API
MIT License

414 Request-URI Too Large exception when requesting a list of domains with long names #59

Closed: emerrf closed this issue 3 years ago

emerrf commented 4 years ago

Description: When I call the iris_enrich function with a batch of fewer domains than the 100-domain limit, the package raises a 414 Request-URI Too Large exception. I suspect it is related to the total string length of the requested domains.

Can the API handle this situation, or is it supposed to be managed by the end user? If the latter, what is the limit? The example below puts the maximum at roughly 1,728 characters (the summed length of the domain names, excluding separators).
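If batching is left to the caller, one option is to split the list by total length instead of by count. The sketch below is a hypothetical helper, not part of the domaintools package. It assumes the domains travel as one comma-separated string, and its default budget of 1,700 characters is a deliberately conservative guess below the 1,728 observed in this issue, not a documented limit.

```python
from typing import Iterable, List


def chunk_by_length(domains: Iterable[str], max_chars: int = 1700,
                    max_items: int = 100) -> List[List[str]]:
    """Group domains into batches whose comma-joined length stays within
    max_chars and whose size stays within max_items (hypothetical helper).

    A single domain longer than max_chars still gets a batch of its own,
    since it cannot be split further.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_len = 0  # length of ",".join(current)
    for domain in domains:
        # Appending costs the domain itself plus a comma if the batch is non-empty.
        added = len(domain) + (1 if current else 0)
        if current and (current_len + added > max_chars or len(current) >= max_items):
            batches.append(current)
            current, current_len = [], 0
            added = len(domain)  # first item of a fresh batch needs no comma
        current.append(domain)
        current_len += added
    if current:
        batches.append(current)
    return batches
```

With the `domains` list and `dtools_api` client from the example below, each batch could then be sent on its own, e.g. `for batch in chunk_by_length(domains): dtools_api.iris_enrich(*batch).response()`.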

Tested on:

Reproducible example:

import os
from time import sleep
from domaintools import API, __version__

print(f"DomainTools API verison: {__version__}")

API_USERNAME = os.getenv("API_USERNAME", "your_hardcoded_username_here")
API_KEY = os.getenv("API_KEY", "your_hardcoded_api_key_here")

dtools_api = API(API_USERNAME, API_KEY)

domains = [
    'ics-informationsystems.com', 'githubusercontent.com', 'familiaganadora.com.ar',
    'infrastructuremalta.com', 'instantlocaldates.com', 'saglikliadimlarprojesi.org',
    'rollersadnessstranded.com', 'finedininglovers.co.uk', 'sempliceassicurazione.it',
    'magazintraditional.ro', 'freeiworktemplates.com', 'starthealthystayhealthy.lk',
    'costcowaterdelivery.com', 'flowergardengirl.co.uk', 'theparentingindex.com',
    'zyjsmacznieizdrowo.pl', 'polandspringbornbetter.com', 'formation-et-expertise.fr',
    'freshlymadesimplyfrozen.com', 'jeanmarcmorandini.com', 'ws-cookie-manager.com',
    'universidadeeuropeia.pt', 'runative-syndicate.com', 'galderma-kundenservice.de',
    'entremamasnido.com.mx', 'sanpellegrinofestivals.com', 'atrium-innovations.com',
    'landlordmanoeuvre.com', 'animeunityserver30.cloud', 'microsoftonline-p.com',
    'nurturewellnessvillage.com', 'naturesheartsuperfoods.co.uk', 'cuidarseesdisfrutar.com.mx',
    'visualwebsiteoptimizer.com', 'novocraquedacozinha.com.br', 'restaurantbenjamins.ro',
    'anticagelateriadelcorso.com', 'beveragelcafootprint.com', 'saglikliadimlarprojesi.com',
    'zyjzdrowoisportowo.pl', 'birliktekoruyalim.com', 'microsoftstoreemail.com',
    'hotelmarshalgarden.ro', 'zdrowystartwprzyszlosc.pl', 'microsofttranslator.com',
    'pureliferippleeffect.com', 'pickyeatersarabia.com', 'globalallianceforyouth.org',
    'hidratatemejor.com.mx', 'promilnurturethegift.com.ph', 'youthparliament.com.pk',
    'starthealthystayhealthy.com.bd', 'wholeearthfarms.com.ar', 'secretsdegourmets.com',
    'microsoftonline-p.net', 'greengarden-events.ro', 'prosecutorcessationdial.com',
    'action-gegen-hellen-hautkrebs.de', 'nestealovethebeach.com.ph', 'my-acticol-nutritionist.com',
    'cloudflareinsights.com', 'rxdirectplussavings.com', 'reducecatallergens.com',
    'alpotellitlikeitis.com', 'galdermaorderform.com', 'assicurazionircaonline.com',
    '20questionsaboutwater.com', 'mowembarknegligence.com', 'projektpodklucz.com.pl',
    'datadoghq-browser-agent.com', 'nutricionyejercicio.es', 'mistressavouchdeity.com',
    'multipleintelligence.com.ph', 'electricfoldinggate.com', 'foodandeverythingelse.ng',
    'cetaphilfriends.com.sg', 'sculptraaesthetic.com', 'nqticketdorado.com.mx',
    'iyibuyusuniyiyasasin.com', 'lawiswiskawayanresort.com', 'cerealpartnersfoodservice.co.uk',
    'goldendrop-baby.co.kr', 'experienciadolce.com.uy', 'proplanveterinarydiets.ca',
    'llenalacalledevida.es', 'warsztaty-vitaflo-mpku-katowice.pl', 'astonishinglysimplecoffee.com',
    'healthybreakfast.com.cn', 'petsatworkalliance.com', 'smaspecialfeeds.co.uk',
    'chameleoncoldbrew.com', 'ristorantepizzeriamaghera.it', 'weekenddiscoveries.com.ph',
    'nurturenetwork.com.ph', 'wykanczaniewnetrz.com', 'agircontrelesrougeurs.lu',
    'impactinformation.com', 'centre-equestre-divonne.com', 'rcbconlinebanking.com',
    'specialfeedsforspecialneeds.co.uk'
]

# Example 1: 414 Request-URI Too Large Error with batch of 100 domains
# domains_response = dtools_api.iris_enrich(*domains).response()

# Example 2: Estimate the maximum number of characters allowed
for size in range(70, 75):
    print(f"Num domains: {size}, Concat size: {sum(len(d) for d in domains[:size])}")
    dtools_api.iris_enrich(*domains[:size]).response()
    print("Response OK")
    sleep(1.5)

Result:

DomainTools API verison: 0.5.2
Num domains: 70, Concat size: 1656
Response OK
Num domains: 71, Concat size: 1678
Response OK
Num domains: 72, Concat size: 1701
Response OK
Num domains: 73, Concat size: 1728
Response OK
Num domains: 74, Concat size: 1751
Traceback (most recent call last):
  File "C:/Users/erodriguez/PycharmProjects/gsoc-ds-ml-anomaly-detection/domaintools_414issue_example.py", line 55, in <module>
    dtools_api.iris_enrich(*domains[:size]).response()
  File "C:\Users\erodriguez\AppData\Local\Continuum\anaconda3\envs\gsoc-ds-ml-anomaly-detection-tests\lib\site-packages\domaintools\base_results.py", line 161, in response
    response = self.data()
  File "C:\Users\erodriguez\AppData\Local\Continuum\anaconda3\envs\gsoc-ds-ml-anomaly-detection-tests\lib\site-packages\domaintools\base_results.py", line 93, in data
    self.setStatus(results.status_code, results)
  File "C:\Users\erodriguez\AppData\Local\Continuum\anaconda3\envs\gsoc-ds-ml-anomaly-detection-tests\lib\site-packages\domaintools\base_results.py", line 155, in setStatus
    raise RequestUriTooLongException(code, reason)
domaintools.exceptions.RequestUriTooLongException: <html>
<head><title>414 Request-URI Too Large</title></head>
<body bgcolor="white">
<center><h1>414 Request-URI Too Large</h1></center>
<hr><center>nginx</center>
</body>
</html>
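Another caller-side option is to react to the error instead of predicting it: send the whole batch and, on a 414, split it in half and retry each half. The sketch below defines a local stand-in exception so it runs without the package. `fetch_with_split` is a hypothetical helper; in real use the `too_long` argument would be `domaintools.exceptions.RequestUriTooLongException`, the class shown in the traceback above.

```python
from typing import Callable, List, Sequence, Type, TypeVar

T = TypeVar("T")


class RequestUriTooLongError(Exception):
    """Stand-in for domaintools.exceptions.RequestUriTooLongException."""


def fetch_with_split(fetch: Callable[[Sequence[str]], T],
                     domains: Sequence[str],
                     too_long: Type[Exception] = RequestUriTooLongError) -> List[T]:
    """Call fetch on the whole batch; if it raises `too_long`, split the
    batch in half and retry each half recursively (hypothetical helper)."""
    if not domains:
        return []
    try:
        return [fetch(domains)]
    except too_long:
        if len(domains) == 1:
            raise  # one domain alone is too long; splitting cannot help
        mid = len(domains) // 2
        return (fetch_with_split(fetch, domains[:mid], too_long)
                + fetch_with_split(fetch, domains[mid:], too_long))
```

For example: `fetch_with_split(lambda batch: dtools_api.iris_enrich(*batch).response(), domains, too_long=RequestUriTooLongException)`. Each split costs one failed request, so this approach trades a few wasted calls for not having to know the exact limit.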

underrun commented 4 years ago

On the backend, the API is HTTP. URI length is limited by web servers and clients (typically to around 2k characters). If a lot of data needs to be sent to the server, it can go in a request body instead. Although a body can be sent with both the GET and POST HTTP verbs, in practice it is typically only done with POST.
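The summed name length in the report undercounts what the server actually sees. The sketch below builds the query string with the standard library's `urlencode`, assuming (not confirmed in this thread) that the domains are sent as one comma-separated `domain` parameter; the endpoint URL is a placeholder. Because `urlencode` percent-encodes each comma as `%2C`, every extra domain adds its own length plus three characters. Under those assumptions, the 73 domains that succeeded above already need 1,728 + 72 × 3 = 1,944 characters for the domain value alone, before the host, path, and authentication parameters.

```python
from typing import Sequence
from urllib.parse import urlencode

# Placeholder endpoint for illustration only.
BASE_URL = "https://api.example.com/v1/iris-enrich/"


def get_request_uri(domains: Sequence[str]) -> str:
    """Return the URI a GET request would use if all domains are sent as one
    comma-separated `domain` query parameter (an assumption).

    urlencode() percent-encodes each comma as %2C, so every extra domain
    lengthens the URI by len(domain) + 3 characters.
    """
    return f"{BASE_URL}?{urlencode({'domain': ','.join(domains)})}"


uri = get_request_uri(["example.com", "example.org"])
print(uri)  # https://api.example.com/v1/iris-enrich/?domain=example.com%2Cexample.org
print(len(uri))  # 72
```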

It looks like enrich doesn't use POST:

https://github.com/DomainTools/python_api/blob/master/domaintools/base_results.py#L62
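A client could fall back to POST only when the GET URI would be too long. The sketch below only builds the request object with the standard library's `urllib.request.Request`; it does not send anything. Whether the DomainTools enrich endpoint accepts a form-encoded POST body is not established in this thread, and `build_request`, the 2,000-character ceiling, and the example URL are assumptions for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed ceiling based on the "around 2k characters" estimate; not a documented limit.
MAX_URI_LENGTH = 2000


def build_request(base_url: str, params: dict,
                  max_uri_length: int = MAX_URI_LENGTH) -> Request:
    """Return a GET request if the full URI fits within max_uri_length,
    otherwise a POST request carrying the same parameters in a
    form-encoded body (hypothetical helper)."""
    query = urlencode(params)
    get_url = f"{base_url}?{query}"
    if len(get_url) <= max_uri_length:
        return Request(get_url, method="GET")
    # Too long for the request line: keep the URI short and move the parameters into the body.
    return Request(
        base_url,
        data=query.encode("ascii"),  # urlencode output is pure ASCII
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_request("https://api.example.com/v1/iris-enrich/",
                    {"domain": ",".join(["averylongdomainname.com"] * 200)})
print(req.get_method(), req.full_url)  # POST https://api.example.com/v1/iris-enrich/
```

Keeping GET as the default for short requests preserves the existing behavior, and only oversized batches take the POST path.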