dlt-hub / verified-sources

Contribute to dlt verified sources 🔥
https://dlthub.com/docs/walkthroughs/add-a-verified-source
Apache License 2.0

Error in HubSpot Pipeline Execution: HTTPError 414 Client Error: URI Too Long #305

Closed: dat-a-man closed this issue 10 months ago.

dat-a-man commented 10 months ago

Source name

hubspot

Describe the bug

The pipeline previously worked and was used to load deals data, but it now fails consistently, even after updating dlt, using new virtual environments, and creating a new pipeline.

  1. Source: hubspot()
  2. Resource: "deals"
  3. Destination: BigQuery
  4. Stack traces and logs: see below

To Reproduce

Steps to reproduce the behavior:

  1. Initialize the HubSpot verified source.
  2. Update the credentials.
  3. Run the example pipeline load_crm_data() with the source set to hubspot().with_resources("deals").

Expected behavior

The pipeline was expected to load deal data from HubSpot into BigQuery. It worked fine last month but now fails with the error below.

Stack traces and other evidence

(venv) (base) radheshyaam@Heenas-MacBook-Air_hubspot_recheck % python3 hubspot_pipeline.py
/Users/radheshyaam/PycharmProjects/hubspot_v2/venv/lib/python3.10/site-packages/google/cloud/bigquery/client.py:562: UserWarning: Cannot create BigQuery Storage client, the dependency google-cloud-bigquery-storage is not installed.
  warnings.warn(
Traceback (most recent call last):
  File "/Users/radheshyaam/PycharmProjects/hubspot_v2/venv/lib/python3.10/site-packages/dlt/extract/pipe.py", line 696, in _get_source_item_current
    item = next(gen)
  File "/Users/radheshyaam/PycharmProjects/hubspot_v2/hubspot_recheck/hubspot/__init__.py", line 126, in deals
    yield from crm_objects("deal", api_key, include_history)
  File "/Users/radheshyaam/PycharmProjects/hubspot_v2/hubspot_recheck/hubspot/__init__.py", line 91, in crm_objects
    yield from fetch_data(CRM_OBJECT_ENDPOINTS[object_type], api_key, params=params)
  File "/Users/radheshyaam/PycharmProjects/hubspot_v2/hubspot_recheck/hubspot/helpers.py", line 122, in fetch_data
    r = requests.get(url, headers=headers, params=params)
  File "/Users/radheshyaam/PycharmProjects/hubspot_v2/venv/lib/python3.10/site-packages/dlt/sources/helpers/requests/retry.py", line 192, in <lambda>
    self.get = lambda *a, **kw: self.session.get(*a, **kw)
  File "/Users/radheshyaam/PycharmProjects/hubspot_v2/venv/lib/python3.10/site-packages/requests/sessions.py", line 602, in get
    return self.request("GET", url, **kwargs)
  File "/Users/radheshyaam/PycharmProjects/hubspot_v2/venv/lib/python3.10/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/Users/radheshyaam/PycharmProjects/hubspot_v2/venv/lib/python3.10/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/Users/radheshyaam/PycharmProjects/hubspot_v2/venv/lib/python3.10/site-packages/tenacity/__init__.py", line 314, in iter
    return fut.result()
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/Users/radheshyaam/PycharmProjects/hubspot_v2/venv/lib/python3.10/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/Users/radheshyaam/PycharmProjects/hubspot_v2/venv/lib/python3.10/site-packages/dlt/sources/helpers/requests/session.py", line 48, in request
    resp.raise_for_status()
  File "/Users/radheshyaam/PycharmProjects/hubspot_v2/venv/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 414 Client Error: URI Too Long for url: https://api.hubapi.com/crm/v3/objects/deals?properties=age_contact_person%2Camount_in_home_currency%2Cbudget%2Cbudget_available%2Cbusiness_budget%2Cchampion%2Cchamppion%2Ccms%2Ccommitted_seats%2Ccompany_expected_user%2Ccompany_footprint%2Ccompany_growth_potential%2Ccompany_size%2Ccompany_type%2Ccompelling_event%2Ccompeting_options%2Ccompetition%2Ccompettition%2Ccoms%2Ccsm_owner%2Ccustomer_segment%2Cdays_to_close%2Cdeal_currency_code%2Cdeal_source%2Cdeal_source_web%2Cdeal_status%2Cdecision_criteria%2Cdecision_criteria_stakeholder%2Cdecision_kriteria%2Cdecision_maker_involved%2Cdecision_process%2Cdecision_prozess%2Cdecision_timeline%2Cdemos_scheduled%2Cdigital_afinity%2Cdiscovery_call_completed%2Cdiscovery_call_scheduled%2Cdt%2Ceconomic_buyer%2Ceconomic_decision_maker%2Cemployee_buisness_budget%2Cemployee_location%2Cemployees%2Cemployees_filled_in_web_form%2Centered_deal_stage_at%2Cestimated_go_live%2Cestimated_monthly_recurring_revenue%2Cexec_reach_out_done_%2Cgmv_goal%2Cgmv_potential%2Chr_it%2Chr_tool%2Chs_acv%2Chs_all_assigned_business_unit_ids%2Chs_all_collaborator_owner_ids%2Chs_all_deal_split_owner_ids%2Chs_analytics_latest_source%2Chs_analytics_latest_source_company%2Chs_analytics_latest_source_contact%2Chs_analytics_latest_source_data_1%2Chs_analytics_latest_source_data_1_company%2Chs_analytics_latest_source_data_1_contact%2Chs_analytics_latest_source_data_2%2Chs_analytics_latest_source_data_2_company%2Chs_analytics_latest_source_data_2_contact%2Chs_analytics_latest_source_timestamp%2Chs_analytics_latest_source_timestamp_company%2Chs_analytics_latest_source_timestamp_contact%2Chs_analytics_source%2Chs_analytics_source_data_1%2Chs_analytics_source_data_2%2Chs_arr%2Chs_campaign%2Chs_closed_amount%2Chs_closed_amount_in_home_currency%2



Additional context

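The user added some new custom columns (deal properties) to HubSpot in the past month, bringing the total number of deal properties to 299. Since every property name is included in the request URL (see the URL in the traceback above), this is most likely what is causing the error.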
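To see how the property count drives the URL length, here is a minimal sketch that rebuilds the query string the way the logged URL shows it: all property names comma-joined into one properties parameter, with each comma encoded as %2C. It uses urllib.parse.urlencode as a stand-in for how requests encodes its params argument, which matches the %2C separators in the traceback. The property names are hypothetical placeholders; many real names (for example hs_analytics_latest_source_timestamp_contact) are longer. The exact URI limit HubSpot enforces is not stated in this issue.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL in the traceback above.
DEALS_URL = "https://api.hubapi.com/crm/v3/objects/deals"


def build_deals_url(properties: list[str]) -> str:
    """Join every property name into a single comma-separated query parameter."""
    # urlencode percent-encodes each comma as %2C, as seen in the logged URL.
    query = urlencode({"properties": ",".join(properties)})
    return f"{DEALS_URL}?{query}"


# Hypothetical placeholder names standing in for the 299 real deal properties.
properties = [f"custom_property_{i:03d}" for i in range(299)]
url = build_deals_url(properties)
print(len(url))  # length grows linearly with the number of properties
```

Each extra property adds its own name plus three characters for the encoded comma, so even with these short placeholders the URL is several thousand characters long.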

adrianbr commented 10 months ago

Looks like requesting all field names exceeds the URL size limit. They should be sent as a separate payload, or the request could be split into two separate requests. The issue happens here: https://github.com/dlt-hub/verified-sources/blob/master/sources/hubspot/__init__.py#L88C5-L89
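The second option, splitting the request, could look roughly like the sketch below. This is not the source's actual fix: chunk_properties, fetch_in_chunks and fake_fetch are hypothetical names, the 2000-character limit is an arbitrary conservative assumption, and each record is assumed to carry an id and a properties dict, as HubSpot CRM v3 object results do. Pagination is left to the injected fetch callable; merging by id works as long as each pass returns every deal.

```python
from typing import Callable, Iterable
from urllib.parse import urlencode

DEALS_URL = "https://api.hubapi.com/crm/v3/objects/deals"


def chunk_properties(properties: list[str], max_url_length: int) -> list[list[str]]:
    """Greedily group property names so each group's GET URL fits max_url_length."""
    chunks: list[list[str]] = []
    current: list[str] = []
    for prop in properties:
        candidate = current + [prop]
        url = f"{DEALS_URL}?{urlencode({'properties': ','.join(candidate)})}"
        if len(url) > max_url_length and current:
            # Adding this property would overflow the limit: close the current group.
            chunks.append(current)
            current = [prop]
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def fetch_in_chunks(
    fetch: Callable[[list[str]], Iterable[dict]], chunks: list[list[str]]
) -> list[dict]:
    """Run one fetch per property group and merge the partial records by deal id."""
    merged: dict[str, dict] = {}
    for chunk in chunks:
        for record in fetch(chunk):
            entry = merged.setdefault(record["id"], {"id": record["id"], "properties": {}})
            entry["properties"].update(record["properties"])
    return list(merged.values())


def fake_fetch(props: list[str]) -> list[dict]:
    """Hypothetical stand-in for a paginated HubSpot GET restricted to props."""
    return [
        {"id": deal_id, "properties": {p: f"{deal_id}:{p}" for p in props}}
        for deal_id in ("101", "102")
    ]


properties = [f"custom_property_{i:03d}" for i in range(299)]
chunks = chunk_properties(properties, max_url_length=2000)
deals = fetch_in_chunks(fake_fetch, chunks)
print(len(chunks), len(deals), len(deals[0]["properties"]))
```

One trade-off: every chunk is a separate full pass over the deals, so the number of API calls multiplies by the number of chunks. Sending the property list as a payload, the other option mentioned above, would avoid that if the endpoint used accepts properties in the request body.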