MeltanoLabs / tap-salesforce

Singer.io tap for the Salesforce API
GNU Affero General Public License v3.0
1 stars 30 forks source link

Start Date is not taken into account #3

Closed MeltyBot closed 2 years ago

MeltyBot commented 5 years ago

Migrated from GitLab: https://gitlab.com/meltano/tap-salesforce/-/issues/3

Originally created by @iroussos on 2018-12-12 14:04:02


Tap-salesforce documentation states that:

The start_date is used by the tap as a bound on SOQL queries when searching for records.

I have tested the tap-salesforce without using a state file and it seems like it is not using the start_date provided in config.json as a bound on the SOQL queries when searching for records. Not sure if the same problem is there when a state file is provided.

Setup:

  1. Branch used: add-password-auth
  2. Config file using PasswordCredentials with config["start_date"]="2018-12-01T00:00:00Z"
  3. Two entities selected in properties.json: {Case, AccountTeamMember}
tap-salesforce --config config.json --properties properties.json

The tap log clearly shows ~15000 records extracted for Case and 2255 records extracted for AccountTeamMember:

INFO METRIC: {"type": "counter", "metric": "record_count", "value": 1, "tags": {"endpoint": "Case"}}
INFO METRIC: {"type": "counter", "metric": "record_count", "value": 6342, "tags": {"endpoint": "Case"}}
INFO METRIC: {"type": "counter", "metric": "record_count", "value": 6995, "tags": {"endpoint": "Case"}}
INFO METRIC: {"type": "counter", "metric": "record_count", "value": 1767, "tags": {"endpoint": "Case"}}

...

INFO METRIC: {"type": "counter", "metric": "record_count", "value": 2255, "tags": {"endpoint": "AccountTeamMember"}}

When I checked the tables in "RAW"."SALESFORCE" for everything already extracted, I can see :

SELECT count(*)
FROM "RAW"."SALESFORCE"."CASE"
WHERE LAST_MODIFIED_DATE > '2018-12-01';

>>> 356

SELECT count(*)
FROM "RAW"."SALESFORCE"."CASE"
WHERE CREATED_DATE > '2018-12-01';

>>> 237

It's clear that the tap extracted everything:

SELECT count(*)
FROM "RAW"."SALESFORCE"."CASE";

>>> 15092

SELECT count(*)
FROM "RAW"."SALESFORCE"."ACCOUNT_TEAM_MEMBER";

>>> 2287

I think that the following POST requests recorded in the tap-salesforce log are an indication that the start_date is not used. The queries reported in the body of the post requests have no where clause:

INFO Making POST request to https://na34.salesforce.com/services/async/41.0/job/7506100000FHv5DAAT/batch with body SELECT Id,IsDeleted,CaseNumber,ContactId,AccountId,SourceId,SuppliedName,SuppliedEmail,SuppliedPhone,SuppliedCompany,Type,Status,Reason,Origin,Subject,Priority,Description,IsClosed,ClosedDate,OwnerId,CreatedDate,CreatedById,LastModifiedDate,LastModifiedById,SystemModstamp,ContactPhone,ContactMobile,ContactEmail,ContactFax,LastViewedDate,LastReferencedDate,Notes__c,ThreadID__c,Email_To__c,Renewals__c,Account_Owner_Profile__c,Region__c,Account_Manager__c,Account_Owner__c,One__c,tz__Local_Time_24_Short__c,tz__Local_Time_24__c,tz__Local_Time_Short__c,tz__Local_Time__c,tz__Timezone_Formula__c,tz__Timezone_Full__c,tz__UTC_Offset_Formula__c,bizible2__Ad_Campaign_Name_FT__c,bizible2__Ad_Campaign_Name_LC__c,bizible2__Landing_Page_FT__c,bizible2__Landing_Page_LC__c,bizible2__Marketing_Channel_FT__c,bizible2__Marketing_Channel_LC__c,bizible2__Touchpoint_Date_FT__c,bizible2__Touchpoint_Date_LC__c,bizible2__Touchpoint_Source_FT__c,bizible2__Touchpoint_Source_LC__c FROM Case

...

INFO Making POST request to https://na34.salesforce.com/services/async/41.0/job/7506100000FHvBUAA1/batch with body SELECT Id,AccountId,UserId,TeamMemberRole,PhotoUrl,Title,AccountAccessLevel,OpportunityAccessLevel,CaseAccessLevel,ContactAccessLevel,CreatedDate,CreatedById,LastModifiedDate,LastModifiedById,SystemModstamp,IsDeleted FROM AccountTeamMember

Also, from the raw output, I can see that the records for Case seem to be ordered by CreatedDate and start from 2016-04-18 and go forward.

/cc @jschatz1