Originally created by @iroussos on 2018-12-12 14:04:02
Tap-salesforce documentation states that:
The start_date is used by the tap as a bound on SOQL queries when searching for records.
I have tested the tap-salesforce without using a state file and it seems like it is not using the start_date provided in config.json as a bound on the SOQL queries when searching for records. Not sure if the same problem is there when a state file is provided.
Setup:
Branch used: add-password-auth
Config file using PasswordCredentials with config["start_date"]="2018-12-01T00:00:00Z"
Two entities selected in properties.json: {Case, AccountTeamMember}
When I checked the tables in "RAW"."SALESFORCE" for everything already extracted, I can see :
SELECT count(*)
FROM "RAW"."SALESFORCE"."CASE"
WHERE LAST_MODIFIED_DATE > '2018-12-01';
>>> 356
SELECT count(*)
FROM "RAW"."SALESFORCE"."CASE"
WHERE CREATED_DATE > '2018-12-01';
>>> 237
It's clear that the tap extracted everything:
SELECT count(*)
FROM "RAW"."SALESFORCE"."CASE";
>>> 15092
SELECT count(*)
FROM "RAW"."SALESFORCE"."ACCOUNT_TEAM_MEMBER";
>>> 2287
I think that the following POST requests recorded in the tap-salesforce log are an indication that the start_date is not used. The queries reported in the body of the post requests have no where clause:
INFO Making POST request to https://na34.salesforce.com/services/async/41.0/job/7506100000FHv5DAAT/batch with body SELECT Id,IsDeleted,CaseNumber,ContactId,AccountId,SourceId,SuppliedName,SuppliedEmail,SuppliedPhone,SuppliedCompany,Type,Status,Reason,Origin,Subject,Priority,Description,IsClosed,ClosedDate,OwnerId,CreatedDate,CreatedById,LastModifiedDate,LastModifiedById,SystemModstamp,ContactPhone,ContactMobile,ContactEmail,ContactFax,LastViewedDate,LastReferencedDate,Notes__c,ThreadID__c,Email_To__c,Renewals__c,Account_Owner_Profile__c,Region__c,Account_Manager__c,Account_Owner__c,One__c,tz__Local_Time_24_Short__c,tz__Local_Time_24__c,tz__Local_Time_Short__c,tz__Local_Time__c,tz__Timezone_Formula__c,tz__Timezone_Full__c,tz__UTC_Offset_Formula__c,bizible2__Ad_Campaign_Name_FT__c,bizible2__Ad_Campaign_Name_LC__c,bizible2__Landing_Page_FT__c,bizible2__Landing_Page_LC__c,bizible2__Marketing_Channel_FT__c,bizible2__Marketing_Channel_LC__c,bizible2__Touchpoint_Date_FT__c,bizible2__Touchpoint_Date_LC__c,bizible2__Touchpoint_Source_FT__c,bizible2__Touchpoint_Source_LC__c FROM Case
...
INFO Making POST request to https://na34.salesforce.com/services/async/41.0/job/7506100000FHvBUAA1/batch with body SELECT Id,AccountId,UserId,TeamMemberRole,PhotoUrl,Title,AccountAccessLevel,OpportunityAccessLevel,CaseAccessLevel,ContactAccessLevel,CreatedDate,CreatedById,LastModifiedDate,LastModifiedById,SystemModstamp,IsDeleted FROM AccountTeamMember
Also, from the raw output, I can see that the records for Case seem to be ordered by CreatedDate and start from 2016-04-18 and go forward.
Migrated from GitLab: https://gitlab.com/meltano/tap-salesforce/-/issues/3
Originally created by @iroussos on 2018-12-12 14:04:02
Tap-salesforce documentation states that:
I have tested the tap-salesforce without using a state file and it seems like it is not using the
start_date
provided in config.json as a bound on the SOQL queries when searching for records. Not sure if the same problem is there when a state file is provided.Setup:
add-password-auth
The tap log clearly shows ~15000 records extracted for
Case
and 2255 records extracted forAccountTeamMember
:When I checked the tables in "RAW"."SALESFORCE" for everything already extracted, I can see :
It's clear that the tap extracted everything:
I think that the following POST requests recorded in the tap-salesforce log are an indication that the start_date is not used. The queries reported in the body of the post requests have no where clause:
Also, from the raw output, I can see that the records for Case seem to be ordered by CreatedDate and start from 2016-04-18 and go forward.
/cc @jschatz1