forcedotcom / cli

Salesforce CLI
https://developer.salesforce.com/docs/atlas.en-us.sfdx_cli_reference.meta/sfdx_cli_reference/
BSD 3-Clause "New" or "Revised" License
494 stars 78 forks source link

sf data query resume: cannot find resource for bulk query results in cli #2968

Closed deployment-ian closed 2 months ago

deployment-ian commented 3 months ago

Summary

I am using @salesforce/cli/2.50.6 win32-x64 node-v20.15.0

I am trying to extract data using a bulk query. This is a significant amount of data in this query request:

C:\Users\{user}\Desktop>sf data query --query "{query}" --target-org {org} --bulk --async
Query ID: {job_id}
Query is in progress.

Run "data query resume -i {job_id} -o {org}" to get the latest status and results.

Total number of records retrieved: 0.
Querying Data... done

I then checked the status of the job:

C:\Users\{user}\Desktop>sf data resume -i {job_id} -o {org}

=== Job Status

id:                      {job_id}
operation:               query
object:                  Account
createdById:             {user_id}
createdDate:             2024-07-24T12:34:14.000Z
systemModstamp:          2024-07-24T12:43:00.000Z
state:                   JobComplete
concurrencyMode:         Parallel
contentType:             CSV
numberBatchesQueued:     0
numberBatchesInProgress: 0
numberBatchesCompleted:  92
numberBatchesFailed:     0
numberBatchesTotal:      92
numberRecordsProcessed:  21165218
numberRetries:           0
apiVersion:              61.0
numberRecordsFailed:     0
totalProcessingTime:     2304433
apiActiveProcessingTime: 2304433
apexProcessingTime:      0
Getting Status... done

Great! It ran successfully. But when I try to extract the data, I receive this information that the resource does not exist:

C:\Users\{user}\Desktop>sf data query resume -i {job_id} -o {org} -r csv  > test.csv
Error (1): The requested resource does not exist

Where did the resource go? How do I extract these batch results into a csv?

System Information

{
  "architecture": "win32-x64",
  "cliVersion": "@salesforce/cli/2.50.6",
  "nodeVersion": "node-v20.15.0",
  "osVersion": "Windows_NT 10.0.19045",
  "rootPath": "C:\\Users\\{user}\\AppData\\Local\\sf\\client\\2.50.6-6e9d7f7",
  "shell": "cmd.exe",
  "pluginVersions": [
    "@oclif/plugin-autocomplete 3.1.6 (core)",
    "@oclif/plugin-commands 4.0.6 (core)",
    "@oclif/plugin-help 6.2.5 (core)",
    "@oclif/plugin-not-found 3.2.10 (core)",
    "@oclif/plugin-plugins 5.3.4 (core)",
    "@oclif/plugin-search 1.2.2 (core)",
    "@oclif/plugin-update 4.4.7 (core)",
    "@oclif/plugin-version 2.2.6 (core)",
    "@oclif/plugin-warn-if-update-available 3.1.8 (core)",
    "@oclif/plugin-which 3.2.6 (core)",
    "@salesforce/cli 2.50.6 (core)",
    "apex 3.2.4 (core)",
    "auth 3.6.33 (core)",
    "data 3.5.5 (core)",
    "deploy-retrieve 3.9.13 (core)",
    "env 3.0.32 (user) published 41 days ago (Wed Jun 12 2024)",
    "info 3.3.17 (core)",
    "limits 3.3.17 (core)",
    "marketplace 1.2.17 (core)",
    "org 4.3.4 (core)",
    "packaging 2.7.0 (core)",
    "schema 3.3.19 (core)",
    "settings 2.3.8 (core)",
    "sobject 1.4.19 (core)",
    "source 3.5.3 (core)",
    "telemetry 3.6.1 (core)",
    "templates 56.3.2 (core)",
    "trust 3.7.13 (core)",
    "user 3.5.17 (core)"
  ]
}
github-actions[bot] commented 3 months ago

Thank you for filing this issue. We appreciate your feedback and will review the issue as soon as possible. Remember, however, that GitHub isn't a mechanism for receiving support under any agreement or SLA. If you require immediate assistance, contact Salesforce Customer Support.

mdonnalley commented 3 months ago

@deployment-ian thanks for the issue

The requested resource does not exist is directly from the API. In your case, I think it might indicate that you queried data that you don't have permission to view? For example, one of the Accounts returned in the query might be set to private

deployment-ian commented 3 months ago

I don't believe this is to be the case. When I adjust my query so that it only returns 1.8m records (down from 21m), the command C:\Users\{user}\Desktop>sf data query resume -i {job_id} -o {org} -r csv > test.csv functions as expected and I receive the data

deployment-ian commented 3 months ago

Also, when I utilize workbench with the same query as I use and the resource doesn't appear, workbench is able to extract al of the data

mdonnalley commented 3 months ago

Got it. Can you set JSFORCE_LOG_LEVEL=DEBUG in your environment and share the output from the failing command? I'm hoping that will give us some insight as to where the API is returning resource not found - thanks!

deployment-ian commented 3 months ago

sure!

I set the flag, received lots of additional information, but no extra detail when executing the sf data query resume command:

C:\Users\{user}\Desktop>set JSFORCE_LOG_LEVEL=DEBUG

C:\Users\Ian{user}\Desktop>sf data query --query "{query}" --target-org {org} --bulk --async
DEBUG   [http-api]  missing 'content-length' header, setting it to: 301
DEBUG   [http-api]  <request> method=POST, url=https://officedepot.my.salesforce.com/services/data/v61.0/jobs/query
DEBUG   [http-api]  elapsed time: 238 msec
DEBUG   [http-api]  <response> status=401, url=https://officedepot.my.salesforce.com/services/data/v61.0/jobs/query
DEBUG   [bulk2:QueryJobV2]  sforce-locator: undefined
INFO    [session-refresh-delegate]  <refresh token>
DEBUG   [session-refresh-delegate]  Connection refresh completed.
INFO    [session-refresh-delegate]  <refresh complete>
DEBUG   [http-api]  <request> method=POST, url=https://officedepot.my.salesforce.com/services/data/v61.0/jobs/query
DEBUG   [http-api]  elapsed time: 67 msec
DEBUG   [http-api]  <response> status=401, url=https://officedepot.my.salesforce.com/services/data/v61.0/jobs/query
DEBUG   [bulk2:QueryJobV2]  sforce-locator: undefined
INFO    [session-refresh-delegate]  <refresh token>
DEBUG   [session-refresh-delegate]  Connection refresh completed.
INFO    [session-refresh-delegate]  <refresh complete>
DEBUG   [http-api]  <request> method=POST, url=https://officedepot.my.salesforce.com/services/data/v61.0/jobs/query
DEBUG   [http-api]  elapsed time: 376 msec
DEBUG   [http-api]  <response> status=200, url=https://officedepot.my.salesforce.com/services/data/v61.0/jobs/query
DEBUG   [bulk2:QueryJobV2]  sforce-locator: undefined
DEBUG   [bulk2:QueryJobV2]  Successfully created job 750Hn00000OUfXPIA1
Query ID: 750Hn00000OUfXPIA1
Query is in progress.

Run "data query resume -i 750Hn00000OUfXPIA1-o {user}" to get the latest status and results.

Total number of records retrieved: 0.
Querying Data... done

C:\Users\{user}\Desktop>sf data resume -i 750Hn00000OUfXPIA1 -o {org} --json
DEBUG   [http-api]  <request> method=GET, url=https://officedepot.my.salesforce.com/services/async/61.0/job/750Hn00000OUfXPIA1
DEBUG   [http-api]  elapsed time: 998 msec
DEBUG   [http-api]  <response> status=200, url=https://officedepot.my.salesforce.com/services/async/61.0/job/750Hn00000OUfXPIA1
DEBUG   [connection]  {
  '$': { xmlns: 'http://www.force.com/2009/06/asyncapi/dataload' },
  id: '750Hn00000OUfXPIA1',
  operation: 'query',
  object: 'Account',
  createdById: '{user_id}',
  createdDate: '2024-07-24T19:04:32.000Z',
  systemModstamp: '2024-07-24T19:04:33.000Z',
  state: 'InProgress',
  concurrencyMode: 'Parallel',
  contentType: 'CSV',
  numberBatchesQueued: '0',
  numberBatchesInProgress: '6',
  numberBatchesCompleted: '10',
  numberBatchesFailed: '0',
  numberBatchesTotal: '16',
  numberRecordsProcessed: '2373669',
  numberRetries: '0',
  apiVersion: '61.0',
  numberRecordsFailed: '0',
  totalProcessingTime: '134861',
  apiActiveProcessingTime: '134861',
  apexProcessingTime: '0'
}
{
  "status": 0,
  "result": {
    "$": {
      "xmlns": "http://www.force.com/2009/06/asyncapi/dataload"
    },
    "id": "750Hn00000OUfXPIA1",
    "operation": "query",
    "object": "Account",
    "createdById": "{user_id}",
    "createdDate": "2024-07-24T19:04:32.000Z",
    "systemModstamp": "2024-07-24T19:04:33.000Z",
    "state": "InProgress",
    "concurrencyMode": "Parallel",
    "contentType": "CSV",
    "numberBatchesQueued": "0",
    "numberBatchesInProgress": "6",
    "numberBatchesCompleted": "10",
    "numberBatchesFailed": "0",
    "numberBatchesTotal": "16",
    "numberRecordsProcessed": "2373669",
    "numberRetries": "0",
    "apiVersion": "61.0",
    "numberRecordsFailed": "0",
    "totalProcessingTime": "134861",
    "apiActiveProcessingTime": "134861",
    "apexProcessingTime": "0"
  },
  "warnings": []
}

C:\Users\{user}\Desktop>sf data resume -i 750Hn00000OUfXPIA1 -o {org} --json
DEBUG   [http-api]  <request> method=GET, url=https://officedepot.my.salesforce.com/services/async/61.0/job/750Hn00000OUfXPIA1
DEBUG   [http-api]  elapsed time: 1178 msec
DEBUG   [http-api]  <response> status=200, url=https://officedepot.my.salesforce.com/services/async/61.0/job/750Hn00000OUfXPIA1
DEBUG   [connection]  {
  '$': { xmlns: 'http://www.force.com/2009/06/asyncapi/dataload' },
  id: '750Hn00000OUfXPIA1',
  operation: 'query',
  object: 'Account',
  createdById: '{user_id}',
  createdDate: '2024-07-24T19:04:32.000Z',
  systemModstamp: '2024-07-24T19:08:52.000Z',
  state: 'JobComplete',
  concurrencyMode: 'Parallel',
  contentType: 'CSV',
  numberBatchesQueued: '0',
  numberBatchesInProgress: '0',
  numberBatchesCompleted: '92',
  numberBatchesFailed: '0',
  numberBatchesTotal: '92',
  numberRecordsProcessed: '21169518',
  numberRetries: '0',
  apiVersion: '61.0',
  numberRecordsFailed: '0',
  totalProcessingTime: '1217556',
  apiActiveProcessingTime: '1217556',
  apexProcessingTime: '0'
}
{
  "status": 0,
  "result": {
    "$": {
      "xmlns": "http://www.force.com/2009/06/asyncapi/dataload"
    },
    "id": "750Hn00000OUfXPIA1",
    "operation": "query",
    "object": "Account",
    "createdById": "{user_id}",
    "createdDate": "2024-07-24T19:04:32.000Z",
    "systemModstamp": "2024-07-24T19:08:52.000Z",
    "state": "JobComplete",
    "concurrencyMode": "Parallel",
    "contentType": "CSV",
    "numberBatchesQueued": "0",
    "numberBatchesInProgress": "0",
    "numberBatchesCompleted": "92",
    "numberBatchesFailed": "0",
    "numberBatchesTotal": "92",
    "numberRecordsProcessed": "21169518",
    "numberRetries": "0",
    "apiVersion": "61.0",
    "numberRecordsFailed": "0",
    "totalProcessingTime": "1217556",
    "apiActiveProcessingTime": "1217556",
    "apexProcessingTime": "0"
  },
  "warnings": []
}

C:\Users\{user}\Desktop>sf data query resume -i {query_id}-o {org} > test.csv
Error (1): The requested resource does not exist
mdonnalley commented 3 months ago

C:\Users\{user}\Desktop>sf data query resume -i {query_id}-o {org} > test.csv will redirect all the stdout to test.csv - so that file should contain all the logs from that command

deployment-ian commented 3 months ago

duh!

"DEBUG  [bulk2:QueryJobV2]  Start polling for job status"   
"DEBUG  [bulk2:QueryJobV2]  Polling options: timeout:1000ms | interval: 0ms."   
"DEBUG  [http-api]  <request> method=GET"    url=https://officedepot.my.salesforce.com/services/data/v61.0/jobs/query/750Hn00000OUfXPIA1
"DEBUG  [http-api]  elapsed time: 585 msec" 
"DEBUG  [http-api]  <response> status=200"   url=https://officedepot.my.salesforce.com/services/data/v61.0/jobs/query/750Hn00000OUfXPIA1
"DEBUG  [bulk2:QueryJobV2]  sforce-locator: undefined"  
"DEBUG  [bulk2:QueryJobV2]  Job 750Hn00000OUfXPIA1 was successfully processed." 
"DEBUG  [http-api]  <request> method=GET"    url=https://officedepot.my.salesforce.com/services/data/v61.0/jobs/query/750Hn00000OUfXPIA1/results
"DEBUG  [http-api]  elapsed time: 16135 msec"   
"DEBUG  [http-api]  <response> status=200"   url=https://officedepot.my.salesforce.com/services/data/v61.0/jobs/query/750Hn00000OUfXPIA1/results
"DEBUG  [bulk2:QueryJobV2]  sforce-locator: MDowR2FIbjAwMDAwNXMyVDRLQUk6MQ" 
"DEBUG  [http-api]  <request> method=GET"    url=https://officedepot.my.salesforce.com/services/data/v61.0/jobs/query//750Hn00000OUfXPIA1/results?locator=MDowR2FIbjAwMDAwNXMyVDRLQUk6MQ
"DEBUG  [http-api]  elapsed time: 67 msec"  
"DEBUG  [http-api]  <response> status=404"   url=https://officedepot.my.salesforce.com/services/data/v61.0/jobs/query//750Hn00000OUfXPIA1/results?locator=MDowR2FIbjAwMDAwNXMyVDRLQUk6MQ
"DEBUG  [bulk2:QueryJobV2]  sforce-locator: undefined"  
deployment-ian commented 3 months ago

Here's the data as JSON:

DEBUG   [bulk2:QueryJobV2]  Start polling for job status
DEBUG   [bulk2:QueryJobV2]  Polling options: timeout:1000ms | interval: 0ms.
DEBUG   [http-api]  <request> method=GET, url=https://officedepot.my.salesforce.com/services/data/v61.0/jobs/query/750Hn00000OUfXPIA1
DEBUG   [http-api]  elapsed time: 530 msec
DEBUG   [http-api]  <response> status=200, url=https://officedepot.my.salesforce.com/services/data/v61.0/jobs/query/750Hn00000OUfXPIA1
DEBUG   [bulk2:QueryJobV2]  sforce-locator: undefined
DEBUG   [bulk2:QueryJobV2]  Job 750Hn00000OUfXPIA1 was successfully processed.
DEBUG   [http-api]  <request> method=GET, url=https://officedepot.my.salesforce.com/services/data/v61.0/jobs/query/750Hn00000OUfXPIA1/results
DEBUG   [http-api]  elapsed time: 17000 msec
DEBUG   [http-api]  <response> status=200, url=https://officedepot.my.salesforce.com/services/data/v61.0/jobs/query/750Hn00000OUfXPIA1/results
DEBUG   [bulk2:QueryJobV2]  sforce-locator: MDowR2FIbjAwMDAwNXMyVDRLQUk6MQ
DEBUG   [http-api]  <request> method=GET, url=https://officedepot.my.salesforce.com/services/data/v61.0/jobs/query//750Hn00000OUfXPIA1/results?locator=MDowR2FIbjAwMDAwNXMyVDRLQUk6MQ
DEBUG   [http-api]  elapsed time: 73 msec
DEBUG   [http-api]  <response> status=404, url=https://officedepot.my.salesforce.com/services/data/v61.0/jobs/query//750Hn00000OUfXPIA1/results?locator=MDowR2FIbjAwMDAwNXMyVDRLQUk6MQ
DEBUG   [bulk2:QueryJobV2]  sforce-locator: undefined
{
  "name": "NOT_FOUND",
  "message": "The requested resource does not exist",
  "exitCode": 1,
  "context": "BulkQueryReport",
  "stack": "NOT_FOUND: The requested resource does not exist\n    at SfCommandError.from (file:///C:/Users/Ian-Bates/AppData/Local/sf/client/2.50.6-6e9d7f7/node_modules/@salesforce/sf-plugins-core/lib/SfCommandError.js:39:16)\n    at BulkQueryReport.catch (file:///C:/Users/Ian-Bates/AppData/Local/sf/client/2.50.6-6e9d7f7/node_modules/@salesforce/sf-plugins-core/lib/sfCommand.js:315:47)\n    at BulkQueryReport._run (C:\\Users\\Ian-Bates\\AppData\\Local\\sf\\client\\2.50.6-6e9d7f7\\node_modules\\@oclif\\core\\lib\\command.js:306:29)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async Config.runCommand (C:\\Users\\Ian-Bates\\AppData\\Local\\sf\\client\\2.50.6-6e9d7f7\\node_modules\\@oclif\\core\\lib\\config\\config.js:424:25)\n    at async run (C:\\Users\\Ian-Bates\\AppData\\Local\\sf\\client\\2.50.6-6e9d7f7\\node_modules\\@oclif\\core\\lib\\main.js:95:16)\n    at async file:///C:/Users/Ian-Bates/AppData/Local/sf/client/2.50.6-6e9d7f7/bin/run.js:15:1",
  "cause": "NOT_FOUND: The requested resource does not exist\n    at BulkApiV2.getError (C:\\Users\\Ian-Bates\\AppData\\Local\\sf\\client\\2.50.6-6e9d7f7\\node_modules\\@jsforce\\jsforce-node\\lib\\http-api.js:279:16)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async C:\\Users\\Ian-Bates\\AppData\\Local\\sf\\client\\2.50.6-6e9d7f7\\node_modules\\@jsforce\\jsforce-node\\lib\\http-api.js:127:33\n    at async QueryJobV2.result (C:\\Users\\Ian-Bates\\AppData\\Local\\sf\\client\\2.50.6-6e9d7f7\\node_modules\\@jsforce\\jsforce-node\\lib\\api\\bulk2.js:359:13)\n    at async BulkQueryReport.run (file:///C:/Users/Ian-Bates/AppData/Local/sf/client/2.50.6-6e9d7f7/node_modules/@salesforce/plugin-data/lib/commands/data/query/resume.js:50:25)\n    at async BulkQueryReport._run (C:\\Users\\Ian-Bates\\AppData\\Local\\sf\\client\\2.50.6-6e9d7f7\\node_modules\\@oclif\\core\\lib\\command.js:302:22)\n    at async Config.runCommand (C:\\Users\\Ian-Bates\\AppData\\Local\\sf\\client\\2.50.6-6e9d7f7\\node_modules\\@oclif\\core\\lib\\config\\config.js:424:25)\n    at async run (C:\\Users\\Ian-Bates\\AppData\\Local\\sf\\client\\2.50.6-6e9d7f7\\node_modules\\@oclif\\core\\lib\\main.js:95:16)\n    at async file:///C:/Users/Ian-Bates/AppData/Local/sf/client/2.50.6-6e9d7f7/bin/run.js:15:1 {\n  errorCode: 'NOT_FOUND',\n  content: undefined\n}",
  "warnings": [],
  "code": "1",
  "status": 1,
  "commandName": "BulkQueryReport"
}
mdonnalley commented 3 months ago

@deployment-ian Thanks for the logs!

Based on these logs,

DEBUG   [http-api]  <request> method=GET, url=https://officedepot.my.salesforce.com/services/data/v61.0/jobs/query/750Hn00000OUfXPIA1/results
DEBUG   [http-api]  elapsed time: 17000 msec
DEBUG   [http-api]  <response> status=200, url=https://officedepot.my.salesforce.com/services/data/v61.0/jobs/query/750Hn00000OUfXPIA1/results
DEBUG   [bulk2:QueryJobV2]  sforce-locator: MDowR2FIbjAwMDAwNXMyVDRLQUk6MQ
DEBUG   [http-api]  <request> method=GET, url=https://officedepot.my.salesforce.com/services/data/v61.0/jobs/query//750Hn00000OUfXPIA1/results?locator=MDowR2FIbjAwMDAwNXMyVDRLQUk6MQ
DEBUG   [http-api]  elapsed time: 73 msec
DEBUG   [http-api]  <response> status=404, url=https://officedepot.my.salesforce.com/services/data/v61.0/jobs/query//750Hn00000OUfXPIA1/results?locator=MDowR2FIbjAwMDAwNXMyVDRLQUk6MQ
DEBUG   [bulk2:QueryJobV2]  sforce-locator: undefined

It looks like jsforce is creating a url with a // in it, which is causing the 404. For what it's worth, we fixed a similar issue last week

I'll mark this as a bug and get it on our board.

git2gus[bot] commented 3 months ago

This issue has been linked to a new work item: W-16325607

deployment-ian commented 3 months ago

Thank you so much @mdonnalley! I really appreciate it.

And since we have this conversation going, is there any way batch-download the CSV? or will this work-flow always return all this data within 1 csv?

I don't see much documentation utilizing the bulk api 2.0 commands within the cli...

I appreciate you helping me out with this!

mdonnalley commented 3 months ago

And since we have this conversation going, is there any way batch-download the CSV? or will this work-flow always return all this data within 1 csv?

There's no support for batch downloading the csv - it's always going to go into a single csv file. If you needed to do that, you could adjust your soql query to return a subset of the data and then run multiple sf data query --bulk with that

cristiand391 commented 3 months ago

And since we have this conversation going, is there any way batch-download the CSV? or will this work-flow always return all this data within 1 csv?

hi 👋🏼 @deployment-ian We are discussing some new commands for bulk import/export with the team, would be interested to hear your use-case for splitting results into multiple files (you can post it here: https://github.com/forcedotcom/cli/discussions/categories/ideas)

FWIW, the Bulk 2.0 API doesn't let clients to batch data when uploading, so for a feature like this the CLI would need to implement some extra logic, see: https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/bulk_common_diff_two_versions.htm

deployment-ian commented 3 months ago

Hi @cristiand391,

I don't know right now if it's actually better for me to have a multiple-file result from the bulk query at this time. Once the issue above it resolved, I can then figure out if it would be beneficial to request this feature.

I will post my thoughts on the pages you provided here.

Thanks!