RuntimeJob.backend() does not show the correct backend a job was run on

mriedem commented 1 year ago

Describe the bug

I created a test where I submit 50 cloud channel jobs without specifying a backend, so the scheduler picks a backend. I wait for all jobs to complete, and they all completed successfully. The jobs are spread across two possible backends. However, when I get the backend counts for the jobs, it says they are all on the same backend:

from collections import defaultdict

backend_counts = defaultdict(int)
for job in jobs:
    backend_counts[job.backend()] += 1

print(f'backend distribution of {len(jobs)} jobs')
for backend, count in backend_counts.items():
    print(f'{backend}: {count}')

Output:

backend distribution of 50 jobs
<IBMBackend('alt_algiers')>: 50

In the database though, this is the distribution:

ibmclouddb=> select count(*),backend_name from jobs where tags @> '{concurrent-jobs}' and created_at > now() - interval '20 minutes' group by backend_name order by count;
 count | backend_name 
-------+--------------
     2 | alt_algiers
    48 | alt_canberra
(2 rows)

Steps to reproduce

This is my script that generates the jobs and waits for them to complete:

from concurrent import futures
import time

from qiskit_ibm_runtime import QiskitRuntimeService

service = QiskitRuntimeService()
program_inputs = {'iterations': 1}
# Tag the job for the MCS flow
options = {
    # Be aware of https://github.com/Qiskit/qiskit-ibm-runtime/issues/622 if
    # not specifying a backend.
    "backend": "",
    "job_tags": ["proxy_mcs", "use_runtime_workers", "concurrent-jobs"]
}

def submit_job(index):
    # copy job tags so we can add one for the index per job
    job_tags = list(options['job_tags'])
    job_tags.append(f'job{index}')
    my_options = dict(options)
    my_options['job_tags'] = job_tags
    return service.run(program_id="hello-world",
                       options=my_options,
                       inputs=program_inputs)

# submit jobs concurrently (runtime API rate limits to 50 jobs per minute per user)
count = 50
print(f'starting to submit {count} jobs concurrently')
start_time = time.time()
jobs = []
with futures.ThreadPoolExecutor(thread_name_prefix='mcs-integration-test') as executor:
    # submit the jobs asynchronously
    f = []
    for index in range(count):
        f.append(executor.submit(submit_job, index))
    # wait for the jobs to be created
    for future in futures.as_completed(f):
        try:
            job = future.result()
            jobs.append(job)
            print(f'submitted job: {job.job_id()}')
        except Exception as e:
            print(f'error submitting job: {e}')

# wait for the jobs to complete
print('waiting for jobs to complete')
for index, job in enumerate(jobs):
    job_id = job.job_id()
    print(f'{index+1}) waiting for job {job_id} to complete; status: {job.status()}')
    print(f'job {job_id} result: {job.result()}')
end_time = time.time()

print(f'time from jobs submitted to all jobs completed: {end_time - start_time}')

# how many failed?
print('seeing how many jobs failed')
errored_jobs = {}
for job in jobs:
    if job.error_message():
        errored_jobs[job.job_id()] = job.error_message()

if errored_jobs:
    print(f'{len(errored_jobs)} jobs failed')
elif not jobs:
    print('failed to submit all jobs')
else:
    print('all jobs passed')

Expected behavior

I would expect that once the jobs are complete, Qiskit would show me the actual backend they ran on.

Suggested solutions

Use the backend field from the actual GET /jobs/{id} response from the runtime API for the cloud channel:

https://us-east.quantum-computing.cloud.ibm.com/openapi/#/Jobs/get_job_details_jid
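
For illustration, a minimal sketch of what reading that field could look like. It assumes the decoded GET /jobs/{id} response is a JSON object with a top-level "backend" string; the exact response shape, the sample payload, and the helper name backend_name_from_job_details are assumptions made for this example, not the library's code.

```python
from typing import Any, Mapping, Optional


def backend_name_from_job_details(details: Mapping[str, Any]) -> Optional[str]:
    """Return the backend name reported by the API, or None if none is set.

    Assumes `details` is the decoded JSON body of GET /jobs/{id} and that the
    backend is exposed as a top-level "backend" string (an assumption).
    """
    backend = details.get("backend")
    # Treat a missing or empty value as "no backend assigned yet".
    return backend or None


# Hypothetical response body, for illustration only.
example = {"id": "job-123", "backend": "alt_canberra", "status": "Completed"}
print(backend_name_from_job_details(example))
```

Returning None for a missing or empty field keeps "not assigned yet" distinct from a guessed default.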

Additional Information

It seems that maybe the code is getting here:

https://github.com/Qiskit/qiskit-ibm-runtime/blob/5ef5711716b52a95b8a0211385f3b4450da82f74/qiskit_ibm_runtime/qiskit_runtime_service.py#L701

And just taking the first backend available, which between alt_algiers and alt_canberra is alt_algiers, just based on sort order, even though 48 of the 50 jobs ran on alt_canberra. However, I'm not sure how RuntimeJob.backend() gets to that QiskitRuntimeService.backends() method.
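
To make the suspicion concrete, here is a small self-contained sketch (plain Python, no Qiskit calls). It assumes the client assigns every job the first entry of a sorted backend list, which is my guess at the behavior rather than something confirmed from the code, and compares that with the distribution from the database query above.

```python
from collections import Counter

# Backend each job actually ran on, per the database query above.
actual = ["alt_algiers"] * 2 + ["alt_canberra"] * 48

# Suspected client behavior (an assumption): no backend was requested, so
# every job object is given the first backend in the sorted list.
available = sorted({"alt_canberra", "alt_algiers"})
reported = [available[0] for _ in actual]

print("reported:", Counter(reported))
print("actual:  ", Counter(actual))
```

Under that assumption the reported counts collapse onto alt_algiers no matter where the scheduler placed the jobs, which matches the output above.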

This may also be related to #622 in some way, i.e. if the jobs don't have a backend specified when they are created maybe that influences how this works.

qiskit-ibm-runtime version: 0.8.0
Python version: 3.8
Operating system: Ubuntu Focal

kt474 commented 1 year ago

Yeah, this is likely related to #622 because the RuntimeJob.backend() method is inherited from the terra job class, which just returns the stored backend (self._backend).

kt474 commented 1 year ago

@mriedem are you still running into this issue? There was a fix on the server side and job.backend() should always return the correct backend now.

mriedem commented 1 year ago

> @mriedem are you still running into this issue? There was a fix on the server side and job.backend() should always return the correct backend now.

Yes, in some form I'm still hitting this.

With this program (using ibm_cloud channel):

from qiskit_ibm_runtime import QiskitRuntimeService

service = QiskitRuntimeService()
program_inputs = {'iterations': 1}
options = {
    "backend": "",
    "job_tags": ["rs-target"]
}

job = service.run(program_id="hello-world",
                  options=options,
                  inputs=program_inputs)
print(f'submitted job: {job.job_id()}')
print('waiting for results')
print(f'results: {job.result()}')
print(f'job ran on backend: {job.backend()}')

I got this output:

/tmp/ipykernel_59/3237960307.py:11: DeprecationWarning: Note that the 'job_id' and 'backend' attributes of a runtime job have been deprecated as of qiskit-ibm-runtime 0.7 and will be removed no sooner than 3 months after the release date. Please use the job_id() and backend() methods instead.
  job = service.run(program_id="hello-world",
submitted job: cg7jp5okl79prl7rjgb0
waiting for results
results: Hello, World!
job ran on backend: <IBMBackend('ibm_algiers')>

Note it says the job ran on ibm_algiers but internally I know the job ran on ibm_canberra.

Furthermore, when I get the job by ID and print out its backend it shows the correct one:

j = service.job(job.job_id())
print('status: %s' % j.status())
print('backend: %s' % j.backend())

I get this:

status: JobStatus.DONE
backend: <IBMBackend('ibm_canberra')>

So there is something wrong with the initial job returned from service.run: its job.backend() defaults to ibm_algiers. I assume that is because ibm_algiers is the first backend in the array of available backends, and the client picks it to work around the job having no backend set while it is queued. That is the wrong behavior in the ibm_cloud channel scenario, where a backend isn't required when creating the job and the scheduler picks the backend when the job runs.
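
Until that is fixed, the re-fetch shown above can be wrapped in a small helper. This is only a workaround sketch under the assumption that a freshly retrieved job carries the right backend, as it did in my run; the helper name refetched_backend is made up for this example, and it uses only the service.job(), job.job_id() and job.backend() calls already shown.

```python
def refetched_backend(service, job):
    """Return the backend of a freshly retrieved copy of `job`.

    `service` is expected to be a QiskitRuntimeService and `job` the object
    returned by service.run(); only calls shown earlier in this issue are used.
    """
    # Look the job up again by ID instead of trusting the submit-time object.
    fresh = service.job(job.job_id())
    return fresh.backend()
```

In the script above, the last line would become print(f'job ran on backend: {refetched_backend(service, job)}'), at the cost of one extra API call per job.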

mriedem commented 1 year ago

> Yes, in some form I'm still hitting this.

Here are the versions of Qiskit I'm using in the program above:

qiskit                            0.39.2
qiskit-aer                        0.11.1
qiskit-experiments                0.4.0
qiskit-finance                    0.3.4
qiskit-ibm-experiment             0.2.8
qiskit-ibm-provider               0.1.0
qiskit-ibm-runtime                0.8.0
qiskit-ibmq-provider              0.19.2
qiskit-machine-learning           0.5.0
qiskit-nature                     0.5.0
qiskit-optimization               0.4.0
qiskit-terra                      0.22.2
qiskit-textbook                   0.1.0

@kt474 I see those are old in Quantum Lab. Do you expect the fix you mentioned to be in a newer version of one of these packages?

mriedem commented 1 year ago

Tested with qiskit-ibm-runtime 0.9.1 and still hitting the same issue described above.

kt474 commented 1 year ago

Yeah this is a bug on the client side - I will work on a fix

Also I believe there is another related bug on the cloud side - if I don't choose a backend, canberra or algiers will be chosen but it looks like I don't actually have access to these backends. So then retrieving the job again gives a backend not found error.

raulotaolea commented 1 year ago

@kt474 which plan is the instance you are sending jobs to on: Lite or Standard?

merav-aharoni commented 1 year ago

@kt474 - when I try to run this example as-is, I get an error on the first line, service = QiskitRuntimeService(): Failed to establish a new connection: [Errno -2] Name or service not known')). If I change the line to service = QiskitRuntimeService(channel="ibm_quantum"), I get the following error: error submitting job: '"backend" is required field in "options" for "ibm_quantum" channel.', which makes this issue irrelevant for that channel. If I change the line to service = QiskitRuntimeService(channel="ibm_cloud"), I get the first error again.

Should I indeed run this with channel="ibm_cloud"?
When I tried creating a cloud account, I followed the instructions up to the point where I got this message:
And I don't know what to do after this. Can you advise if I am going in the right direction, and if so, what next?

kt474 commented 1 year ago

Yeah, this bug only applies to the cloud channel, so service = QiskitRuntimeService(channel="ibm_cloud") is correct. In the cloud channel, the backend does not need to be specified because it will be automatically selected.

https://cloud.ibm.com/docs/quantum-computing?topic=quantum-computing-get-started should walk you through how to create a cloud account

merav-aharoni commented 1 year ago

> Yeah, this bug only applies to the cloud channel, so service = QiskitRuntimeService(channel="ibm_cloud") is correct. In the cloud channel, the backend does not need to be specified because it will be automatically selected.
>
> https://cloud.ibm.com/docs/quantum-computing?topic=quantum-computing-get-started should walk you through how to create a cloud account

I tried following the instructions above, but got stuck with the message I wrote in the comment above. So I am still stuck on step 1: create a service instance.

kt474 commented 1 year ago

So when a backend is not set and a cloud channel job is run, the backend "passed in" is just the first backend from the list returned here

then, when the runtime job is initiated, the backend could be incorrect, making job.backend() incorrect here

At the moment, I'm not sure what the best solution is - maybe we could add a refresh to the backend() method the first time it is called
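
A rough sketch of what such a refresh could look like, using a toy class rather than the real RuntimeJob. Everything here is hypothetical: LazyBackendJob, the fetch_backend callable (standing in for an API lookup of the job's backend) and the string backend names are assumptions for illustration only.

```python
from typing import Callable, Optional


class LazyBackendJob:
    """Toy stand-in for a runtime job; not the real RuntimeJob class."""

    def __init__(
        self,
        job_id: str,
        initial_backend: Optional[str],
        fetch_backend: Callable[[str], Optional[str]],
    ) -> None:
        self._job_id = job_id
        self._backend = initial_backend      # may be a guess made at submit time
        self._fetch_backend = fetch_backend  # hypothetical API lookup by job ID
        self._refreshed = False

    def backend(self) -> Optional[str]:
        # Until the server has reported a backend, ask it on every call;
        # after that, reuse the cached value and skip the lookup.
        if not self._refreshed:
            fetched = self._fetch_backend(self._job_id)
            if fetched is not None:
                self._backend = fetched
                self._refreshed = True
        return self._backend
```

This variant keeps retrying while the lookup returns None, so a job that is still queued without an assigned backend is not stuck with the submit-time guess. A strict "first call only" refresh would set the flag unconditionally instead.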

merav-aharoni commented 1 year ago

I see an additional problem that possibly comes from ntc-provider. When I check job.backend() before asking for the job.result(), I get None. After running job.result(), I get the correct backend. Here is the code:

j = service.job(job.job_id())
print("before getting result, backend =  " + str(j.backend()))
result = job.result()
print("after getting result, backend = " + str(j.backend()))

And here is the printout I get:

before getting result, backend =  None
after getting result, backend = <IBMBackend('ibm_canberra')>

merav-aharoni commented 1 year ago

@LucyXing - can you please have a look at my comment above regarding the values I am getting for job.backend()?

LucyXing commented 1 year ago

Hi @merav-aharoni, I wasn't able to reproduce the issue you had above. I was able to get the correct backend before and after calling job.result(). FYI @kt474

Qiskit / qiskit-ibm-runtime

RuntimeJob.backend() does not show the correct backend a job was run on #625