runSavedQueryByUrl only returns 34 results

MartinBarkerPhilips commented 2 years ago

I'm facing a problem: when I run a query from a URL, I get at most 34 results. Is there a different way to run RTC queries with multithreading? Here is what I'm trying:

from datetime import datetime
import time

import RTC
rtc_client = RTC.rtcclient
query_client = rtc_client.query

print('Beginning RTC query')

start_time = time.time()      

query_url="https://rtcus1.ta.philips.com/ccm/web/projects/ULT#action=com.ibm.team.workitem.runSavedQuery&id=_ztYxxxxxxxxxxxxxxxxxMA"

returned_prop="dc:title,dc:identifier,rtc_cm:state,rtc_cm:ownedBy"

query_results = query_client.runSavedQueryByUrl(
    query_url,
    returned_properties=returned_prop
)

duration=time.time() - start_time

print('Found '+ str(len(query_results)) + ' results in ' + str(round(duration, 2)) +' seconds.')
print('done')

My query URL should return 200+ results, but my query_results list only ever contains 34.

Am I doing something wrong? I tried to follow the instructions in the docs: https://rtcclient.readthedocs.io/en/0.6.0/quickstart.html#query-workitems-by-saved-query-url

dixudx commented 2 years ago

Please help verify the following:

1) Whether all the items in your query are non-stale. I think rtcclient may discard stale items.
2) Whether the paged content from the server is complete. If some data is missing, rtcclient cannot find all the items.
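To check point 2, one option is to walk the pages yourself and record how many entries each page contributes. A minimal sketch, assuming a hypothetical `fetch_page(url)` callable that returns a page's entries plus the URL of the next page (or `None` on the last page); it is not rtcclient's API and you would have to implement it against your server:

```python
from typing import Callable, List, Optional, Tuple

# (entries on this page, URL of the next page or None)
Page = Tuple[List[dict], Optional[str]]


def collect_all_pages(
    first_url: str, fetch_page: Callable[[str], Page]
) -> Tuple[List[dict], List[int]]:
    """Follow next-page links until exhausted; return all entries and per-page counts."""
    entries: List[dict] = []
    per_page: List[int] = []
    seen = set()
    url: Optional[str] = first_url
    while url is not None:
        if url in seen:  # guard against a server that links back to an earlier page
            raise RuntimeError(f"pagination loop at {url}")
        seen.add(url)
        page_entries, url = fetch_page(url)
        per_page.append(len(page_entries))
        entries.extend(page_entries)
    return entries, per_page
```

A page in the middle that holds fewer entries than the page size, or a chain that stops early, would point to incomplete paged content rather than a client-side bug.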

MartinBarkerPhilips commented 2 years ago

I'm not sure how to verify either your #1 or #2 point, but I did confirm that the web UI reports 170 results for the query URL I am using. Here are the steps I ran:

1) Clone the repo, check out release-0.6, and pull. To support my company's older RTC version, add this code to base.py before the GET request at line 70:

        #####################################################
        requests.packages.urllib3.disable_warnings()
        requests.packages.urllib3.util.ssl_.DEFAULT_CIPHERS += ':HIGH:!DH:!aNULL'
        try:
            requests.packages.urllib3.contrib.pyopenssl.util.ssl_.DEFAULT_CIPHERS += ':HIGH:!DH:!aNULL'
        except AttributeError:
            # no pyopenssl support used / needed / available
            pass
        ###################################################

2) Uninstall/reinstall pip package:

pip uninstall rtcclient
pip install -e ./rtcclient-version0.6.0custom/

3) Start a Python shell, log in to RTC, and run the saved query URL through rtcclient:

from rtcclient.utils import setup_basic_logging

from rtcclient import RTCClient

setup_basic_logging()

myclient = RTCClient("https://MYCOMPANIESRTCURL.com/ccm", "USERNAME", "PASSWORD", ends_with_jazz=False)

ISD_Project_Area = myclient.getProjectArea(projectarea_name="ULT")

myquery = myclient.query

print(len(myquery.runSavedQueryByUrl("https://MYCOMPANIESRTCURL.com/ccm/web/projects/ULT#action=com.ibm.team.workitem.runSavedQuery&id=_1tBjsLIDEeyOjuoonCu3MA", returned_properties="dc:title,dc:identifier,rtc_cm:state,rtc_cm:ownedBy")))

This prints 70, meaning it returned 70 results, but the web view of the same query URL shows 170. I'm not sure why work items are missing.

It is a lot faster, though, so the multithreading is working. Maybe the results from the different threads aren't being combined and returned to me correctly?
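One way to narrow this down is to run the same saved query with both builds and diff the returned work-item IDs, so you can see which items go missing rather than only how many. A sketch, assuming you have already reduced each run's results to a list of `dc:identifier` strings (how you pull that value out of the returned objects is left open here):

```python
from collections import Counter
from typing import Dict, Iterable, List


def diff_result_ids(expected: Iterable[str], actual: Iterable[str]) -> Dict[str, List[str]]:
    """Compare work-item IDs from a reference run against a second run."""
    expected_counts = Counter(expected)
    actual_counts = Counter(actual)
    return {
        # present in the reference run but absent from the second run
        "missing": sorted(set(expected_counts) - set(actual_counts)),
        # present only in the second run
        "unexpected": sorted(set(actual_counts) - set(expected_counts)),
        # returned more than once by the second run (would also skew len())
        "duplicated": sorted(i for i, n in actual_counts.items() if n > 1),
    }
```

If the missing IDs share something (same state, same owner, all stale), that points to filtering rather than to threads dropping results.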

MartinBarkerPhilips commented 2 years ago

If I run the same commands with my own rtcclient repo based on v0.6.0, without the multithreading addition, runSavedQueryByUrl successfully returns all 170 results at once.

dixudx commented 2 years ago

cc @casabre The multi-threading feature lost some entries. Please help check and fix it. Thanks.

casabre commented 2 years ago

@dixudx @MartinBarkerPhilips I am running version 0.8.1 successfully in my company's web services. Our query has around 90-100 items and I get them all back. Our IBM client is running on EWM 7.0.1.

Thus, I see no indication that the multi-threading itself could lead to wrong results. The routine just delegates the _handle_resource_entry parsing to a sub-thread. Everything stays in the same memory space, with no forking or pickling that could cause wrong behavior. Finally, if the parsing returns None, the entry is filtered out, but a None return could also occur without the multi-threading implementation.
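The pattern described here, where each entry is parsed on a worker thread and `None` results are filtered out, can be sketched with `concurrent.futures`. This is an illustration, not rtcclient's code: `parse_entry` stands in for `_handle_resource_entry`, and returning the number of dropped entries is an addition that makes silent `None` returns visible:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, List, Optional, Tuple


def parse_entries_threaded(
    entries: Iterable[Any],
    parse_entry: Callable[[Any], Optional[Any]],
    max_workers: int = 8,
) -> Tuple[List[Any], int]:
    """Parse entries on a thread pool; return (parsed results, count dropped as None)."""
    entries = list(entries)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order; a worker exception is re-raised here
        results = list(pool.map(parse_entry, entries))
    parsed = [r for r in results if r is not None]
    return parsed, len(results) - len(parsed)
```

Because every result is collected in the calling thread in input order, there is no separate merge step that could lose items; a lower count would come from `None` returns (or from entries that never reached this function). Comparing `dropped` with a sequential `[parse_entry(e) for e in entries]` run separates the two cases.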

From the information given, I am not sure we are comparing apples to apples, because we are comparing version 0.6.0 with 0.8.1. Version 0.7.0 is left out, but it introduces many changes that are the basis for 0.8.1. So I am not sure whether multi-threading is the culprit or some change in 0.7.0. As @MartinBarkerPhilips mentioned, he is adding a custom patch to make the client compatible with his company's older Jazz version. Maybe some REST API changes could lead to wrong behavior?

@MartinBarkerPhilips could you please run your routine against 0.7.0 and check whether the result is the same as with 0.6.0?

MartinBarkerPhilips commented 2 years ago

I can check my query results using rtcclient v0.7.0, but I will probably have to make changes to the code because I am running an older version of rtcclient.

MartinBarkerPhilips commented 2 years ago

But I am going to start looking into this issue again. To summarize, it is about getting multi-threading support for my older RTC version, because the current query time is very long.

MartinBarkerPhilips commented 2 years ago

@casabre I tried to use v0.7.0, but I was unable to authenticate. Then I tested v0.6.0 and it worked fine. So far, v0.6.0 is the only tag that has worked with my older version of RTC. To summarize, my issue is that the custom v0.6.0 branch with multithreading does not work for me, because it returns an incorrect number of work items when I query a URL. I would really love to have multithreading support! Right now v0.6.0 takes very long.

See the steps I ran below:

download rtcclient v0.7.0 locally

$ git clone https://github.com/dixudx/rtcclient.git rtcclient0.7.0
$ cd rtcclient0.7.0
$ git checkout 0.7.0
$ git fetch
$ git pull origin 0.7.0

install rtcclient v0.7.0 locally, verify version

$ pip uninstall rtcclient
$ pip install -e rtcclient0.7.0
$ pip list

I verify it says "rtcclient 0.7.0 c:\users\username\documents\repos\rtcclient0.7.0", so I have rtcclient v0.7.0 installed.

Test authenticating with rtcclient / prep setup

$ python
>>> from rtcclient.utils import setup_basic_logging
>>> from rtcclient import RTCClient
>>> setup_basic_logging()   
>>> myclient = RTCClient("https://rtcus1.ta.philips.com/ccm", "username", "pword", ends_with_jazz=False)

Got error:

requests.exceptions.SSLError: HTTPSConnectionPool(host='rtcus1.ta.philips.com', port=443): Max retries exceeded with url: /ccm/authenticated/identity (Caused by SSLError(SSLError(1, '[SSL: DH_KEY_TOO_SMALL] dh key too small (_ssl.c:997)')))

Edit file "rtcclient0.7.0/rtcclient/base.py"

add the following code before line 77 `response = requests.get(url, verify=verify, headers=headers,proxies=proxies, timeout=timeout, **kwargs)`

```
requests.packages.urllib3.disable_warnings()
requests.packages.urllib3.util.ssl_.DEFAULT_CIPHERS += ':HIGH:!DH:!aNULL'
try:
    requests.packages.urllib3.contrib.pyopenssl.util.ssl_.DEFAULT_CIPHERS += ':HIGH:!DH:!aNULL'
except AttributeError:
    ### no pyopenssl support used / needed / available
    pass
```

Save file, uninstall and reinstall rtcclient

$ pip uninstall rtcclient
$ pip install -e rtcclient0.7.0

Try again

$ python
>>> from rtcclient.utils import setup_basic_logging
>>> from rtcclient import RTCClient
>>> setup_basic_logging()   
>>> myclient = RTCClient("https://rtcus1.ta.philips.com/ccm", "username", "pword", ends_with_jazz=False)

Now the command completes without an exception, although a couple of 'failed get request' lines are logged. Continue with the next command:

>>> ISD_Project_Area = myclient.getProjectArea(projectarea_name="ULT")

Fails with error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\users\320178017\documents\repos\rtcclient0.7.0\rtcclient\client.py", line 176, in getProjectArea
    proj_areas = self._getProjectAreas(archived=archived,
  File "c:\users\320178017\documents\repos\rtcclient0.7.0\rtcclient\client.py", line 203, in _getProjectAreas
    return self._get_paged_resources("ProjectArea",
  File "c:\users\320178017\documents\repos\rtcclient0.7.0\rtcclient\client.py", line 1355, in _get_paged_resources
    raw_data = xmltodict.parse(resp.content)
  File "C:\Users\320178017\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\xmltodict.py", line 327, in parse
    parser.Parse(xml_input, True)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 17, column 77
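The traceback shows xmltodict, which uses the stdlib expat parser, rejecting the server response at line 17, column 77. To see what the server actually sent at that position, one option is to fetch the same URL yourself and pass the raw bytes to a small helper like this one (a hypothetical diagnostic, not part of rtcclient):

```python
import xml.parsers.expat
from typing import Optional, Tuple


def locate_xml_error(content: bytes, context: int = 40) -> Optional[Tuple[int, int, str]]:
    """Return (line, column, surrounding text) of the first XML error, or None if well-formed."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(content, True)
    except xml.parsers.expat.ExpatError as err:
        # expat counts lines by "\n"; err.lineno is 1-based, err.offset is 0-based
        lines = content.decode("utf-8", errors="replace").split("\n")
        bad_line = lines[err.lineno - 1] if 0 < err.lineno <= len(lines) else ""
        start = max(err.offset - context, 0)
        return err.lineno, err.offset, bad_line[start:err.offset + context]
    return None
```

An unescaped `&` or a stray control character at that spot would suggest the older server returns content that 0.7.0's parsing path does not accept.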

Now test with the only version of rtcclient which has worked for me so far, v0.6.0

$ git clone https://github.com/dixudx/rtcclient.git rtcclient0.6.0
$ cd rtcclient0.6.0
$ git checkout 0.6.0
$ git fetch
$ git pull origin 0.6.0

install

$ pip uninstall rtcclient
$ pip install -e rtcclient0.6.0
$ pip list

I verify it says version 0.6.0.

$ python
>>> from rtcclient.utils import setup_basic_logging
>>> from rtcclient import RTCClient
>>> setup_basic_logging()   
>>> myclient = RTCClient("https://rtcus1.ta.philips.com/ccm", "username", "pword", ends_with_jazz=False)

Fails with error:

requests.exceptions.SSLError: HTTPSConnectionPool(host='rtcus1.ta.philips.com', port=443): Max retries exceeded with url: /ccm/authenticated/identity (Caused by SSLError(SSLError(1, '[SSL: DH_KEY_TOO_SMALL] dh key too small (_ssl.c:997)')))

add same fix to "rtcclient0.6.0/rtcclient/base.py"

uninstall / reinstall to get updated v0.6.0

Rerun the Python commands:

$ python
>>> from rtcclient.utils import setup_basic_logging
>>> from rtcclient import RTCClient
>>> setup_basic_logging()   
>>> myclient = RTCClient("https://rtcus1.ta.philips.com/ccm", "username", "pword", ends_with_jazz=False)

The command succeeds, with more output than I had with v0.7.0.

>>> myquery = myclient.query

Test by running a query; it should return the full number of results:

>>> print(len(myquery.runSavedQueryByUrl("https://rtcus1.ta.philips.com/ccm/web/projects/ULT#action=com.ibm.team.workitem.runSavedQuery&id=_1tBjsLIDEeyOjuoonCu3MA", returned_properties="dc:title,dc:identifier,rtc_cm:state,rtc_cm:ownedBy")))

It should return 167 results, and it does:

2022-06-09 07:46:10,246 DEBUG client.RTCClient: Successfully fetching all the paged resources
167

dixudx commented 2 years ago

Try to see whether this works.

pip install --force-reinstall git+https://github.com/dixudx/rtcclient.git@release-0.6

dixudx commented 1 year ago

@MartinBarkerPhilips Please try the latest version, 0.8.0. Thanks.
