Support `page` and `page_size` parameters in Text Analysis Export #98

DavidHuebner commented 2 years ago

Is your feature request related to a problem? Please describe.
The product API has the two fields `page` and `pageSize` that can be used to limit the results. Currently, these arguments are not accepted by our API.

Describe the solution you'd like
The function `export_text_analysis(self, annotation_types: str = None) -> dict` should accept these two parameters.

Describe alternatives you've considered
None

reckart commented 2 years ago

Should be page_size?
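
On the naming question, a minimal sketch of how a snake_case keyword on the Python side could map to the camelCase query parameter of the REST endpoint. The helper `build_export_query` is hypothetical and not part of the library; `page` and `pageSize` match the request URL in the error log further down this thread, while `annotationTypes` is an assumed name.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def build_export_query(
    annotation_types: Optional[str] = None,
    page: Optional[int] = None,
    page_size: Optional[int] = None,
) -> Dict[str, str]:
    """Translate Python keyword arguments into REST query parameters (hypothetical helper)."""
    params: Dict[str, str] = {}
    if annotation_types is not None:
        params["annotationTypes"] = annotation_types  # assumed parameter name
    if page is not None:
        params["page"] = str(page)
    if page_size is not None:
        params["pageSize"] = str(page_size)  # snake_case in Python, camelCase on the wire
    return params


print(urlencode(build_export_query(page=14, page_size=4)))  # page=14&pageSize=4
```

Leaving out a parameter that is `None` keeps the request identical to the current behaviour when no paging is requested.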

DavidHuebner commented 2 years ago

I added an implementation of page and page_size and tested it on 50 documents in a live product.

from averbis import Client

# Connect to the platform and navigate to the process whose results are exported.
client = Client("localhost:8800")
project = client.get_project("test")
collection = project.get_document_collection("Codex_50Train")
process = collection.get_process("discharge")

print("Batchwise export documents (page_size=4):")
# 50 documents at 4 per page give 13 pages, so the request for page 14 is out of range.
for page in range(1, 15):
    print(f"Page (Batch) number {page}")
    out = process.export_text_analysis(page_size=4, page=page)
    for d in out["textAnalysisResultDtos"]:
        print("Document name: " + d["documentName"])

It works as expected; the only oddity is that the order of the documents does not seem to follow any pattern.
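
If a predictable order is needed on the client side, one option is to collect the document names and sort them with a natural-order key, so that `(2)` comes before `(10)`. This is only a sketch: `natural_key` is a hypothetical helper, and it does not change the order in which the server fills the pages.

```python
import re
from typing import List, Tuple


def natural_key(name: str) -> List[Tuple[int, int, str]]:
    """Split a name into text and digit runs so that numbers compare numerically."""
    key: List[Tuple[int, int, str]] = []
    # With one capturing group, re.split alternates text (even index) and digits (odd index).
    for index, part in enumerate(re.split(r"(\d+)", name)):
        if index % 2 == 1:
            key.append((0, int(part), ""))
        elif part:
            key.append((1, 0, part.lower()))
    return key


names = ["Arztbrief (10).txt", "Arztbrief (14).txt", "Arztbrief (2).txt"]
print(sorted(names, key=natural_key))
# ['Arztbrief (2).txt', 'Arztbrief (10).txt', 'Arztbrief (14).txt']
```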

Batchwise export documents (page_size=4):
Page (Batch) number 1
Document name: Arztbrief (10).txt
Document name: Arztbrief (14).txt
Document name: Arztbrief (11).txt
Document name: Arztbrief (17).txt
Page (Batch) number 2
Document name: Arztbrief (21).txt
Document name: Arztbrief (20).txt
Document name: Arztbrief (18).txt
Document name: Arztbrief (2).txt
Page (Batch) number 3
Document name: Arztbrief (29).txt
Document name: Arztbrief (31).txt
Document name: Arztbrief (26).txt
Document name: Arztbrief (23).txt
Page (Batch) number 4
Document name: Arztbrief (33).txt
Document name: Arztbrief (28).txt
Document name: Arztbrief (4).txt
Document name: Arztbrief (34).txt
Page (Batch) number 5
Document name: Arztbrief (41).txt
Document name: Arztbrief (43).txt
Document name: Arztbrief (47).txt
Document name: Arztbrief (44).txt
Page (Batch) number 6
Document name: Arztbrief (48).txt
Document name: Arztbrief (46).txt
Document name: Arztbrief (51).txt
Document name: Arztbrief (45).txt
Page (Batch) number 7
Document name: Arztbrief (59).txt
Document name: Arztbrief (53).txt
Document name: Arztbrief (6).txt
Document name: Arztbrief (52).txt
Page (Batch) number 8
Document name: Arztbrief (60).txt
Document name: Arztbrief (69).txt
Document name: Arztbrief (7).txt
Document name: Arztbrief (66).txt
Page (Batch) number 9
Document name: Arztbrief (71).txt
Document name: Arztbrief (73).txt
Document name: Arztbrief (74).txt
Document name: Arztbrief (72).txt
Page (Batch) number 10
Document name: Arztbrief (8).txt
Document name: Arztbrief (76).txt
Document name: Arztbrief (79).txt
Document name: Arztbrief (78).txt
Page (Batch) number 11
Document name: Arztbrief (90).txt
Document name: Arztbrief (85).txt
Document name: Arztbrief (84).txt
Document name: Arztbrief (83).txt
Page (Batch) number 12
Document name: Arztbrief (95).txt
Document name: Arztbrief (94).txt
Document name: Arztbrief (98).txt
Document name: Arztbrief (91).txt
Page (Batch) number 13
Document name: Arztbrief (99).txt
Document name: Arztbrief (96).txt
Page (Batch) number 14
Traceback (most recent call last):
  File "/home/huebner/src/python/averbis-python-api/tests/david2.py", line 18, in <module>
    out = process.export_text_analysis(page_size=4, page=page)
  File "/home/huebner/src/python/averbis-python-api/averbis/core/_rest_client.py", line 794, in export_text_analysis
    return self.project.client._export_text_analysis(
  File "/home/huebner/src/python/averbis-python-api/averbis/core/_rest_client.py", line 2047, in _export_text_analysis
    response = self.__request_with_json_response(
  File "/home/huebner/src/python/averbis-python-api/averbis/core/_rest_client.py", line 1412, in __request_with_json_response
    self.__handle_error(raw_response)
  File "/home/huebner/src/python/averbis-python-api/averbis/core/_rest_client.py", line 2489, in __handle_error
    raise RequestException(error_msg)
requests.exceptions.RequestException: 400 Server Error: 'Bad Request' for url: 'http://ecstasy.averbis.intern:8702/health-discovery/rest/v1/textanalysis/projects/test/documentSources/Codex_50Train/processes/discharge/export?page=14&pageSize=4'.
Endpoint error message is: 'The requested page '14' is bigger than the last page '13''
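
Since the total number of pages is not known in advance, a caller may prefer to loop until the last page instead of using a fixed range. The following is a minimal sketch under stated assumptions: `iter_exported_documents` and `make_fake_fetch` are hypothetical helpers, the stop condition relies on the error text shown in the log above, and the in-memory fetcher only stands in for the real export call.

```python
from typing import Callable, Iterator, List, Type

# Text taken from the endpoint error message in the log above.
LAST_PAGE_HINT = "bigger than the last page"


def iter_exported_documents(
    fetch_page: Callable[[int, int], dict],
    page_size: int,
    page_error: Type[Exception] = Exception,
) -> Iterator[dict]:
    """Yield exported documents page by page until the last page is reached."""
    page = 1  # pages are counted from 1, as in the output above
    while True:
        try:
            result = fetch_page(page, page_size)
        except page_error as error:
            if LAST_PAGE_HINT in str(error):
                return  # asked for a page past the end: nothing left to yield
            raise  # any other failure is a real error
        documents = result.get("textAnalysisResultDtos", [])
        yield from documents
        if len(documents) < page_size:
            return  # a short page is the last one
        page += 1


def make_fake_fetch(names: List[str]) -> Callable[[int, int], dict]:
    """In-memory stand-in for the export call, used only to exercise the loop."""

    def fetch(page: int, page_size: int) -> dict:
        last_page = max(1, -(-len(names) // page_size))  # ceiling division
        if page > last_page:
            raise ValueError(
                f"The requested page '{page}' is bigger than the last page '{last_page}'"
            )
        start = (page - 1) * page_size
        chunk = names[start:start + page_size]
        return {"textAnalysisResultDtos": [{"documentName": n} for n in chunk]}

    return fetch


fake_fetch = make_fake_fetch([f"doc{i}.txt" for i in range(1, 9)])
documents = list(iter_exported_documents(fake_fetch, page_size=4, page_error=ValueError))
print(len(documents))  # 8
```

With the client from this thread, `fetch_page` could be `lambda page, size: process.export_text_analysis(page=page, page_size=size)` and `page_error` could be `requests.exceptions.RequestException`, the exception type shown in the traceback. The extra request past the end is only needed when the document count is an exact multiple of `page_size`.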

DavidHuebner commented 2 years ago

Closed after merge from https://github.com/averbis/averbis-python-api/pull/99
