axa-group / Parsr

Transforms PDF, Documents and Images into Enriched Structured Data
Apache License 2.0

NameError and infinite loop in the python client (PIP package) #447

Open mkosturek opened 4 years ago

mkosturek commented 4 years ago

Summary

The send_document function throws NameError: name 'file' is not defined when wait_till_finished=True and silent=False. When wait_till_finished=True and silent=True, it falls into an infinite loop instead.

I can see that this bug has probably already been fixed on the master branch; however, the current PIP package is not up to date with these changes.

Steps To Reproduce

  1. pip install parsr-client
  2. In Python shell:
    
    >>> from parsr_client import ParsrClient
    >>> parsr = ParsrClient("localhost:3001")
    >>> request = parsr.send_document("someDocument.pdf", 
    config_path="config.json", document_name="someDocument", 
    wait_till_finished=True, silent=False)
    > Polling server for the job <job-id>...
    >> Progress percentage: 0
    >> Job done!
    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    <timed exec> in <module>

    /opt/anaconda/miniconda3/envs/docspy37/lib/python3.8/site-packages/parsr_client/parsr_client.py in send_document(self, file_path, config_path, server, document_name, revision, wait_till_finished, refresh_period, save_request_id, silent)
        148 print('>> Job done!')
        149 return {
    --> 150 'file': file_path,
        151 'config': config,
        152 'status_code': r.status_code,

    NameError: name 'file' is not defined

  3. In Python shell:

```python
>>> from parsr_client import ParsrClient
>>> parsr = ParsrClient("localhost:3001")
>>> request = parsr.send_document("someDocument.pdf", 
    config_path="config.json", document_name="someDocument", 
    wait_till_finished=True, silent=True)
> Polling server for the job <job-id>...
# infinite loop, not finishing after document has been processed by server
```

Expected behavior

send_document returns without raising an error or looping indefinitely.

royjohal commented 4 years ago

Thanks @jvalls-axa

@mkosturek: You're right, the send_document function in the python client has been updated, but the changes are still in the develop branch. They will be merged into the master branch with the next minor release. Here is the current signature of the function:

https://github.com/axa-group/Parsr/blob/69e6b9bf33f1cc43d5a87d428cedf1132ccc48e8/clients/python-client/parsr_client/parsr_client.py#L73-L83

TLDR: The file argument has been renamed to file_path to avoid shadowing file, which was a built-in name in Python 2 (it is not a reserved keyword).

Thanks for pointing that out; from now on, we'll try to keep the python client on PIP up to date with the master branch, and not the develop branch.

marcpicaud commented 4 years ago

My 2 cents: the parsr service I'm using (via docker) was stuck in an infinite loop when using the parsr API (POST /document).

Fixed when downgrading to v0.12.

royjohal commented 4 years ago

> My 2 cents: the parsr service I'm using (via docker) was stuck in an infinite loop when using the parsr API (POST /document).
>
> Fixed when downgrading to v0.12.

Thanks @marcpicaud. Could you open an issue with more details?

jvalls-axa commented 4 years ago

> My 2 cents: the parsr service I'm using (via docker) was stuck in an infinite loop when using the parsr API (POST /document).
>
> Fixed when downgrading to v0.12.

Hi @marcpicaud

Could you please try with 0.12.1?

I checked it, and everything seems to work as expected.

jfilter commented 4 years ago

With Docker image v0.12.2 and client v3.2.2, I get stuck here:

[2020-06-19T17:09:34] INFO  (parsr-api/6 on c122c2d7f93b): Running module: ReadingOrderDetectionModule, Options: {"minVerticalGapWidth":20,"minColumnWidthInPagePercent":15}

Looks like an infinite loop to me.

It works fine with Docker image v0.12 and client v3.1.

MrAlecJohnson commented 3 years ago

I think the fix that's been applied solves the NameError when silent=False, but I don't think it solves the infinite loop when silent=True.

It looks like the problem is the indentation at lines 140-148:

https://github.com/axa-group/Parsr/blob/f4410d79154ee184fe4e4ed8c556ddb5fbecfa92/clients/python-client/parsr_client/parsr_client.py#L140-L148

The update to server_status_response is part of the if not silent block, so if silent=True the status is never updated.

As a side effect of the indenting, if you do set silent=False, "Job done!" gets printed on every iteration, even if the job isn't done.

I'd be happy to open a pull request, if tweaking this sounds like the right solution?
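To make the proposed change concrete, here is a minimal sketch of a polling loop with the status refresh moved out of the `if not silent` block. It is not the actual parsr_client code: `wait_until_finished`, the `get_status` callable and the injectable `sleep` parameter are hypothetical names used for illustration, and the `'progress-percentage'` key is an assumption about what the server returns while a job is still running.

```python
import time


def wait_until_finished(get_status, refresh_period=2.0, silent=False, sleep=time.sleep):
    """Poll get_status() until the job no longer reports progress.

    get_status is a hypothetical callable that returns a dict. A
    'progress-percentage' key is assumed to mean "still running".
    """
    status = get_status()
    while "progress-percentage" in status:
        if not silent:
            # Only the printing depends on `silent`.
            print(">> Progress percentage: {}".format(status["progress-percentage"]))
        sleep(refresh_period)
        # Refresh the status on every iteration, whatever `silent` is,
        # so the loop can terminate in both modes.
        status = get_status()
    if not silent:
        # Printed once, after the loop has actually ended.
        print(">> Job done!")
    return status
```

With the refresh at loop level, silent=True and silent=False follow the same control flow and differ only in what gets printed.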

Aofei-Chang commented 1 year ago

I finally solved the "infinite loop" problem that occurred after sending a document by downgrading parsr-client to 3.1.0.