abbyy / ocrsdk.com

ABBYY Cloud OCR SDK
http://ocrsdk.com/github
Apache License 2.0

`/listFinishedTasks` unclear how to use or what the benefit is #101

Open lionel-rowe opened 8 months ago

lionel-rowe commented 8 months ago

Per JavaScript/ocrsdk.js:

https://github.com/abbyy/ocrsdk.com/blob/3087b8020f42aa0b52ceed306d0ec2066a3028e9/JavaScript/ocrsdk.js#L125-L127

It looks like the only sample that actually implements this recommendation is Java/Abbyy.Ocrsdk.client/src/ProcessManyFiles.java:

https://github.com/abbyy/ocrsdk.com/blob/3087b8020f42aa0b52ceed306d0ec2066a3028e9/Java/Abbyy.Ocrsdk.client/src/ProcessManyFiles.java#L245-L250

Given the limit of 100 results, sorted by date ascending (?!), and with no way of paginating, it's necessary to manually delete each completed task once it's been downloaded. This seems very flaky: if the program crashes before completion, leaving several tasks un-downloaded, those tasks will presumably just hang around forever, taking up space on the list of returned tasks. Once the number of such zombie tasks reaches 100, calls to /listFinishedTasks will never return any relevant tasks, so the program will keep polling the endpoint until it is manually terminated.
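To make the starvation scenario above concrete, here's a minimal in-memory sketch. It does not call the real API: the `listFinishedTasks` function, the task object shape, and the `'Completed'` status string are all stand-ins I've made up for illustration, and it assumes (as I'm presuming above) that undeleted tasks stay on the list indefinitely and that results are capped at 100, oldest first.

```javascript
// Hypothetical in-memory model of the behaviour described above; NOT the real API.
const LIST_LIMIT = 100; // result cap mentioned in this issue

// Returns at most LIST_LIMIT finished tasks, oldest first (assumed ordering).
function listFinishedTasks(tasks) {
  return tasks
    .filter((t) => t.status === 'Completed') // assumed status name
    .sort((a, b) => a.finishedAt - b.finishedAt)
    .slice(0, LIST_LIMIT);
}

// 100 leftover tasks from earlier crashed runs that were never deleted.
const tasks = [];
for (let i = 0; i < 100; i++) {
  tasks.push({ id: `zombie-${i}`, status: 'Completed', finishedAt: i });
}

// A task from the current run, which finishes later than all the zombies.
tasks.push({ id: 'current-run', status: 'Completed', finishedAt: 1000 });

const visible = listFinishedTasks(tasks).map((t) => t.id);
const currentRunVisible = visible.includes('current-run');
console.log(currentRunVisible); // false: the zombies fill the whole window
```

Under those assumptions, a client that waits for `current-run` to show up in the list would wait indefinitely.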

Also, is there any real drawback to just polling /getTaskStatus for each ongoing task? It's not clear what the benefit of polling /listFinishedTasks is, given the increased complexity and flakiness. Presumably calls to /getTaskStatus are cheap, as they don't send or return much data or do any processing, just check a status.
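For comparison, here's a rough sketch of the per-task polling alternative I have in mind. It's deliberately decoupled from ocrsdk.js: `pollEachTask` is a hypothetical helper, `getTaskStatus` is an injected async function `(taskId) => statusString` that you'd have to wire up to the real endpoint yourself, and the terminal status names are my assumption rather than something taken from the samples.

```javascript
// Polls each pending task individually until every task reaches a terminal
// status or maxRounds is exhausted. `getTaskStatus` is injected by the caller:
// a hypothetical async function (taskId) => status string.
async function pollEachTask(taskIds, getTaskStatus, options = {}) {
  const { intervalMs = 2000, maxRounds = 100 } = options;
  const terminal = new Set(['Completed', 'ProcessingFailed']); // assumed names
  const pending = new Set(taskIds);
  const results = new Map();

  for (let round = 0; round < maxRounds && pending.size > 0; round++) {
    // Iterate over a copy so that deleting from `pending` is safe.
    for (const id of [...pending]) {
      const status = await getTaskStatus(id);
      if (terminal.has(status)) {
        results.set(id, status);
        pending.delete(id);
      }
    }
    // Wait between rounds only if something is still outstanding.
    if (pending.size > 0) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }

  // Tasks left in `pending` did not finish within maxRounds.
  return { results, pending: [...pending] };
}
```

The program only ever asks about task IDs it created itself, so leftover tasks from a previous crashed run can't crowd out the current ones, and `maxRounds` bounds the loop instead of polling forever.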