Hegghammer / daiR

R package for Google Document AI
https://dair.info/

Processor ID problem #2

Closed whanley closed 3 years ago

whanley commented 3 years ago

Great docs and very excited to start using this package.

I'm having a problem with setup.

I get this error when I run dai_sync:

> response1 <- dai_sync("test-jtm-1.pdf")
File submitted at 2021-06-01 15:02:26. HTTP status: 404 - unsuccessful.
Error: "Processor with id '4***********4' not found."

I set the processor id variable using DAI_PROCESSOR_ID= in .Renviron.

I am able to process the document in the application console.

I'm sure I'm not giving everything I need to identify the problem...sorry.

Hegghammer commented 3 years ago

Thanks for your interest in daiR! The error message is coming from Google, so this is an authentication issue. Possibilities that come to mind:

I doubt it's to do with the .Renviron setup, but you can double-check by passing the processor id straight into the dai_sync call: response1 <- dai_sync("test-jtm-1.pdf", proc_id = "<your raw processor id>")
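
A separate way to rule out the .Renviron setup is to inspect the variable with base R after restarting the session. This is a minimal sketch, not part of daiR: `check_env_id` is a hypothetical helper, and `DEMO_PROCESSOR_ID` is a placeholder variable used so the example does not overwrite a real setting.

```r
# Hypothetical helper (not a daiR function): report whether an environment
# variable is set and whether its value contains stray whitespace.
check_env_id <- function(var = "DAI_PROCESSOR_ID") {
  value <- Sys.getenv(var, unset = "")
  list(
    is_set = nzchar(value),               # FALSE if the variable is missing or empty
    has_whitespace = grepl("\\s", value)  # TRUE if spaces or tabs slipped into the value
  )
}

# Demonstration with a placeholder variable containing a leading space
Sys.setenv(DEMO_PROCESSOR_ID = " abc123")
demo <- check_env_id("DEMO_PROCESSOR_ID")
str(demo)
```

Calling `check_env_id()` with no arguments checks `DAI_PROCESSOR_ID` itself. If `is_set` is `FALSE`, the .Renviron file was probably not read (for example because R was not restarted after editing it).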

whanley commented 3 years ago

Thanks for the rapid reply. None of those three possibilities is the problem. When I pass the processor id straight into the call I have the same problem.

I deleted my project and made a new one, following all the vignette steps, but the "processor not found" issue persists.

I was able to upload documents to the bucket for asynchronous processing, but encountered the same issue:

> gcs_list_objects()
            name     size             updated
1 test-jtm-1.pdf 906.3 Kb 2021-06-03 22:22:41
> response2 <- dai_async("test-jtm-1.pdf")
1 files submitted at 2021-06-03 22:22:47. HTTP status: 404 - unsuccessful.
Error: "Processor with id 'e**************d' not found."

Hegghammer commented 3 years ago

Hm. The only other possibility I can think of is that the processor's location differs from that supplied by dai_sync/dai_async by default ("eu"). Relatedly, I recommend looking more closely at the parameters taken by dai_sync/dai_async (https://dair.info/reference/index.html) to see if you need a non-default setting on some of them.
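
To illustrate why a location mismatch produces a 404: Document AI processors live at regional endpoints, and the location appears both in the host name and in the resource path. The sketch below only builds the request URL for a synchronous call; `build_process_url` is a hypothetical helper (not a daiR function), and the project and processor ids are placeholders.

```r
# Hypothetical helper (not part of daiR): build the regional REST endpoint
# for a synchronous Document AI processing request.
build_process_url <- function(proj_id, proc_id, loc = "eu") {
  sprintf(
    "https://%s-documentai.googleapis.com/v1/projects/%s/locations/%s/processors/%s:process",
    loc, proj_id, loc, proc_id
  )
}

# Placeholder ids, for illustration only
eu_url <- build_process_url("my-project", "abc123")              # default location
us_url <- build_process_url("my-project", "abc123", loc = "us")  # US processor

print(eu_url)
print(us_url)
```

A processor created in the US is not visible at the EU endpoint, so a request built with the default location reports it as not found. In daiR the equivalent fix would presumably be to pass the location explicitly, e.g. `dai_sync("test-jtm-1.pdf", loc = "us")`, assuming the argument is named `loc`; the reference page linked above lists the actual parameters.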

Meanwhile, you can use dai_sync_tab and dai_async_tab to process documents, as they don't require a personal processor. They connect to a slightly different API endpoint that happens to recognize tables (v1beta2), but they also provide regular OCR of text, just like dai_sync/dai_async.

whanley commented 3 years ago

Solved! I created an EU processor and everything works fine now. I was using a US processor. Many thanks for the help.

Hegghammer commented 3 years ago

Glad to hear!