dbashford / textract

node.js module for extracting text from html, pdf, doc, docx, xls, xlsx, csv, pptx, png, jpg, gif, rtf and more!
MIT License

What is the suggested architecture for turning this into an API? #200

Open johnernest02 opened 4 years ago

johnernest02 commented 4 years ago

I am an Android developer by trade. I'd like to use this library you've made, which is central to the business logic of our app. Thank you very much for this!

But as an Android developer, I know little about building REST APIs, especially given the other software the library needs (pdftotext, antiword, etc.). I have looked at AWS and Heroku, but I'm confused about which OS the server should run on and the other specifics. Could you point me in the right direction? I am currently studying Node.js for REST APIs.

pebojote commented 4 years ago

Let me know if this is close to your goal: textract-sample

pebojote commented 4 years ago

Anyway, that's my pitch deck for a hackathon competition 😅

ari62 commented 4 years ago

This is a Node project, so wrapping it with something like Express (https://expressjs.com/) would be the first step. Then you have to host it. Search for "hosting Express.js" and you should find a tutorial on setting it up on a server (for example https://developer.mozilla.org/en-US/docs/Learn/Server-side/Express_Nodejs/deployment). If you choose a service like Heroku, you won't have to worry as much about setting up the server or about your Linux knowledge. Otherwise there are a lot of options, like AWS EC2. Always choose a Linux OS. Then point your domain's DNS at the server.
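To make the Express suggestion concrete, here is a minimal sketch of what such a wrapper could look like. It assumes `express` (4.17 or later, for `express.raw`) and `textract` are installed via npm, and it uses textract's `fromBufferWithMime(type, buffer, callback)`. The route `/extract`, the 10mb limit, the JSON response shape and the helpers `missingBinaries`, `normalizeMime` and `start` are hypothetical choices for illustration, not part of textract. It also checks at startup for the host binaries mentioned in the question (pdftotext, antiword), since those must exist on whatever Linux server you deploy to.

```javascript
// server.js: minimal sketch of a REST wrapper around textract.
// Assumptions: route name, size limit, response shape and helper names
// are illustrative only. Requires `npm install express textract` plus the
// host binaries textract shells out to (e.g. pdftotext, antiword).
const { spawnSync } = require("child_process");

// Binaries named in this thread; other formats may need more tools.
const HOST_BINARIES = ["pdftotext", "antiword"];

// Return the subset of `bins` that `which` cannot find on this host.
function missingBinaries(bins) {
  return bins.filter((bin) => spawnSync("which", [bin]).status !== 0);
}

// Strip parameters such as "; charset=..." from a Content-Type header.
function normalizeMime(header) {
  return String(header || "").split(";")[0].trim().toLowerCase();
}

function createApp() {
  // Required lazily so the helpers above can load without these packages.
  const express = require("express");
  const textract = require("textract");

  const app = express();

  // Accept the raw file bytes as the request body, whatever their type.
  app.post("/extract", express.raw({ type: "*/*", limit: "10mb" }), (req, res) => {
    const mime = normalizeMime(req.get("Content-Type"));
    if (!Buffer.isBuffer(req.body) || req.body.length === 0 || !mime) {
      return res.status(400).json({ error: "send the file as the body with its Content-Type" });
    }
    textract.fromBufferWithMime(mime, req.body, (error, text) => {
      if (error) {
        return res.status(422).json({ error: error.message || String(error) });
      }
      res.json({ text });
    });
  });

  return app;
}

// Heroku (and many similar hosts) pass the port to listen on via PORT.
function start(port = process.env.PORT || 3000) {
  const missing = missingBinaries(HOST_BINARIES);
  if (missing.length > 0) {
    console.warn(`Missing host binaries: ${missing.join(", ")}`);
  }
  return createApp().listen(port, () => console.log(`Listening on port ${port}`));
}

module.exports = { createApp, start, missingBinaries, normalizeMime };
```

With a one-line entry file such as `require("./server").start();`, a client (including an Android app) could POST a file with something like `curl -X POST --data-binary @report.pdf -H "Content-Type: application/pdf" http://localhost:3000/extract` and get back JSON with a `text` field. Sending the raw bytes instead of a multipart form is a simplifying choice here that avoids an extra upload library; a multipart endpoint would work just as well.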