How to get the output as a return value rather than saving it as an .mmd file?

bp-high commented 1 year ago

Which internal functions have to be called to get the markdown content directly into a variable (say, in a script) rather than going through the CLI?

lukas-blecher commented 1 year ago

If you're planning to convert multiple files, I'd recommend using the API. Launch it via nougat_api and call it from Python:

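```python
import requests

pdf = "path/to/file.pdf"
# open the PDF in a with-block so the file handle is closed after the upload
with open(pdf, "rb") as f:
    markdown = requests.post(
        "http://127.0.0.1:8503/predict", files={"file": f}
    ).json()
```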
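For several files, the same call can be wrapped in a loop that collects the results in memory. This is a minimal sketch, assuming a nougat_api server is already running at the address used in this thread; the helper name `convert_pdfs` and its `post` parameter are hypothetical (the parameter only exists so a different HTTP function can be swapped in, e.g. for testing). It uses the `/predict/` form with a trailing slash to avoid the redirect that a request to `/predict` can trigger.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, Optional, Union


def convert_pdfs(
    pdfs: Iterable[Union[str, Path]],
    url: str = "http://127.0.0.1:8503/predict/",
    post: Optional[Callable] = None,
) -> Dict[str, str]:
    """Hypothetical helper: send each PDF to a running nougat_api server
    and return a dict mapping each input path (as str) to its markdown."""
    if post is None:
        # Imported lazily so another HTTP function can be injected instead.
        import requests
        post = requests.post
    results: Dict[str, str] = {}
    for pdf in pdfs:
        # Binary mode for the upload; the with-block closes the file afterwards.
        with open(pdf, "rb") as f:
            response = post(url, files={"file": f})
        # Raise on HTTP errors instead of trying to decode an error body.
        response.raise_for_status()
        results[str(pdf)] = response.json()
    return results
```

Usage: `markdown_by_file = convert_pdfs(["paper1.pdf", "paper2.pdf"])`, after which each value is the markdown text for that file.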

If you don't want to use the API, follow the approach in predict.py.
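Going the predict.py route, the model produces markdown page by page; instead of writing those pages to an .mmd file, you can join them into one string in memory. A minimal sketch, assuming you already have the per-page markdown strings from the model; `join_pages` is a hypothetical helper, and the exact cleanup predict.py applies before writing may differ from the newline collapsing shown here.

```python
import re
from typing import Iterable


def join_pages(page_predictions: Iterable[str]) -> str:
    """Hypothetical helper: concatenate per-page markdown into one document string."""
    out = "".join(page_predictions).strip()
    # Collapse runs of three or more newlines into a single blank line.
    return re.sub(r"\n{3,}", "\n\n", out)
```

The returned string is what you would otherwise have found in the .mmd file, so it can be returned from your own function directly.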

Edit: Make sure to install nougat with API support: pip install "nougat-ocr[api]"

Turingforce commented 9 months ago

Actually, I found that the full POST URL must end with a trailing /; otherwise I got a 307 Temporary Redirect error.

```python
import requests

pdf = "path/to/file.pdf"
# note the trailing slash on the endpoint URL
with open(pdf, "rb") as f:
    markdown = requests.post(
        "http://127.0.0.1:8503/predict/", files={"file": f}
    ).json()
```
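If the server address comes from configuration, a small helper can make sure the endpoint always gets the trailing slash. A minimal sketch; `predict_url` is a hypothetical helper name, and it only assumes the endpoint path is `predict/` relative to the server's base URL.

```python
from urllib.parse import urljoin


def predict_url(base: str) -> str:
    """Hypothetical helper: build the prediction endpoint URL with a trailing slash."""
    # urljoin replaces the last path segment unless the base ends with "/",
    # so normalize the base first.
    if not base.endswith("/"):
        base += "/"
    return urljoin(base, "predict/")
```

For example, `predict_url("http://127.0.0.1:8503")` gives `http://127.0.0.1:8503/predict/`, and a base with a path prefix such as `http://example.com/nougat` keeps that prefix.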

facebookresearch/nougat, issue #98