facebookresearch / nougat

Implementation of Nougat Neural Optical Understanding for Academic Documents
https://facebookresearch.github.io/nougat/
MIT License

for idx, sample in tqdm(enumerate(dataloader), total=len(dataloader)): for prediction #51

Closed by shayal01 1 year ago

shayal01 commented 1 year ago

If we use just a single PDF, sample is a list, but inference expects an image tensor, so the line below does not work as written:

    model_output = model.inference(image_tensors=sample)

It should pass sample[0] instead, where sample[0] is the tensor stored at index 0 of the list.

This is a function to which I passed a single PDF file and made predictions for each page:

from functools import partial

import torch
from tqdm import tqdm

from nougat import NougatModel
from nougat.postprocessing import markdown_compatible
from nougat.utils.dataset import LazyDataset


def predict():
    # Load the pretrained Nougat model in bfloat16.
    model = NougatModel.from_pretrained(
        "C:/Users/sshamsu/Documents/New folder/nougat weights"
    ).to(torch.bfloat16)
    if torch.cuda.is_available():
        model.to("cuda")

    # LazyDataset takes the file path of the PDF.
    dataset = LazyDataset(
        "C:/Users/sshamsu/Downloads/research paper for Nought.pdf",
        partial(model.encoder.prepare_input, random_padding=False),
    )
    dataloader = torch.utils.data.DataLoader(
        dataset,
        batch_size=1,
        shuffle=False,
        collate_fn=LazyDataset.ignore_none_collate,
    )

    prediction = []
    for page_num, page_as_tensor in tqdm(enumerate(dataloader)):
        # Each item is a list; the image tensor is at index 0.
        model_output = model.inference(image_tensors=page_as_tensor[0])
        output = markdown_compatible(model_output["predictions"][0])
        prediction.append(output)

    final_mmd = "".join(prediction).strip()
    return final_mmd

lukas-blecher commented 1 year ago

What is the issue exactly? Your code only works for batch size = 1
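One possible reading of the batch-size remark is that the loop above keeps only `model_output["predictions"][0]`, so with a larger batch every page after the first in each batch would be dropped. The sketch below illustrates that effect with a hypothetical `fake_inference` stand-in (not the real model); only the `{"predictions": [...]}` shape is taken from the code above.

```python
def fake_inference(batch):
    """Hypothetical stand-in for model.inference: one string per page in the batch."""
    return {"predictions": [f"page {p}\n" for p in batch]}


def collect_first_only(batches):
    # Mirrors the loop above: keeps only the first prediction of each batch.
    out = []
    for batch in batches:
        out.append(fake_inference(batch)["predictions"][0])
    return out


def collect_all(batches):
    # Keeps every prediction in the batch, so it does not depend on batch size.
    out = []
    for batch in batches:
        out.extend(fake_inference(batch)["predictions"])
    return out


batches = [[1, 2], [3, 4], [5]]  # five pages, batch size 2
print(len(collect_first_only(batches)))  # 3
print(len(collect_all(batches)))  # 5
```

With `batch_size=1` the two functions return the same list, which is why the original loop works in that setting.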

shayal01 commented 1 year ago

for page_num, page_as_tensor in tqdm(enumerate(dataloader)):
    model_output = model.inference(image_tensors=page_as_tensor[0])

If I don't index page_as_tensor with 0, an error is raised because page_as_tensor is a list. Maybe that is because I am running it on just one paper, but predict.py and app.py don't use the index. So is this also an issue when using multiple PDFs?
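One defensive way to handle both cases is to unwrap the item only when it actually is a list or tuple. This is a minimal sketch: `unwrap_sample` is a hypothetical helper, not part of nougat, and it assumes the image tensor sits at index 0 of the wrapper, as reported above.

```python
def unwrap_sample(sample):
    """Hypothetical helper: return the image tensor from a dataloader item.

    Assumes that when the item is a list or tuple, the tensor is at index 0.
    Anything else (for example a tensor) is returned unchanged.
    """
    if isinstance(sample, (list, tuple)):
        return sample[0]
    return sample


# Intended use inside the prediction loop (model and sample come from the code above):
# model_output = model.inference(image_tensors=unwrap_sample(sample))
```

This keeps the loop body the same whether the dataloader yields a bare tensor or a list that wraps it.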