allenai / vila

Incorporating VIsual LAyout Structures for Scientific Text Classification
Apache License 2.0

Better predict function #19

Closed: lolipopshock closed this pull request 2 years ago

lolipopshock commented 2 years ago

This PR adds a new method, predict_page, to the VILA predictors. It lets the caller set the maximum batch size used when running the model, which can be used to control memory usage when running the VILA models.
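As a rough illustration of why capping the batch size bounds memory use, here is a minimal sketch of the general chunking idea. It is not the code from this PR: iter_batches, predict_in_batches, and toy_model are hypothetical names introduced only for this example, and toy_model stands in for a real model forward pass.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def iter_batches(items: Sequence[T], max_batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each at most `max_batch_size` long."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be a positive integer")
    for start in range(0, len(items), max_batch_size):
        yield items[start:start + max_batch_size]


def predict_in_batches(
    run_model: Callable[[Sequence[T]], List[R]],
    samples: Sequence[T],
    max_batch_size: int,
) -> List[R]:
    """Run `run_model` on bounded-size batches and concatenate results in order."""
    outputs: List[R] = []
    for batch in iter_batches(samples, max_batch_size):
        # Only one batch is passed to the model at a time, so peak memory
        # scales with max_batch_size rather than with len(samples).
        outputs.extend(run_model(batch))
    return outputs


def toy_model(batch: Sequence[str]) -> List[int]:
    """Hypothetical stand-in for a model: the 'label' is the sample length."""
    return [len(sample) for sample in batch]


samples = ["a", "bb", "ccc", "dddd", "eeeee"]

# Batched and unbatched runs should agree, whatever the batch size.
assert predict_in_batches(toy_model, samples, 2) == toy_model(samples)
```

Because each sample is processed independently in this sketch, the batch size affects only how much is held in memory at once, not the result. That is the same property the test in the next comment checks for predict_page.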

lolipopshock commented 2 years ago

The current batching function is tested via:

import layoutparser as lp  # Provides the visual layout detection model

from vila.pdftools.pdf_extractor import PDFExtractor
from vila.predictors import LayoutIndicatorPDFPredictor

# Extract tokens and page images from the test PDF
pdf_extractor = PDFExtractor("pdfplumber")
page_tokens, page_images = pdf_extractor.load_tokens_and_image("test.pdf")

vision_model = lp.EfficientDetLayoutModel("lp://PubLayNet")
pdf_predictor = LayoutIndicatorPDFPredictor.from_pretrained("allenai/ivila-block-layoutlm-finetuned-docbank")

for idx, page_token in enumerate(page_tokens):
    # Detect layout blocks on the page image and attach them to the tokens
    blocks = vision_model.detect(page_images[idx])
    page_token.annotate(blocks=blocks)
    pdf_data = page_token.to_pagedata().to_dict()

    # Existing prediction path
    predicted_tokens = pdf_predictor.predict(pdf_data, page_token.page_size)
    # New path; the third argument is the maximum batch size (1 here)
    predicted_tokens2 = pdf_predictor.predict_page(pdf_data, page_token.page_size, 1)

    # Both paths should return identical predictions
    assert predicted_tokens == predicted_tokens2