allenai / vila

Incorporating VIsual LAyout Structures for Scientific Text Classification
Apache License 2.0
167 stars 17 forks

Can we train on companies' PDF documents? #35

Open skprasadu opened 1 year ago

skprasadu commented 1 year ago

Hello Shannon,

I am consulting with companies that have PDF corpora. We currently use cloud tools to extract content from these PDFs; the results are OK, but the tools are very expensive.

Do you think we could collaborate on this and build layout-aware PDF extraction? Your tool seems promising. Can we train it on these PDFs?

Let me know what you think. I am really interested in collaborating on this.

Krishna

lolipopshock commented 1 year ago

Yes, it should be straightforward to do so, as long as you have sufficient labeled data.