We are facing challenges extracting methods sections from PDFs.
Our existing tools, rvest and httr, are built for web scraping and are not effective for extracting specific sections such as the methods.
Options:
Specialized tools for PDF parsing. The pdftools and tabulizer packages, for example, parse PDF documents and extract text or tables, which we could perhaps use to retrieve methods sections.
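To make the pdftools option concrete, here is a minimal sketch, not a tested pipeline. It assumes the methods section starts at a line that is only a heading ("Methods", "Materials and Methods", "Methodology", optionally numbered) and ends at the next common section heading. The function names `extract_methods` and `extract_methods_from_pdf` and both regular expressions are hypothetical and would need tuning per journal; the only pdftools call used is `pdf_text()`, which returns one string per page.

```r
# Hypothetical helper: return the text between a "Methods"-style heading and
# the next section heading. Both patterns are assumptions about the layout.
extract_methods <- function(lines,
                            start_pattern = "^\\s*(\\d+\\.?\\s*)?(materials and methods|methods|methodology)\\s*$",
                            end_pattern = "^\\s*(\\d+\\.?\\s*)?(results|discussion|conclusions?|references|acknowledge?ments)\\b") {
  start <- grep(start_pattern, lines, ignore.case = TRUE, perl = TRUE)
  if (length(start) == 0) return(NA_character_)  # no methods heading found
  start <- start[1]

  # First line after the methods heading that looks like a later section heading
  is_end <- grepl(end_pattern, lines, ignore.case = TRUE, perl = TRUE)
  end <- which(is_end & seq_along(lines) > start)
  stop_at <- if (length(end) > 0) end[1] - 1 else length(lines)

  if (stop_at <= start) return("")  # heading present but no body text
  paste(lines[(start + 1):stop_at], collapse = "\n")
}

# Hypothetical wrapper: read a PDF with pdftools and apply the helper.
extract_methods_from_pdf <- function(path) {
  pages <- pdftools::pdf_text(path)                     # one string per page
  lines <- unlist(strsplit(pages, "\n", fixed = TRUE))  # flatten to lines
  extract_methods(trimws(lines))
}
```

Calling `extract_methods_from_pdf("paper.pdf")` (a placeholder path) returns the section as one string, or `NA` when no heading matches, which gives a simple way to flag PDFs that still need manual handling. A known weakness of this sketch: a body line that happens to begin with "Results" or "Discussion" would end the section early, and two-column layouts may interleave text.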
Anything we find will need a lot of refinement because formatting varies so much across the PDFs. I'd appreciate any guidance or suggestions at all on extracting methods sections. We can also do this manually without too much trouble, but it takes a bit of time for every PDF @PrajnaKARai, and I did some timed test runs on that.