Hello,
I started playing around with this tool on pathology books and noticed that for figure-heavy books (little to no paragraph text, just figures and tables), the algorithm currently cannot scrape the figures with their captions.
Here is an example snapshot of a book I was testing (Differential Diagnosis in Surgical Pathology: Breast, Jean F. Simpson MD, Melinda E. Sanders MD):
As you can see, the chapters start with a table and then continue with only large figures and their captions.
I am getting the "Unable to disambiguate caption candidates..." error for all the figures in this book.
I was wondering if you could give some tips on how to enhance or troubleshoot the code so that it works with books like this. I would really like to use this tool to scrape image-caption pairs from them if possible.
Thanks, Jack