balta2ar / manuscript-dl

Collection of scripts to download digitized manuscripts from various online libraries

Some clarifications on the bl.uk.py script #7

Open Mo2000 opened 1 year ago

Mo2000 commented 1 year ago

I ran into a couple of issues that stumped me for a bit before I figured them out. I'm sharing them here in case someone else runs into them, or I do again after I forget.

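  1. If a manuscript is not available at the requested resolution (e.g. there is no resolution 14), the script does not fail right away. It reports a page size of 0 x 0 and then errors out during concatenation and PDF conversion, with output similar to the below: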
python3 bl.uk.py or_12988 --resolution 14 --user-agent 'Mozilla/5.0 (X11; OpenBSD i386) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36' --pages 206:206
Downloading manuscript or_12988 resolution 14
335 pages found
1 pages downloading (range 206:206)
Downloading page or_12988_f103r (1/1)

End of the page
Page or_12988_f103r has size row x column = 0 x 0
Concatenating page or_12988_f103r (1/1)
montage-im6.q16: missing an image filename `pics/14/or_12988/or_12988_f103r/row_0.jpg' @ error/montage.c/MontageImageCommand/1804.
.montage-im6.q16: missing an image filename `pics/14/or_12988/or_12988_f103r.jpg' @ error/montage.c/MontageImageCommand/1804.

Converting manuscript or_12988 into PDF
Converting page or_12988_f103r (1/1)
convert-im6.q16: unable to open image `pics/14/or_12988/or_12988_f103r.jpg': No such file or directory @ error/blob.c/OpenBlob/2924.
convert-im6.q16: no images defined `pics/14/or_12988/or_12988_f103r.pdf' @ error/convert.c/ConvertImageCommand/3229.
Folding page or_12988_f103r.pdf (1/1)
Traceback (most recent call last):
  File "/home/mo/manuscript-dl/bl.uk.py", line 423, in <module>
    sys.exit(main(args))
  File "/home/mo/manuscript-dl/bl.uk.py", line 401, in main
    download_manuscript(args.pages,
  File "/home/mo/manuscript-dl/bl.uk.py", line 396, in download_manuscript
    convert_manuscript(resolution, base_dir, manuscript, pages)
  File "/home/mo/manuscript-dl/bl.uk.py", line 368, in convert_manuscript
    fold_pages(base_dir, manuscript, pages, output_name)
  File "/home/mo/manuscript-dl/bl.uk.py", line 345, in fold_pages
    shutil.copy2(pdf_name, output_name)
  File "/usr/lib/python3.10/shutil.py", line 434, in copy2
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/usr/lib/python3.10/shutil.py", line 254, in copyfile
    with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: 'pics/14/or_12988/or_12988_f103r.pdf'
  2. The --pages arg requires integers (given as a range, e.g. 206:206), but the page numbers shown on the manuscript page are folio labels with an r or v suffix, for example f103r and f103v. If you only want to download page f103r, use 206 (103*2). For f103v, use 207 (103*2 + 1).