Open KitWallace opened 6 years ago
Such documents are usually scans of letters that have been posted as PDFs, so their content could look like anything at all.
Three kinds of submission seem possible. All are now parsed, but the approach is a bit of a hack. Full PDF submissions should also be detected and not parsed. The test-parse-pdf.xq script will test a saved PDF (in test).
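One possible way to detect a full PDF submission before any parsing is attempted is to look for the `%PDF-` header marker near the start of the raw bytes. The sketch below is in Python purely for illustration and is not taken from this project; the names `looks_like_pdf` and `route_submission`, the 1024-byte search window, and the return labels are all assumptions.

```python
# Hypothetical sketch: decide whether a submission is a complete PDF file.
# A PDF file begins with the marker "%PDF-" followed by a version number.
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes, window: int = 1024) -> bool:
    """Return True if the PDF header marker occurs within the first `window` bytes.

    The window is an assumption: it tolerates a little leading junk
    before the header instead of requiring the marker at byte 0.
    """
    return PDF_MAGIC in data[:window]


def route_submission(data: bytes) -> str:
    """Return a (hypothetical) label for how the submission should be handled."""
    if looks_like_pdf(data):
        # Full PDF: keep it as-is and skip the text parser.
        return "store-unparsed"
    return "parse"
```

For example, `route_submission(b"%PDF-1.4\n...")` gives `"store-unparsed"`, while a plain-text body falls through to `"parse"`. A scanned letter detected this way would still need manual handling, since its content is an image.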
An example of a failed parse has been saved for analysis.