Closed maneesha closed 1 month ago
I think this is as expected - we have an excerpt of text in our data files that we use in exercise 2, and this is the citation for it:
An example of text captured by an optical character recognition process: General Report on the Physiography of Maryland. A dissertation, etc. (Reprinted from Report of Maryland State Weather Service.) [With maps and illustrations.] 1898 (from https://doi.org/10.21250/db12)
The DOI in the citation takes the user to the full collection where this work was pulled from.
OK, I see. In that case, maybe we can note that these links are for reference (not for use in the lesson), and explicitly remind users that excerpts for learning purposes are included with the download files for this lesson.
Also, as the source file is 44 GB, I think most people would not want to download it. It may be useful and interesting to make the original viewable in some other way, so learners can see the source file that was rendered to plain text using OCR.
A couple of thoughts on how we can improve this:
I'll make a PR for this change soon.
What is the problem?
In the "Working with free text" section, there is a DOI link for the OCR text of the General Report on the Physiography of Maryland. However, the link actually seems to go to a page with a zip file of 14847 volumes.
Location of problem (optional)
No response
EDITED to fix the link to the lesson episode.