archivesunleashed / auk-notebooks

Jupyter notebooks to assist in creating additional analysis and visualizations of Archives Unleashed Cloud derivatives.
https://cloud.archivesunleashed.org

RESULT_LIMITS should be capped at length of full text derivative #58

Open ianmilligan1 opened 5 years ago

ianmilligan1 commented 5 years ago

Right now, the default value of RESULT_LIMITS in the full text notebook is 2500.

If you use a full-text file with fewer than 2500 records (i.e. fewer than 2500 lines), the notebook fails with an opaque error:

---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-19-747248251276> in <module>()
      1 # Get the set of available years in the collection.
      2 
----> 3 year_range = set([x[0] for x in nb.get_text(TEXT_METHOD)])
      4 print(year_range)

/Users/ianmilligan1/anaconda/lib/python3.5/site-packages/au_notebook.py in get_text(self, by)
    176                             text.append(split_line[3])
    177                 else:
--> 178                     next(fin)
    179         return text
    180 

StopIteration:
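
For anyone trying to reproduce this outside the notebook, here is a minimal sketch of what the traceback suggests is happening: a bare `next(fin)` is called on a file that has already run out of lines. The `skip_lines` helper and the three-line sample are hypothetical and only stand in for the loop inside `get_text`; the real `au_notebook.py` code may differ.

```python
import io


def skip_lines(fin, count):
    """Advance a file-like iterator by `count` lines with bare next() calls.

    Hypothetical stand-in for the loop around line 178 of au_notebook.py.
    """
    for _ in range(count):
        next(fin)  # raises StopIteration once the file is exhausted


# A three-line "derivative", far shorter than the number of lines requested.
short_file = io.StringIO("line 1\nline 2\nline 3\n")

try:
    skip_lines(short_file, 5)
    outcome = "no error"
except StopIteration:
    outcome = "StopIteration"

print(outcome)
```

Passing a default, as in `next(fin, None)`, would return `None` instead of raising when the file is exhausted, which is one possible way to make the loop tolerate short files.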

The effective limit should be the lesser of RESULT_LIMITS and the number of lines in the full-text file.
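
A rough sketch of the proposed behaviour, assuming the derivative is a plain text file with one record per line. The helper names `capped_limit` and `read_records` are hypothetical and are not part of `au_notebook.py`; only `RESULT_LIMITS` and its default of 2500 come from the notebook.

```python
import io
from itertools import islice

RESULT_LIMITS = 2500  # default value from the full text notebook


def capped_limit(line_count, result_limits=RESULT_LIMITS):
    """Return the lesser of the configured limit and the file's line count."""
    return min(result_limits, line_count)


def read_records(fin, result_limits=RESULT_LIMITS):
    """Read at most `result_limits` lines without running past the end.

    islice stops quietly when the file is exhausted, so a short file
    yields fewer records instead of raising StopIteration.
    """
    return [line.rstrip("\n") for line in islice(fin, result_limits)]


# A three-line sample standing in for a small full-text derivative.
sample = "record 1\nrecord 2\nrecord 3\n"
line_count = sum(1 for _ in io.StringIO(sample))

limit = capped_limit(line_count)
records = read_records(io.StringIO(sample))

print(limit)
print(len(records))
```

With `islice` the cap is applied implicitly, so counting the lines up front is only needed if the notebook wants to report the effective limit to the user.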