Hegghammer opened 1 month ago
@Hegghammer, I am sorry that you experienced trouble. The reason is not directly related to a maximum limit, but rather to the volume of the backend databases that hold the data for the "digavis" and "digibok" doctypes. We are looking into improving the backend, but for the moment a workable strategy is to build several corpora segmented into more limited time periods.
I tested doing this per decade. For example,
import dhlab as dh

corpus_70s = dh.Corpus(doctype="digavis", fulltext="miljøvern", from_year=1970, to_year=1979, limit=100000)
corpus_70s
returns a dataframe of 51303 rows, while
corpus_80s = dh.Corpus(doctype="digavis", fulltext="miljøvern", from_year=1980, to_year=1989, limit=100000)
corpus_80s
returns a dataframe of 35253 rows.
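If you want one combined result, the per-decade corpora can be built in a loop and stacked afterwards. This is only a sketch of that idea: it assumes the usual import dhlab as dh convention and that a Corpus exposes its rows as a pandas DataFrame through its frame attribute; adjust to whatever attribute your dhlab version provides.

import dhlab as dh
import pandas as pd

frames = []
for start in range(1970, 1990, 10):
    # one decade at a time keeps each request small enough for the backend
    corpus = dh.Corpus(doctype="digavis", fulltext="miljøvern",
                       from_year=start, to_year=start + 9, limit=100000)
    frames.append(corpus.frame)  # assumes the hits are exposed as a pandas DataFrame on .frame

combined = pd.concat(frames, ignore_index=True)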
(Be aware that the year values in your code example are entered as strings, i.e. within quotation marks. Because Python is dynamically typed this may happen to work, but the year parameters are defined as integers and should be entered without quotation marks. The data types of the different parameters are documented here: https://dhlab.readthedocs.io/en/stable/apidocs/dhlab/dhlab.text.corpus.html)
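For illustration only, the string version below is a guess at what the original call may have looked like, not copied from it:

# hypothetical: years as strings (may happen to work, but does not match the documented types)
corpus = dh.Corpus(doctype="digavis", fulltext="miljøvern", from_year="1970", to_year="1979", limit=100000)

# years as integers, as the parameters are defined
corpus = dh.Corpus(doctype="digavis", fulltext="miljøvern", from_year=1970, to_year=1979, limit=100000)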
Thanks for the quick reply. Segmentation should work in principle, but for some strange reason I now get the error on all searches, including:
corpus_70s
I'm confused. Are there rate limits or something else I should be aware of?
Update: today the API works mostly fine for me. I was even able to run the command in the opening post and get back a df with >200k rows in about 15 seconds. It's not clear to me what's going on. Either 1) you fixed it, 2) something was up with my system yesterday, or 3) the service is unstable.
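Until the cause is clear, one way to cope with intermittent failures is to simply retry the call a couple of times. A rough sketch of that idea (the helper name is made up, not part of the dhlab API):

import time
import dhlab as dh

def corpus_with_retry(attempts=3, wait=10, **kwargs):
    # made-up helper: retry the corpus build a few times before giving up
    for i in range(attempts):
        try:
            return dh.Corpus(**kwargs)
        except ValueError:  # JSONDecodeError is a subclass of ValueError
            if i == attempts - 1:
                raise
            time.sleep(wait)

corpus_70s = corpus_with_retry(doctype="digavis", fulltext="miljøvern",
                               from_year=1970, to_year=1979, limit=100000)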
My config for what it's worth:
When I search for something that (presumably) yields a lot of hits, I get a JSONDecodeError. If I narrow the chronological window, it works. What is the maximum number of items that the system can return? This, for example, yields a JSONDecodeError: