clarinsi / clarin-dspace

LINDAT/CLARIN digital repository based on DSpace
http://lindat.cz
BSD 3-Clause "New" or "Revised" License

Link checker times out #8

Closed cyplas closed 7 years ago

cyplas commented 7 years ago

For some items, the fastchecklinks curation task times out:

2017-06-21 21:14:55,654 INFO  org.dspace.curate.Curator @ Curation task: fastchecklinks performed on: 11356/1125 with status: 1. Result: 'Item: 11356/1125  [https://www.clarin.si/repository/xmlui/admin/item?itemID=1665] has 3 urls to check...
 - http://hdl.handle.net/11356/1125 = -2 - TIMEOUT

Curiously, when I increased the lr.link.checker.connect.timeout and lr.link.checker.read.timeout parameters in local.conf, the curation task yielded OK, but still generated an ERROR in the logs and an email notification.

But in fact it would probably be better to get at the root of the problem: why do some items take longer to load and what could be done about it?

(BTW, the fastchecklinks curation task also yields "403 - FAILED" for the licenses: this is https://github.com/ufal/clarin-dspace/issues/678.)

kosarko commented 7 years ago

One reason why some items take longer is the archive previews (a zip in the case of 11356/1125). The files in the archive are essentially added to the item metadata; currently these are "baked" into the page, so for large archives (ones with many files) there's a lot of stuff to be transferred. It's something like 8.5MB in the case of 11356/1125. I filed ufal/clarin-dspace#785 for that.

cyplas commented 7 years ago

OK, thanks. In the meantime, I think we should just be aware of what the timeout means and sometimes try the timed-out URL manually. So I'm closing this (@TomazErjavec: reopen if you disagree).
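For checking a timed-out URL by hand, something like the following could help. It is only a rough sketch, not what fastchecklinks does internally: the function name `check_url` and the single `timeout_s` value are my own assumptions (Python's urllib takes one timeout rather than separate connect and read timeouts). It reports the status, the elapsed time and the number of bytes received, which should make oversized pages easy to spot.

```python
import socket
import time
import urllib.error
import urllib.request


def check_url(url, timeout_s=10.0):
    """Fetch url once and return (status, seconds, bytes_read).

    status is the HTTP status code, or the string "TIMEOUT" or "ERROR".
    Hypothetical helper for manual checks; not part of DSpace.
    """
    start = time.monotonic()

    def elapsed():
        return time.monotonic() - start

    try:
        # urlopen follows redirects, so a handle URL resolves to the item page.
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            body = resp.read()
            return resp.status, elapsed(), len(body)
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error code such as 403 or 404.
        return exc.code, elapsed(), 0
    except (socket.timeout, TimeoutError):
        # Timed out while waiting for the response or reading the body.
        return "TIMEOUT", elapsed(), 0
    except urllib.error.URLError as exc:
        # Timeouts during connect arrive wrapped in URLError.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "TIMEOUT", elapsed(), 0
        return "ERROR", elapsed(), 0
    except OSError:
        # Other low-level failures, e.g. a connection reset.
        return "ERROR", elapsed(), 0
```

Calling `check_url("http://hdl.handle.net/11356/1125", timeout_s=60)` with a generous timeout should show how long the page really takes and how large it is; trying it again with a shorter timeout shows whether the page would trip a stricter limit.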

> Curiously, when I increased the lr.link.checker.connect.timeout and lr.link.checker.read.timeout parameters in local.conf, the curation task yielded OK, but still generated an ERROR in the logs and an email notification.

Actually, this happens only for some items, and happens even if I keep the default parameter values (see ufal#792).