AtlasOfLivingAustralia / biocache-store

Occurrence processing, indexing and batch processing
Other
7 stars 24 forks source link

java.io.IOException in loading images #342

Closed sadeghim closed 4 years ago

sadeghim commented 4 years ago

Loading sightings dr364 resulted in :

aws-bstore-4b 2019-09-20 00:10:24,599 ERROR: [DataLoader] - Server returned HTTP response code: 503 for URL: https://images.ala.org.au/ws/image/f4ab8ea1-d544-4432-9362-63e3d2d903e3.json
java.io.IOException: Server returned HTTP response code: 503 for URL: https://images.ala.org.au/ws/image/f4ab8ea1-d544-4432-9362-63e3d2d903e3.json
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1900)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1498)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:268)
    at java.net.URL.openStream(URL.java:1057)
    at scala.io.Source$.fromURL(Source.scala:140)
    at au.org.ala.biocache.util.HttpUtil$.get(HttpUtil.scala:20)
    at au.org.ala.biocache.load.RemoteMediaStore$.getMetadata(MediaStore.scala:350)
    at au.org.ala.biocache.load.RemoteMediaStore$.save(MediaStore.scala:328)
    at au.org.ala.biocache.load.DataLoader$$anonfun$processMedia$1.apply(DataLoader.scala:323)
    at au.org.ala.biocache.load.DataLoader$$anonfun$processMedia$1.apply(DataLoader.scala:295)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at au.org.ala.biocache.load.DataLoader$class.processMedia(DataLoader.scala:295)
    at au.org.ala.biocache.load.DwcCSVLoader.processMedia(DwcCSVLoader.scala:85)
    at au.org.ala.biocache.load.DataLoader$class.load(DataLoader.scala:256)
    at au.org.ala.biocache.load.DwcCSVLoader.load(DwcCSVLoader.scala:85)
    at au.org.ala.biocache.load.DwcCSVLoader.loadFile(DwcCSVLoader.scala:279)
    at au.org.ala.biocache.load.DwcCSVLoader$$anonfun$loadDirectory$1.apply(DwcCSVLoader.scala:138)
    at au.org.ala.biocache.load.DwcCSVLoader$$anonfun$loadDirectory$1.apply(DwcCSVLoader.scala:136)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at au.org.ala.biocache.load.DwcCSVLoader.loadDirectory(DwcCSVLoader.scala:136)
    at au.org.ala.biocache.load.DwcCSVLoader$$anonfun$load$1.apply(DwcCSVLoader.scala:118)
    at au.org.ala.biocache.load.DwcCSVLoader$$anonfun$load$1.apply(DwcCSVLoader.scala:110)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at au.org.ala.biocache.load.DwcCSVLoader.load(DwcCSVLoader.scala:110)
    at au.org.ala.biocache.load.Loader.load(Loader.scala:206)
    at au.org.ala.biocache.load.Loader$$anonfun$main$4.apply(Loader.scala:99)
    at au.org.ala.biocache.load.Loader$$anonfun$main$4.apply(Loader.scala:99)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at au.org.ala.biocache.load.Loader$.main(Loader.scala:98)
    at au.org.ala.biocache.cmd.CMD2$.main(CMD2.scala:130)
    at au.org.ala.biocache.cmd.CMD2.main(CMD2.scala)

This happens when the image-service is flooded with lots of requests and cannot handle the load or rejected by nginx throttle mechanism. Lifting the throttling threshold didn't solve the issue. Need try catch retry mechanism (with a pause) or some sort of throttling on biocache-store side.

sadeghim commented 4 years ago

It has happened for questaGame as well.

aws-bstore-4b 2019-09-20 02:13:06,600 INFO : [DataLoader] - Loading: https://api.questagame.com/images/uploads/sighting/286586/1.jpeg|https://api.questagame.com/images/uploads/sighting/286586/2.jpeg|https://api.questagame.com/images/uploads/sighting/286586/3.jpeg
aws-bstore-4b 2019-09-20 02:13:06,646 INFO : [DataLoader] - Loading: https://api.questagame.com/images/uploads/sighting/286587/1.jpeg|https://api.questagame.com/images/uploads/sighting/286587/2.jpeg|https://api.questagame.com/images/uploads/sighting/286587/3.jpeg|https://api.questagame.com/images/uploads/sighting/286587/4.jpeg
aws-bstore-4b 2019-09-20 02:13:06,673 ERROR: [DataLoader] - Server returned HTTP response code: 503 for URL: https://images.ala.org.au/ws/image/53241e8c-2dc5-43e2-bd2f-9bc7392cc74a.json
java.io.IOException: Server returned HTTP response code: 503 for URL: https://images.ala.org.au/ws/image/53241e8c-2dc5-43e2-bd2f-9bc7392cc74a.json
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1900)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1498)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:268)
    at java.net.URL.openStream(URL.java:1057)
    at scala.io.Source$.fromURL(Source.scala:140)
    at au.org.ala.biocache.util.HttpUtil$.get(HttpUtil.scala:20)
    at au.org.ala.biocache.load.RemoteMediaStore$.getMetadata(MediaStore.scala:350)
    at au.org.ala.biocache.load.DataLoader$$anonfun$processMedia$1.apply(DataLoader.scala:326)
    at au.org.ala.biocache.load.DataLoader$$anonfun$processMedia$1.apply(DataLoader.scala:295)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at au.org.ala.biocache.load.DataLoader$class.processMedia(DataLoader.scala:295)
    at au.org.ala.biocache.load.DwcCSVLoader.processMedia(DwcCSVLoader.scala:85)
    at au.org.ala.biocache.load.DataLoader$class.load(DataLoader.scala:256)
    at au.org.ala.biocache.load.DwcCSVLoader.load(DwcCSVLoader.scala:85)
    at au.org.ala.biocache.load.DwcCSVLoader.loadFile(DwcCSVLoader.scala:279)
    at au.org.ala.biocache.load.DwcCSVLoader$$anonfun$loadDirectory$1.apply(DwcCSVLoader.scala:138)
    at au.org.ala.biocache.load.DwcCSVLoader$$anonfun$loadDirectory$1.apply(DwcCSVLoader.scala:136)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at au.org.ala.biocache.load.DwcCSVLoader.loadDirectory(DwcCSVLoader.scala:136)
    at au.org.ala.biocache.load.DwcCSVLoader$$anonfun$load$1.apply(DwcCSVLoader.scala:118)
    at au.org.ala.biocache.load.DwcCSVLoader$$anonfun$load$1.apply(DwcCSVLoader.scala:110)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at au.org.ala.biocache.load.DwcCSVLoader.load(DwcCSVLoader.scala:110)
    at au.org.ala.biocache.load.Loader.load(Loader.scala:206)
    at au.org.ala.biocache.load.Loader$$anonfun$main$4.apply(Loader.scala:99)
    at au.org.ala.biocache.load.Loader$$anonfun$main$4.apply(Loader.scala:99)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at au.org.ala.biocache.load.Loader$.main(Loader.scala:98)
    at au.org.ala.biocache.cmd.CMD2$.main(CMD2.scala:130)
djtfmartin commented 4 years ago

Need a little more information I think to action this. Cant reproduce in Jenkins for either resource mentioned, so i presume different parameters where used, or perhaps something else was going on at the time (multiple loads ?).

ansell commented 4 years ago

HTTP 503 was being hit too often, so because we couldn't get in contact with you, we temporarily removed nginx rate limiting from aws-image-service. However, this is now causing aws-image-service to become unreliable during data loads when the images are already present and it spikes to 400+ requests per second.

We don't want biocache-store to become an issue that stops people accessing images.ala.org.au but the only way we could achieve the iNaturalist image loads which are a critical priority and not drop images from our nightly loads (also a critical priority) was to remove the nginx rate limiter.

djtfmartin commented 4 years ago

Thanks @ansell

Currently seeing this when i try to load iNat


Started by upstream project "Cloud Infrastructure/Parameterised Biocache Command" build number 15583
originally caused by:
 Started by user david.martin@csiro.au
Running as SYSTEM
Building remotely on aws-bstore-4b.ala (aws-bstore) in workspace /home/jenkins/workspace/Cloud Infrastructure/BSTORE/Parameterised Load
[Parameterised Load] $ /bin/bash /tmp/jenkins3203244736380632645.sh
+ export 'BIOCACHE_OPTS= -Dbiocache.config=/data/biocache/config/biocache-config.properties'
+ BIOCACHE_OPTS=' -Dbiocache.config=/data/biocache/config/biocache-config.properties'
+ rnf=
+ '[' false = true ']'
+ biocache load --number-of-threads 1 dr1411
aws-bstore-4b 2019-10-01 19:44:59,367 INFO : [ConfigModule] - Using config file: /data/biocache/config/biocache-config.properties
aws-bstore-4b 2019-10-01 19:45:00,405 INFO : [Config] - Using the default set of blacklisted media URLs
aws-bstore-4b 2019-10-01 19:45:00,561 INFO : [SolrIndexDAO] - Initialising the solr server aws-zoo-a1.ala:2181,aws-zoo-b1.ala:2181,aws-zoo-b2.ala:2181,aws-zoo-c1.ala:2181,aws-zoo-c2.ala:2181 cloudserver:null solrServer:null
aws-bstore-4b 2019-10-01 19:45:00,924 INFO : [Loader] - Starting to load resource: dr1411
aws-bstore-4b 2019-10-01 19:45:01,305 INFO : [DataLoader] - Darwin core archive loading
aws-bstore-4b 2019-10-01 19:45:01,380 INFO : [DataLoader] - SFTP the most recent from sftp://upload.ala.org.au:ala/dr1411
aws-bstore-4b 2019-10-01 19:45:02,046 ERROR: [SFTPTools] - No latest file for sftp://upload.ala.org.au:ala/dr1411
aws-bstore-4b 2019-10-01 19:45:02,049 ERROR: [DataLoader] - Unable to extract a new file for dr1411 at sftp://upload.ala.org.au:ala/dr1411
aws-bstore-4b 2019-10-01 19:45:02,049 INFO : [DataLoader] - File last modified date: null
aws-bstore-4b 2019-10-01 19:45:02,448 INFO : [DataLoader] - Registry response code: 200
aws-bstore-4b 2019-10-01 19:45:02,565 INFO : [DataLoader] - Registry response code: 200
aws-bstore-4b 2019-10-01 19:45:02,565 INFO : [Loader] - Completed loading resource: dr1411. Completed in 1.641seconds (0.02735 minutes)
No mail will be sent out, as 'Cloud Infrastructure » BSTORE » Parameterised Load #5354' does not have a result yet. Please make sure you set a proper result in case of pipeline/build scripts.
Finished: SUCCESS
ansell commented 4 years ago

Mahmoud has enabled the daily iNaturalist API load, which is run using http://localhost:9193/job/iNaturalist%20Daily/

The result you are seeing is expected, as that particular delta load file has already been loaded.

I have reenabled the nginx rate limiting on aws-image-service to enable replication of the issue, at the risk of images not consistently loading in tonight's regular data loads.

djtfmartin commented 4 years ago

Ive added some rate limiting logic and while it seems to running ok, Im unsure if it is sufficient without reliable replication of the issues.

http://localhost:9193/job/Cloud%20Infrastructure/job/BSTORE/job/Parameterised%20Load/5359/console

Still assessing if it has enough impact. The config property is

media.store.maxrequests.persec=10

which defaults to 10 requests per second. Note the nginx configuration is currently set to

limit_req_zone $binary_remote_addr zone=nginxratelimitzone:10m rate=1000r/m;

which is approximately 16 request a second.

sadeghim commented 4 years ago

It's fixed since the snapshot version 2.4.6-SNAPSHOT is deployed.