cardiacsociety / web-services

MappCPD API and other services

Problems running scheduled workers #71

Closed: mikedonnici closed this issue 5 years ago

mikedonnici commented 5 years ago

There seem to be a few issues when the scheduled workers are run. Output below:

~ $ bash run_services.sh 
Running pubmedr #####################################################################
setBatch()...
Fetching batch file at url:  https://s3-ap-southeast-2.amazonaws.com/csanz-mapp-00/public/pubmedr/pubmed.json
Test API connection... https://mappcpd-csanz-web-services.herokuapp.com
200 OK
Authenticate and get token... Failed to authenticate witht the API
Running fixr #########################################################
Backdays not specified with -b flag, defaulting to 1
Running task: pubmedData
Looking for best date...
Try pubdate: 2019 Aug
Best publish date: 2019-8-1
Looking for best date...
Try pubdate: 2019 Jul 17
Best publish date: 2019-7-17
Looking for best date...
Try pubdate: 2019 May 27
Best publish date: 2019-5-27
panic: interface conversion: interface {} is nil, not map[string]interface {}

goroutine 1 [running]:
main.(*resourceData).pubmedData(0xc000221c10, 0xc00049ea48, 0x8)
    /tmp/build_fcfcb14ab90c677bb20487b31309494d/cmd/fixr/main.go:547 +0x8f8
main.updatePubmedData()
    /tmp/build_fcfcb14ab90c677bb20487b31309494d/cmd/fixr/main.go:470 +0x1a2
main.main()
    /tmp/build_fcfcb14ab90c677bb20487b31309494d/cmd/fixr/main.go:127 +0x273
Running syncr #####################################################################
2019/07/26 21:59:17 Running syncr with backdays: 1 on collection: all
2019/07/26 21:59:18 Sync'd 17 member
2019/07/26 21:59:18 Sync'd 1 modules
2019/07/26 21:59:19 Sync'd 21 resources
Running fixr #########################################################
Backdays not specified with -b flag, defaulting to 1
Running task: pubmedData
Looking for best date...
Try pubdate: 2019 Aug
Best publish date: 2019-8-1
Looking for best date...
Try pubdate: 2019 Jul 17
Best publish date: 2019-7-17
Looking for best date...
Try pubdate: 2019 May 27
Best publish date: 2019-5-27
panic: interface conversion: interface {} is nil, not map[string]interface {}

goroutine 1 [running]:
main.(*resourceData).pubmedData(0xc0002b3c10, 0xc0004dcab8, 0x8)
    /tmp/build_fcfcb14ab90c677bb20487b31309494d/cmd/fixr/main.go:547 +0x8f8
main.updatePubmedData()
    /tmp/build_fcfcb14ab90c677bb20487b31309494d/cmd/fixr/main.go:470 +0x1a2
main.main()
    /tmp/build_fcfcb14ab90c677bb20487b31309494d/cmd/fixr/main.go:127 +0x273
Running algr #####################################################################
2019/07/26 21:59:19 Updating index: DIRECTORY, sched: daily, type: full
Algolia batch TaskID 12216488112 - count 1000
Algolia batch TaskID 12216488122 - count 1000
Algolia batch TaskID 12216488132 - count 1000
Algolia batch TaskID 12216488142 - count 299
2019/07/26 21:59:23 Updating index: MEMBERS, sched: daily, type: full
Algolia batch TaskID 12216488162 - count 1000
Algolia batch TaskID 12216488172 - count 1000
Algolia batch TaskID 12216488182 - count 1000
Algolia batch TaskID 12216488192 - count 1000
Algolia batch TaskID 12216488202 - count 1000
Algolia batch TaskID 12216488212 - count 17
2019/07/26 21:59:26 Updating index: MODULES, sched: daily, type: atomic
Algolia batch TaskID 12216488232 - count 1000
Algolia batch TaskID 12216488242 - count 424
2019/07/26 21:59:27 Updating index: RESOURCES, sched: daily, type: partial
Algolia batch TaskID 12216488272 - count 21
2019/07/26 21:59:27 Updating index: QUALIFICATIONS, sched: daily, type: atomic
Algolia batch TaskID 12216488292 - count 178
2019/07/26 21:59:28 Updating index: ORGANISATIONS, sched: daily, type: atomic
Algolia batch TaskID 12216488372 - count 371
Running backupdb ##################################################################
Fetch latest database snapshot...
Downloaded snapshot to 1564178370.sql.gz
Copying 1564178370.sql.gz to Dropbox...
Cleaning up...
Done!
All done!

mikedonnici commented 5 years ago

Issue is with fixr:

Try pubdate: 2019 May 27
Best publish date: 2019-5-27
panic: interface conversion: interface {} is nil, not map[string]interface {}

goroutine 1 [running]:
main.(*resourceData).pubmedData(0xc000221c10, 0xc00049ea48, 0x8)
    /tmp/build_fcfcb14ab90c677bb20487b31309494d/cmd/fixr/main.go:547 +0x8f8
main.updatePubmedData()
    /tmp/build_fcfcb14ab90c677bb20487b31309494d/cmd/fixr/main.go:470 +0x1a2
main.main()
    /tmp/build_fcfcb14ab90c677bb20487b31309494d/cmd/fixr/main.go:127 +0x273

mikedonnici commented 5 years ago

This runs OK:

fixr -b 1 -t "fixResources"

This throws the error:

fixr -b 1 -t "pubmedData"

So the issue appears to be with the date conversion in this task.

mikedonnici commented 5 years ago

Found the problem!

It is a rate limit on the calls made by pubmedData(articleD). The response is:

{
  "error": "API rate limit exceeded",
  "api-key": "2001:8000:10a4:6b00:adb0:f0d:a280:d715",
  "count": "4",
  "limit": "3"
}
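A response like this explains the panic: the body holds an "error" field rather than the expected data, so an unchecked type assertion on the missing field fails with "interface {} is nil". Below is a minimal sketch of a guarded decode, not the actual fixr code. The function name extractResult and the "result" key are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// extractResult is a hypothetical helper: it decodes a JSON body and returns
// the object stored under key, or an error if the API reported a failure.
func extractResult(body []byte, key string) (map[string]interface{}, error) {
	var doc map[string]interface{}
	if err := json.Unmarshal(body, &doc); err != nil {
		return nil, fmt.Errorf("decoding response: %w", err)
	}
	// A rate-limited response carries an "error" string instead of data.
	if msg, ok := doc["error"].(string); ok {
		return nil, errors.New("API error: " + msg)
	}
	// Comma-ok assertion: a missing or mistyped value sets ok to false
	// instead of panicking.
	result, ok := doc[key].(map[string]interface{})
	if !ok {
		return nil, fmt.Errorf("response has no object under %q", key)
	}
	return result, nil
}

func main() {
	limited := []byte(`{"error": "API rate limit exceeded", "count": "4", "limit": "3"}`)
	if _, err := extractResult(limited, "result"); err != nil {
		fmt.Println("skipping article:", err)
	}
}
```

With a check like this the worker can log the failure and move on to the next article rather than crash mid-run.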

This is a recent change, documented at https://www.ncbi.nlm.nih.gov/books/NBK25497/:

On December 1, 2018, NCBI will begin enforcing the use of API keys that will offer enhanced levels of supported access to the E-utilities. After that date, any site (IP address) posting more than 3 requests per second to the E-utilities without an API key will receive an error message. By including an API key, a site can post up to 10 requests per second by default. Higher rates are available by request (eutilities@ncbi.nlm.nih.gov). Users can obtain an API key now from the Settings page of their NCBI account (to create an account, visit http://www.ncbi.nlm.nih.gov/account/). After creating the key, users should include it in each E-utility request by assigning it to the new api_key parameter.

Created an API key and added a hard-coded 100 ms delay between requests, which keeps the rate at or below 10 requests per second.