AlexsLemonade / refinebio

Refine.bio harmonizes petabytes of publicly available biological data into ready-to-use datasets for cancer researchers and AI/ML scientists.
https://www.refine.bio/

Mock 3rd party API responses in the tests #1878

Closed arielsvn closed 4 years ago

arielsvn commented 4 years ago

Context

Our tests depend on several third-party APIs, and they sometimes fail when those services are down.

Problem or idea

Running the tests multiple times should give the same result. Sometimes I see failed tests and I'm not sure if it's because there's something wrong with our code, or because some service is down.

2019-11-08 18:27:57,805 local [volume: 0] data_refinery_foreman.surveyor.external_source INFO [downloaded_urls: ['second_url', 'third_url', 'fourth_url']] [survey_job: 26] [downloader_job: 10308]: Queuing downloader job.
....2019-11-08 18:30:48,214 local [volume: 0] data_refinery_foreman.surveyor.harmony ERROR [response_code: 404]: Unable to fetch URL: https://www.ebi.ac.uk/arrayexpress/files/E-GEOD-7753/E-GEOD-7753.sdrf.txt
...Too long with no output (exceeded 10m0s)
======================================================================
FAIL: test_geo_celgz_redownloading (data_refinery_foreman.surveyor.test_end_to_end.GeoCelgzRedownloadingTestCase)
Survey, download, then process an experiment we know is Affymetrix.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/user/data_refinery_foreman/surveyor/test_end_to_end.py", line 425, in test_geo_celgz_redownloading
    self.assertTrue(recreated_job.success)
AssertionError: None is not true
2019-11-08 19:36:45,327 local [volume: 0] data_refinery_foreman.surveyor.external_source ERROR [survey_job: 1]: Exception caught while discovering samples. Terminating survey job.
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 731, in urlopen
    body_pos=body_pos, **response_kw)
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 731, in urlopen
    body_pos=body_pos, **response_kw)
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 731, in urlopen
    body_pos=body_pos, **response_kw)
  File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 711, in urlopen
    retries = retries.increment(method, url, response=response, _pool=self)
  File "/usr/local/lib/python3.5/dist-packages/urllib3/util/retry.py", line 398, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.ebi.ac.uk', port=443): Max retries exceeded with url: /ena/data/view/SRR1603661&display=xml (Caused by ResponseError('too many 500 error responses',))

[...]

======================================================================
FAIL: test_unmated_reads (data_refinery_foreman.surveyor.test_end_to_end.EnaFallbackTestCase)
Survey, download, then process a sample we know is SRA and has unmated reads.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/user/data_refinery_foreman/surveyor/test_end_to_end.py", line 703, in test_unmated_reads
    self.assertTrue(survey_job.success)
AssertionError: False is not true

Solution or next step

Filing this out of frustration... Is there another way to achieve what we have now while making the tests more reliable?

Maybe we can have an external status page to check these services and have our tests use mocked responses?
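One way to get mocked responses without hand-writing them is record-and-replay: call the real service once, save what it returned, and serve the saved copy on every later run. The sketch below only illustrates that idea with the standard library. `ReplayingFetcher`, `fake_network_fetch`, the cassette file name, and the example URL are all hypothetical and are not part of refinebio.

```python
import json
import tempfile
from pathlib import Path


class ReplayingFetcher:
    """Record responses from a real fetcher once, then replay them from disk."""

    def __init__(self, cassette_path, real_fetch):
        self.cassette_path = Path(cassette_path)
        self.real_fetch = real_fetch  # hypothetical callable: url -> response text
        if self.cassette_path.exists():
            self.recorded = json.loads(self.cassette_path.read_text())
        else:
            self.recorded = {}

    def fetch(self, url):
        if url not in self.recorded:
            # First run only: hit the real service and save what it returned.
            self.recorded[url] = self.real_fetch(url)
            self.cassette_path.write_text(json.dumps(self.recorded, indent=2))
        return self.recorded[url]


# Stand-in for the network so this sketch runs offline.
calls = []


def fake_network_fetch(url):
    calls.append(url)
    return "canned metadata for " + url


cassette = Path(tempfile.mkdtemp()) / "cassette.json"
url = "https://example.org/experiment/1"  # placeholder URL

first = ReplayingFetcher(cassette, fake_network_fetch).fetch(url)
# A fresh fetcher replays from the cassette; the "network" is not called again.
second = ReplayingFetcher(cassette, fake_network_fetch).fetch(url)
```

With the recorded file committed next to the tests, a run no longer depends on whether the remote service is up, so a failure points at our code rather than at an outage.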

@davidsmejia @kurtwheeler what do you think?

kurtwheeler commented 4 years ago

Goal this sprint: Get at least 80% of the way towards using this across our test suite.

kurtwheeler commented 4 years ago

OK, so I made a lot of progress on this last sprint. So far I've tested using VCR in the workers, and it works well. I've also made a way to mock urlopen so we can mock out file downloads for downloader tests.
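For readers unfamiliar with the approach, here is a minimal sketch of mocking `urlopen` for a download test, using only `unittest.mock` from the standard library. It is not the helper described above: `download_file`, `fake_urlopen`, `run_download_with_mock`, the payload, and the URL are hypothetical names chosen for illustration.

```python
import io
import os
import shutil
import tempfile
import urllib.request
from unittest import mock


def download_file(url, target_path):
    """Hypothetical downloader: stream the response body of `url` to disk."""
    with urllib.request.urlopen(url) as response, open(target_path, "wb") as out:
        shutil.copyfileobj(response, out)


def fake_urlopen(payload):
    """Build a urlopen replacement whose responses yield `payload`."""
    def _urlopen(url, *args, **kwargs):
        # BytesIO supports read() and the `with` statement, like a response.
        return io.BytesIO(payload)
    return _urlopen


def run_download_with_mock():
    payload = b"fake CEL file contents"
    target = os.path.join(tempfile.mkdtemp(), "sample.CEL")
    # Patch the attribute that download_file looks up at call time.
    with mock.patch("urllib.request.urlopen", fake_urlopen(payload)):
        download_file("ftp://example.org/sample.CEL", target)
    with open(target, "rb") as f:
        return f.read()
```

Because the patch replaces `urllib.request.urlopen` only inside the `with` block, the downloader code under test stays unchanged and no network access happens during the test.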

kurtwheeler commented 4 years ago

This issue is now for everything but tests in the workers sub-project. https://github.com/AlexsLemonade/refinebio/issues/2085 will be for the workers sub-project.