AlexsLemonade / refinebio

Refine.bio harmonizes petabytes of publicly available biological data into ready-to-use datasets for cancer researchers and AI/ML scientists.
https://www.refine.bio/

Our tests break if GEO is down #2471

Open wvauclain opened 4 years ago

wvauclain commented 4 years ago

Context

I'm trying to get my tests to pass on GitHub Actions. I think I fixed the last bug, but now GEO is down.

Problem or idea

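If GEO is down, our tests fail, and they fail with a cryptic error message that never mentions the network. Presumably the download of the GPL570 platform file fails without raising, and GEOparse then tries to parse a file that was never written: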

Traceback (most recent call last):
  File "/home/user/data_refinery_foreman/surveyor/external_source.py", line 168, in survey
    experiment, samples = self.discover_experiment_and_samples()
  File "/home/user/data_refinery_foreman/surveyor/geo.py", line 498, in discover_experiment_and_samples
    experiment, samples = self.create_experiment_and_samples_from_api(experiment_accession_code)
  File "/home/user/data_refinery_foreman/surveyor/geo.py", line 316, in create_experiment_and_samples_from_api
    self.set_platform_properties(sample_object, sample.metadata, gse)
  File "/home/user/data_refinery_foreman/surveyor/geo.py", line 78, in set_platform_properties
    external_accession, destdir=self.get_temp_path(), how="brief", silent=True
  File "/usr/local/lib/python3.5/dist-packages/GEOparse/GEOparse.py", line 86, in get_GEO
    return parse_GPL(filepath)
  File "/usr/local/lib/python3.5/dist-packages/GEOparse/GEOparse.py", line 411, in parse_GPL
    with utils.smart_open(filepath) as soft:
  File "/usr/lib/python3.5/contextlib.py", line 59, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.5/dist-packages/GEOparse/utils.py", line 156, in smart_open
    fh = fopen(filepath, mode, errors="ignore")
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/2/GPL570.txt'
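One way to make this failure less cryptic, sketched here under assumptions, is to check that the platform file actually exists after the download step and raise an error that names the likely cause. The helper `fetch_platform_file`, its `download` argument, and `GEOUnavailableError` are hypothetical stand-ins, not refinebio or GEOparse APIs.

```python
import os


class GEOUnavailableError(RuntimeError):
    """Hypothetical error type: raised when a GEO download produced no file."""


def fetch_platform_file(accession, destdir, download):
    """Run a (hypothetical) download callable, then verify its output exists.

    `download` is assumed to write `<destdir>/<accession>.txt` on success and
    to return without raising on failure, which is what the traceback suggests.
    """
    path = os.path.join(destdir, accession + ".txt")
    download(accession, destdir)
    if not os.path.exists(path):
        # Fail here with a message that points at GEO instead of letting the
        # parser hit a bare FileNotFoundError later.
        raise GEOUnavailableError(
            "Downloading {} from GEO produced no file at {}; "
            "GEO may be down or unreachable.".format(accession, path)
        )
    return path
```

With a download callable that does nothing (simulating GEO being down), this raises `GEOUnavailableError` mentioning the accession, rather than the `FileNotFoundError` above.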

Solution or next step

If possible, we should find a way to mock out those network requests.
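A minimal sketch of what mocking the request could look like, using the standard library's `unittest.mock.patch.object`. `GeoClient.download_platform` stands in for the network call and `count_probes` for the code under test; both are hypothetical. refinebio's surveyor actually calls `GEOparse.get_GEO`, so patching it there would also require a stored platform fixture.

```python
from unittest import mock


class GeoClient:
    """Hypothetical wrapper around the network call that fetches GEO data."""

    def download_platform(self, accession):
        raise ConnectionError("real network access is not allowed in tests")


def count_probes(client, accession):
    """Hypothetical code under test: counts data rows in a platform table."""
    text = client.download_platform(accession)
    # Skip blank lines and '#' comment lines, then drop the header row.
    rows = [line for line in text.splitlines() if line and not line.startswith("#")]
    return len(rows) - 1


def test_count_probes_without_network():
    # Made-up fixture content standing in for a stored platform file.
    fixture = "#comment\nID\tGene\nPROBE_A\tGENE_A\nPROBE_B\tGENE_B\n"
    client = GeoClient()
    with mock.patch.object(client, "download_platform", return_value=fixture) as fake:
        assert count_probes(client, "GPL570") == 2
    fake.assert_called_once_with("GPL570")
```

Injecting the client keeps the patch target obvious. The hard part in practice, as the next comment notes, is finding a seam like `download_platform` without reaching into GEOparse internals.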

wvauclain commented 4 years ago

Okay, I poked around at this a bit, and it looks like it would be difficult for minimal gain. Since GEO uses FTP, we can't just use vcrpy to mock out the network calls, because vcrpy only records and replays HTTP traffic. Instead, we would probably have to manually mock internal calls inside the GEOparse source code, which seems brittle.
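If full mocking is not worth it, one cheaper mitigation (an assumption, not something decided in this issue) is to detect that GEO is unreachable and skip the network-dependent tests with an explicit message instead of failing with a `FileNotFoundError`. This sketch uses only the standard library; the host `ftp.ncbi.nlm.nih.gov`, port 21, the timeout, and the test class name are all assumptions.

```python
import socket
import unittest

# Assumed GEO FTP endpoint; adjust if GEOparse downloads from elsewhere.
GEO_HOST = "ftp.ncbi.nlm.nih.gov"
GEO_PORT = 21


def geo_reachable(host=GEO_HOST, port=GEO_PORT, timeout=5.0):
    """Return True if a TCP connection to the given server can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


class GeoSurveyorNetworkTest(unittest.TestCase):
    """Hypothetical test class for surveyor tests that really hit GEO."""

    @classmethod
    def setUpClass(cls):
        # Checked once per class at test time, not at import time.
        if not geo_reachable():
            raise unittest.SkipTest("GEO is unreachable; skipping network tests")

    def test_placeholder(self):
        # The real GEO-dependent surveyor tests would go here.
        self.assertTrue(True)
```

A skip keeps CI green during a GEO outage, but it also means those code paths go untested until GEO is back.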

kurtwheeler commented 4 years ago

I came to pretty much the same conclusion when working on https://github.com/AlexsLemonade/refinebio/pull/2080