AlexsLemonade / refinebio

Refine.bio harmonizes petabytes of publicly available biological data into ready-to-use datasets for cancer researchers and AI/ML scientists.
https://www.refine.bio/

Fix RNA-Seq end to end tests #3283

arkid15r closed this issue 1 year ago

arkid15r commented 1 year ago

Context

To finish restoring the staging deploy process, the CI/CD errors need to be fixed.

Problem or idea

Polling SurveyJobs. Currently waiting for job id: 129117
SurveyJob 129117 failed!
F
======================================================================
FAIL: test_all_the_things (tests.foreman.test_end_to_end.FullFlowEndToEndTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.8/unittest/case.py", line 60, in testPartExecutor
    yield
  File "/usr/lib/python3.8/unittest/case.py", line 676, in run
    self._callTestMethod(testMethod)
  File "/usr/lib/python3.8/unittest/case.py", line 633, in _callTestMethod
    method()
  File "/home/user/tests/foreman/test_end_to_end.py", line 235, in test_all_the_things
    self.process_experiments()
  File "/home/user/tests/foreman/test_end_to_end.py", line 279, in process_experiments
    self.assertTrue(wait_for_job(survey_job))
  File "/usr/lib/python3.8/unittest/case.py", line 765, in assertTrue
    raise self.failureException(msg)
AssertionError: False is not true

----------------------------------------------------------------------
Ran 2 tests in 3422.129s

FAILED (failures=1)
Error: Process completed with exit code 1.

Solution or next step

Troubleshoot and fix the end-to-end test errors.

arkid15r commented 1 year ago

After a quick investigation, the problem appears to be an out-of-date SRA surveyor:

2023-04-24 19:34:49,854 i-03ccf74ae760b7e5e data_refinery_foreman.surveyor.external_source ERROR [survey_job: 129117]: Exception caught while discovering samples. Terminating survey job.
Traceback (most recent call last):
  File "/home/user/data_refinery_foreman/surveyor/external_source.py", line 170, in survey
    experiment, samples = self.discover_experiment_and_samples()
  File "/home/user/data_refinery_foreman/surveyor/sra.py", line 661, in discover_experiment_and_samples
    returned_experiment, samples = self._generate_experiment_and_samples(
  File "/home/user/data_refinery_foreman/surveyor/sra.py", line 446, in _generate_experiment_and_samples
    files_urls = [SraSurveyor._build_ncbi_file_url(run_accession)]
  File "/home/user/data_refinery_foreman/surveyor/sra.py", line 346, in _build_ncbi_file_url
    download_url = get_fasp_sra_download(run_accession)
  File "/usr/local/lib/python3.8/dist-packages/data_refinery_common/utils.py", line 339, in get_fasp_sra_download
    sra_url = full_url.split("fasp://")[1]
IndexError: list index out of range
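The `IndexError` itself is ordinary Python behavior. `str.split` returns a one-element list when the separator never occurs, so indexing `[1]` is out of range. The minimal reproduction below assumes that the string split at `utils.py` line 339 was a response with no `fasp://` URL in it. The deprecation notice quoted below is a plausible candidate, but that is an assumption:

```python
# Reproduces the failure mode of `full_url.split("fasp://")[1]`.
# Assumption: the input is a response body that contains no "fasp://" URL.
response_text = (
    "This service has been deprecated, please update to the latest version "
    "of the toolkit."
)

parts = response_text.split("fasp://")
print(len(parts))  # 1: the separator never occurs, so nothing is split off

try:
    fasp_path = parts[1]  # same indexing as get_fasp_sra_download
except IndexError as exc:
    print(f"IndexError: {exc}")  # IndexError: list index out of range
```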

Additionally, a direct request to the NCBI names service returns a deprecation notice:

curl https://www.ncbi.nlm.nih.gov/Traces/names/names.cgi
This service has been deprecated, please update to the latest version of the toolkit. See https://github.com/ncbi/sra-tools/wiki/01.-Downloading-SRA-Toolkit
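If this deprecation notice is what reaches the `full_url.split("fasp://")[1]` line, the surveyor's `IndexError` hides the real cause. One option is to check the split result before indexing and raise a descriptive error instead. The sketch below assumes that the names-service response body is the string being split. `parse_fasp_url`, `NamesServiceError`, and the `SRR000001` accession are hypothetical names for illustration, not refinebio code:

```python
DEPRECATION_MARKER = "This service has been deprecated"


class NamesServiceError(RuntimeError):
    """Hypothetical error for a names-service response with no fasp:// URL."""


def parse_fasp_url(response_text: str, run_accession: str) -> str:
    """Return the text after "fasp://", or raise with a readable reason.

    Hypothetical helper: keeps the split-and-take-[1] semantics of the
    original line but checks the result length before indexing.
    """
    parts = response_text.split("fasp://")
    if len(parts) < 2:
        # Distinguish the deprecation notice from any other non-URL response.
        if DEPRECATION_MARKER in response_text:
            reason = "the NCBI names service reports that it is deprecated"
        else:
            reason = "the response contains no fasp:// URL"
        raise NamesServiceError(
            f"Cannot build a download URL for {run_accession}: {reason}"
        )
    return parts[1]


notice = (
    "This service has been deprecated, please update to the latest version "
    "of the toolkit."
)
try:
    parse_fasp_url(notice, "SRR000001")  # placeholder accession
except NamesServiceError as exc:
    print(exc)
# Cannot build a download URL for SRR000001: the NCBI names service reports that it is deprecated
```

This guard only makes the survey job log state why the job stopped. The underlying fix presumably means moving off `names.cgi`, as the notice itself suggests.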