AlexsLemonade / OpenPBTA-analysis

The analysis repository for the Open Pediatric Brain Tumor Atlas Project

CI failing at TCGA Capture Kit Investigation #1277

Closed: sjspielman closed this issue 2 years ago

sjspielman commented 2 years ago

Builds have recently started failing with the error below. This module has not been updated recently, so the source of the failure is not clear.

  File "scripts/get-tcga-capture_kit.py", line 74, in <module>
    df.columns = ['filename','kit_name','kit_url']
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/generic.py", line 5192, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas/_libs/properties.pyx", line 67, in pandas._libs.properties.AxisProperty.__set__
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/generic.py", line 690, in _set_axis
    self._data.set_axis(axis, labels)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/internals/managers.py", line 183, in set_axis
    "values have {new} elements".format(old=old_len, new=new_len)
ValueError: Length mismatch: Expected axis has 0 elements, new values have 3 elements
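The error can be reproduced in isolation. This is a minimal sketch that assumes the DataFrame assigned at line 74 was built from an empty list of records; `capture_kits` here is an illustrative stand-in, not the script's actual code.

```python
import pandas as pd

# Illustrative stand-in for the script's parsed results: no records came back.
capture_kits = []

# An empty list produces a DataFrame with 0 rows and 0 columns.
df = pd.DataFrame(capture_kits)

try:
    # Same assignment as line 74 of scripts/get-tcga-capture_kit.py
    df.columns = ['filename', 'kit_name', 'kit_url']
except ValueError as err:
    message = str(err)
    print(message)
```

Assigning three column names to a frame that has zero columns raises the same `Length mismatch` ValueError seen in CI.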
sjspielman commented 2 years ago

CC @jaclyn-taroni

sjspielman commented 2 years ago

The error is coming from the first step in the analysis pipeline: scripts/get-tcga-capture_kit.py. It appears to be related to the query against this external URL: https://api.gdc.cancer.gov/files

On line 60,

gdc_response = gdc_response.json()

I see that gdc_response ends up as:

{'warnings': {}, 'data': {'hits': [], 'pagination': {'count': 0, 'page': 0, 'sort': '', 'total': 0, 'size': 5000, 'from': 0, 'pages': 0}}}

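As a consequence, the parsing starting at line 64 has no hits to iterate over, so capture_kits is still an empty list after the loop. The DataFrame built from it presumably has no columns, which would explain why assigning the three column names at line 74 raises the length mismatch error.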
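A guard on the parsed response would turn this into an explicit error instead of a confusing pandas traceback. This is a minimal sketch that assumes the response shape shown above (`data.hits` and `data.pagination.total`); `extract_hits` is a hypothetical helper, not part of the script.

```python
def extract_hits(gdc_json):
    """Hypothetical helper: return the hits from a parsed GDC /files response, failing loudly if empty."""
    data = gdc_json.get('data', {})
    hits = data.get('hits', [])
    total = data.get('pagination', {}).get('total', 0)
    if not hits:
        # An empty 'hits' list means the query matched nothing, so there is nothing to parse.
        raise RuntimeError(
            "GDC query returned no hits (pagination total = {}); "
            "the query may no longer match any files.".format(total)
        )
    return hits


# The response observed in this issue
observed = {'warnings': {}, 'data': {'hits': [], 'pagination': {
    'count': 0, 'page': 0, 'sort': '', 'total': 0,
    'size': 5000, 'from': 0, 'pages': 0}}}

try:
    extract_hits(observed)
except RuntimeError as err:
    print(err)
```

Failing at this point names the actual cause (an empty query result) rather than a downstream column mismatch.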

The quickest fix would be to wrap this in a try/except that prints a message and otherwise passes, so that CI does not fail. Silently passing is not a great way to handle errors, though.
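The try/except approach might look roughly like this. It is a minimal sketch: `build_kit_table` is a hypothetical name, the DataFrame construction is assumed, and only the column names come from line 74 in the traceback. Catching ValueError this way also hides genuine failures, which is the drawback noted above.

```python
import pandas as pd


def build_kit_table(capture_kits):
    """Hypothetical wrapper: build the capture kit table, or return None with a message if it cannot be built."""
    try:
        df = pd.DataFrame(capture_kits)
        df.columns = ['filename', 'kit_name', 'kit_url']
    except ValueError as err:
        # Report the problem instead of failing CI; note this also swallows unrelated ValueErrors.
        print("Skipping capture kit table: {}".format(err))
        return None
    return df
```

With an empty input the function prints the message and returns None; with well-formed rows it returns the labeled table.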

Also noting that this script uses the v14 data release.

jaclyn-taroni commented 2 years ago

I don't think this is actually used anymore, so if the results are already captured in the repo, I think we can deprecate the module and remove it from CI.

sjspielman commented 2 years ago

Closed with #1278