Open steven-albanese opened 7 years ago
Can we just catch those exceptions in a try...except block?

    try:
        # do stuff
    except Exception as e:
        # log the exception
Ideally, we'd be able to discriminate things we should just log and move on from (the gene ID is not found) from other more serious errors. Perhaps retrieve_mutants_xml should return None or get_mutation_data_as_xml should check if the gene exists and handle this gracefully? Or maybe we can check if the gene exists earlier in the process before retrieving mutations?
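One way to separate "log and move on" cases from serious errors is a dedicated exception type. The following is only a minimal sketch: GeneNotFoundError, fetch_all, and the fetch_one callback are hypothetical names, not part of the existing code, and it assumes the retrieval function could be changed to raise that exception when the portal does not recognise a gene ID.

```python
import logging

logger = logging.getLogger(__name__)


class GeneNotFoundError(Exception):
    """Hypothetical exception for a gene ID the portal does not recognise."""


def fetch_all(gene_ids, fetch_one):
    """Call fetch_one(gene_id) for each gene, skipping and logging unknown genes.

    Returns a (results, skipped) tuple. Any exception other than
    GeneNotFoundError is treated as serious and propagates to the caller.
    """
    results = {}
    skipped = []
    for gene_id in gene_ids:
        try:
            results[gene_id] = fetch_one(gene_id)
        except GeneNotFoundError as e:
            # Expected, recoverable case: record it and carry on.
            logger.warning('Skipping unknown gene %s: %s', gene_id, e)
            skipped.append(gene_id)
    return results, skipped
```

The list of skipped genes could then be reported once at the end of a run, instead of printing each one as it occurs.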
OK, for right now I've just changed it in PR #26 to print the name of the gene and then continue instead of raising an exception. I think this is fine for now, but we should probably come up with a better system for logging these errors after the grant deadline.
I was able to figure out which gene was causing the problem, so I added it to the manual overrides. Unfortunately, that makes the URL too long and I get the following error:
urllib2.HTTPError: HTTP Error 414: Request-URI Too Large
I'm trying to figure out the best way to fix this
Looks like this function is the problem:
    def retrieve_extended_mutation_datatxt(case_set_id,
                                           genetic_profile_id,
                                           gene_ids,
                                           portal_version='public-portal',
                                           write_to_filepath=False
                                           ):
        """
        Queries cBioPortal for "ExtendedMutation" format data, given a list of cBioPortal cancer studies and a list of HGNC Approved gene Symbols.
        Returns the data file as a list of text lines.

        Parameters
        ----------
        portal_version: str
            'public-portal': use only public cBioPortal data
            'private': use private cBioPortal data
        write_to_filepath: str (or False)
        """
        gene_ids_string = '+'.join(gene_ids)
        mutation_url = 'http://www.cbioportal.org/{0}/' \
                       'webservice.do' \
                       '?cmd=getMutationData' \
                       '&case_set_id={1}' \
                       '&genetic_profile_id={2}' \
                       '&gene_list={3}'.format(
                           portal_version,
                           case_set_id,
                           genetic_profile_id,
                           gene_ids_string
                       )
        response = urllib2.urlopen(mutation_url)
        page = response.read(1000000000)
        if write_to_filepath:
            with open(write_to_filepath, 'w') as ofile:
                ofile.write(page)
        lines = page.splitlines()
        return lines
When working with the whole kinome, the URL is too long. I'm not very familiar with this, but from what I've seen online, there is a character limit for URLs here. I've seen a few different ways to correct this, but I'm not familiar enough with flask to know which one is appropriate in our case.
This issue has been addressed by adding try...except blocks and by chunking the list of genes when requesting information.
A more elegant proposal could be made to handle the header information as well as the Unknown gene warning discussed in PR #26.
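For reference, the chunking idea can be sketched roughly as below. This is an illustration and not the code that was merged: chunk_gene_ids is a hypothetical helper, and the max_chars default of 1500 is an assumed budget for the gene_list part of the URL, since the server's actual limit is not stated in this thread.

```python
def chunk_gene_ids(gene_ids, max_chars=1500, sep='+'):
    """Yield lists of gene IDs whose sep-joined string fits within max_chars.

    A single gene ID longer than max_chars is still yielded, alone in its
    own chunk, so no gene is silently dropped.
    """
    chunk = []
    length = 0
    for gene_id in gene_ids:
        # Characters this gene adds, including a separator if the chunk
        # already holds at least one gene.
        extra = len(gene_id) + (len(sep) if chunk else 0)
        if chunk and length + extra > max_chars:
            # Current chunk is full: emit it and start a new one.
            yield chunk
            chunk = []
            length = 0
            extra = len(gene_id)
        chunk.append(gene_id)
        length += extra
    if chunk:
        yield chunk
```

Each chunk could then be passed as gene_ids to retrieve_extended_mutation_datatxt and the returned lines concatenated. Any header lines repeated in each response would still need to be handled when merging.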
I've run into the following error for a handful of genes:
I know this can be handled by adding the unknown genes to the manual override file, but is there a better way to handle these cases?