chembl / chembl_webresource_client

Official Python client for accessing ChEMBL API
https://www.ebi.ac.uk/chembl/api/data/docs

MaxRetryError #48

Closed wmmxk closed 5 years ago

wmmxk commented 5 years ago

I was trying to get the descriptors for 968 chemicals from ChEMBL and ran into this error: MaxRetryError: HTTPSConnectionPool(host='www.ebi.ac.uk', port=443): Max retries exceeded with url: /chembl/api/data/molecule/BQJRUJTZSGYBEZ-KTKRFUPESA-N (Caused by ResponseError('too many 404 error responses',)). How long do I have to wait after each request?

hkmztrk commented 5 years ago

@wmmxk Have you solved this?

wmmxk commented 5 years ago

It turned out my code got stuck when fetching the descriptors for one molecule, so I wrapped it in a try/except statement. Now it works.

pbar = tqdm(enumerate(InchiKeys,1), total = len(InchiKeys))
descriptor_all = {}
molecule = new_client.molecule
for i, InchiKey in pbar:
  try:
    SMILE = molecule.get(InchiKey)['molecule_structures']['canonical_smiles']
    chemical = utils.smiles2ctab(SMILE)
    log_p = json.loads(utils.logP(chemical))[0]
    descriptors = json.loads(utils.descriptors(chemical))[0]
    descriptors["log_p"] = log_p
    descriptor_all[InchiKey] = descriptors
  except:
    bad_keys.append(InchiKey)

hkmztrk commented 5 years ago

thank you @wmmxk , using try except blocks worked for me as well :)

juanfmx2 commented 5 years ago

Hi @wmmxk, @hkmztrk, sorry it took me a while to get back to you; I am in charge of other projects and have not been able to work on the web services lately. Yes, the issue here is that you get a 404 when there is no molecule in the database with that InChI key, so the try/except is the right way to go. However, bear in mind that our smiles2ctab/logP/etc. service is built on top of RDKit, and we are not sure whether it could fail for a given complex molecule. So if you want to know for sure whether the InChI key is in our database, the try/except should only surround the molecule.get call:

import json
from tqdm import tqdm
from chembl_webresource_client.new_client import new_client
from chembl_webresource_client.utils import utils

pbar = tqdm(enumerate(inchi_keys, 1), total=len(inchi_keys))
descriptor_all = {}
bad_keys = []
molecule = new_client.molecule
for i, inchi_key in pbar:
  try:
    smiles_i = molecule.get(inchi_key)['molecule_structures']['canonical_smiles']
  except Exception:
    # InChI key was not found in ChEMBL
    bad_keys.append(inchi_key)
    continue  # skip the descriptor step; smiles_i is unset or stale here

  try:
    molecule_ctab_i = utils.smiles2ctab(smiles_i)
    log_p = json.loads(utils.logP(molecule_ctab_i))[0]
    descriptors = json.loads(utils.descriptors(molecule_ctab_i))[0]
    descriptors["log_p"] = log_p
    descriptor_all[inchi_key] = descriptors
  except Exception:
    # smiles2ctab, logP or something else failed
    pass

wmmxk commented 5 years ago

Hi @juanfmx2. No worries, I totally understand that a programmer is usually working on several projects. Yes, I agree that I need to test the particular molecule that makes my code fail. Initially, I thought it was because the website does not allow too many requests from one IP in a very short time.
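
For anyone landing here later, the two-stage idea from this thread (one try/except around the lookup, a separate one around the descriptor calculation) can be sketched without any network access. This is only an illustration, not the client's API: `collect_descriptors`, `fake_fetch`, `fake_descriptors` and `FAKE_DB` are hypothetical names, with the two callables standing in for `molecule.get(...)` and the `utils.*` calls respectively.

```python
def collect_descriptors(inchi_keys, fetch_smiles, compute_descriptors):
    """Sort keys into found / missing / failed.

    fetch_smiles and compute_descriptors are hypothetical callables that
    stand in for the ChEMBL lookup and the descriptor service.
    """
    found, missing, failed = {}, [], []
    for key in inchi_keys:
        try:
            smiles = fetch_smiles(key)
        except Exception:
            # The lookup failed, e.g. the key is not in the database (404).
            missing.append(key)
            continue  # never reuse a SMILES string from an earlier iteration
        try:
            found[key] = compute_descriptors(smiles)
        except Exception:
            # The lookup worked, but the descriptor calculation did not.
            failed.append(key)
    return found, missing, failed


# Stand-in data and functions so the sketch runs offline.
FAKE_DB = {"KEY-A": "CCO", "KEY-B": "not-a-molecule"}


def fake_fetch(key):
    return FAKE_DB[key]  # raises KeyError for unknown keys


def fake_descriptors(smiles):
    if not smiles.isalpha():
        raise ValueError("cannot parse " + smiles)
    return {"n_chars": len(smiles)}


found, missing, failed = collect_descriptors(
    ["KEY-A", "KEY-B", "KEY-C"], fake_fetch, fake_descriptors
)
print(found, missing, failed)
```

Keeping `missing` and `failed` as separate lists makes it easy to tell afterwards whether a key was absent from the database or whether the molecule itself tripped up the descriptor step.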