NaegleLab / CoDIAC

GNU General Public License v3.0

PDB fetch failures due to internet connectivity - should be handled better #58

Open knaegle opened 2 weeks ago

knaegle commented 2 weeks ago

Description

We found some spurious PDB fetch results that likely stem from internet connectivity losses during the fetch. We should annotate fetching issues more accurately (right now we report them as invalid PDB IDs, which is not the case) and attempt refetching. It could also be good to consider an append mode, so that results already fetched can be added to later on.

Files

A list of relevant files for this issue. This will help people navigate the project and offer some clues of where to start.

To Reproduce

Steps to reproduce the behavior:

  1. Go to a cafe with bad internet and try running PDB fetching. I kid, but we probably need to raise the probability of a failure by putting a constraint on the internet connection.

Expected behavior

We should capture the error type returned by the PDB fetch and re-attempt failed IDs if the failure is due to a web issue. We should also support appending to an existing output file: compare the list of IDs already fetched against the list of IDs attempted, and add new lines for whatever can now be fetched. This preserves the run time invested in the first fetch and allows the file to be extended later on.
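A minimal sketch of the append idea, assuming the output is a CSV file whose first column holds the PDB ID. The names `ids_already_fetched`, `append_new_rows`, and `fetch_row` are hypothetical and not part of CoDIAC; `fetch_row` stands in for whatever function fetches and annotates a single PDB ID and raises `OSError` on a connectivity problem.

```python
import csv
import os


def ids_already_fetched(path, id_column="PDB_ID"):
    """Return the set of IDs already present in an existing output file."""
    if not os.path.exists(path):
        return set()
    with open(path, newline="") as handle:
        return {row[id_column] for row in csv.DictReader(handle)}


def append_new_rows(path, requested_ids, fetch_row, fieldnames):
    """Fetch only the IDs missing from `path` and append them as new rows.

    `fetch_row` is a hypothetical callable that returns one dict per PDB ID.
    Returns the list of IDs that failed, so they can be re-attempted later.
    """
    done = ids_already_fetched(path, fieldnames[0])
    todo = [pdb_id for pdb_id in requested_ids if pdb_id not in done]
    # Write a header only when starting a new (or empty) file.
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    failed = []
    with open(path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        for pdb_id in todo:
            try:
                row = fetch_row(pdb_id)
            except OSError:
                # Assumed to signal a web issue: record it and keep going.
                failed.append(pdb_id)
            else:
                writer.writerow(row)
    return failed
```

Because failed IDs never reach the file, rerunning `append_new_rows` with the same list retries only those IDs and leaves the rows from the first run untouched.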

Tasks

Include specific tasks in the order they need to be done in. Include links to specific lines of code where the task should happen at, if known

knaegle commented 1 week ago

Digging into this, I found some serious inefficiencies in the code. Increased speed and added better error handling.

Structural changes to code:

  - The PDB interface now operates on only one PDB. Added looping outside of construction.
  - Changed the annotation code, which used to evaluate a number of if/elif branches, to use a function call instead.
  - Checked all URL requests for an accurate fetch: a status code of 200 counts as success, anything else is considered a failure.
  - Separated tolerable failures (like not being able to get the UniProt sequence) from intolerable errors (no PDB information).
  - If we have a failure, attempt up to 5 times in case it was an internet connectivity issue.
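The status check, retry, and failure classification described above could look roughly like the following. This is a sketch under stated assumptions, not the code that was committed: `FetchError`, `http_get`, `fetch_with_retry`, and `annotate_pdb` are hypothetical names, the URLs are supplied by the caller, and the standard library `urllib` is used here only to keep the example self-contained.

```python
import time
import urllib.error
import urllib.request


class FetchError(Exception):
    """Raised when a URL could not be fetched after all attempts."""


def http_get(url, timeout=30):
    """Return (status_code, body_bytes) for a GET request."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status code.
        return err.code, b""


def fetch_with_retry(url, get=http_get, max_attempts=5, delay=1.0):
    """Fetch `url`, treating only status 200 as success; retry otherwise."""
    last_problem = None
    for attempt in range(1, max_attempts + 1):
        try:
            status, body = get(url)
        except OSError as err:
            # URLError, timeouts, and connection resets are OSError subclasses.
            last_problem = f"connection error: {err}"
        else:
            if status == 200:
                return body
            last_problem = f"HTTP status {status}"
        if attempt < max_attempts:
            time.sleep(delay)
    raise FetchError(f"{url}: {last_problem} after {max_attempts} attempts")


def annotate_pdb(pdb_id, entry_url, sequence_url, get=http_get, delay=1.0):
    """Fetch one PDB entry; only a missing entry is an intolerable error."""
    # Intolerable: without PDB information there is nothing to annotate,
    # so FetchError is allowed to propagate to the caller.
    entry = fetch_with_retry(entry_url, get=get, delay=delay)
    try:
        sequence = fetch_with_retry(sequence_url, get=get, delay=delay)
    except FetchError:
        # Tolerable: keep the entry and leave the sequence unset.
        sequence = None
    return {"pdb_id": pdb_id, "entry": entry, "sequence": sequence}
```

Passing the `get` function in as a parameter makes it possible to simulate a flaky connection in a test, which avoids needing the cafe with bad internet to reproduce the failure. The caller can catch `FetchError` and report a connectivity failure separately from a truly invalid PDB ID.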

knaegle commented 1 week ago

Choosing not to add appending at this time, given the significant speed-up of about 5- to 20-fold.