boscoh / uniprot

retrieve protein sequence identifiers and metadata from http://uniprot.org

Opaque behavior in fetch_uniprot_metadata and batch_uniprot_metadata #4

Closed dmyersturnbull closed 7 years ago

dmyersturnbull commented 8 years ago

batch_uniprot_metadata behaves the way I would expect it to, but fetch_uniprot_metadata does not. Also, there should be warnings or exceptions raised when a UniProt ID is not found.

This makes sense:

print(uniprot.batch_uniprot_metadata('P42681')) 

Output:

Fetching metadata for 6 Uniprot IDs from http://uniprot.org ...
{}

This doesn't:

print(uniprot.fetch_uniprot_metadata('P42681'))

Output:

Fetching metadata for 6 Uniprot IDs from http://uniprot.org ...
{}

And then there's this, which should definitely raise an exception.

print(uniprot.fetch_uniprot_metadata(['junk']))

Output:

Fetching metadata for 6 Uniprot IDs from http://uniprot.org ...
{}

And perhaps batch_uniprot_metadata should warn when some (or especially all) aren't found:

print(uniprot.batch_uniprot_metadata(['junk']))

Output:

Fetching metadata for 6 Uniprot IDs from http://uniprot.org ...
{}

boscoh commented 8 years ago

Sorry to take so long to reply, but thanks for the question.

The way I conceived of fetch_uniprot_metadata and batch_uniprot_metadata is that batch_uniprot_metadata is the batched version of fetch_uniprot_metadata. They should have exactly the same behaviour, with the difference that fetch_uniprot_metadata makes a single API call (which can easily fail for large queries), whereas batch_uniprot_metadata makes many smaller calls.
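To illustrate the relationship described above, here is a minimal sketch of how a batched call can be layered on top of a single-call function. It is not the library's actual implementation: `batched_fetch`, `chunked`, the `fetch_one_call` parameter, and the default `batch_size` of 100 are all hypothetical names and values chosen for the example.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def chunked(ids: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids`, each at most `size` items long."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def batched_fetch(
    ids: Iterable[str],
    fetch_one_call: Callable[[List[str]], Dict[str, dict]],
    batch_size: int = 100,  # hypothetical default, not taken from the library
) -> Dict[str, dict]:
    """Call `fetch_one_call` once per chunk and merge the resulting dicts.

    `fetch_one_call` stands in for a function that sends one request for a
    list of IDs and returns a dict of metadata.
    """
    merged: Dict[str, dict] = {}
    for chunk in chunked(list(ids), batch_size):
        merged.update(fetch_one_call(chunk))
    return merged
```

Because the merged result is just the union of the per-chunk results, the batched version returns the same dictionary as one large call would, provided each smaller call succeeds.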

It would be ideal to raise exceptions on failure, but the uniprot website hides a complex, difficult-to-parse search engine. The return values from the uniprot website have no error codes, and it's almost impossible to figure out why, sometimes, nothing is returned. As a result, the functions give you back whatever the uniprot website returns.
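Since the functions pass through whatever comes back, a caller who wants the warnings requested earlier in this thread could add the check on their own side. The following is only a sketch under two assumptions: that the returned dictionary is keyed by the IDs that were queried, and that a bare string should be treated as one ID (the "6 Uniprot IDs" message above suggests a 6-character string is otherwise read as six IDs). `fetch_with_missing_check` and its `strict` flag are hypothetical and not part of the library.

```python
import warnings
from typing import Callable, Dict, Iterable, List, Union


def fetch_with_missing_check(
    ids: Union[str, Iterable[str]],
    fetch: Callable[[List[str]], Dict[str, dict]],
    strict: bool = False,
) -> Dict[str, dict]:
    """Run `fetch` and report any queried IDs absent from its result.

    `fetch` stands in for a function such as fetch_uniprot_metadata.
    Assumption: the returned dict is keyed by the queried IDs.
    """
    # Wrap a bare string so it is not iterated character by character.
    if isinstance(ids, str):
        ids = [ids]
    id_list = list(ids)

    metadata = fetch(id_list)
    missing = [i for i in id_list if i not in metadata]
    if missing:
        message = "No metadata returned for: " + ", ".join(missing)
        if strict:
            raise LookupError(message)
        warnings.warn(message)
    return metadata
```

With `strict=False` the caller still gets the partial result and only sees a warning; `strict=True` turns any missing ID into an exception. The check cannot say why an ID is missing, only that nothing came back for it.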

I hope that helps.

dmyersturnbull commented 7 years ago

Got it, thanks! Forgot to close this.