SGWilliams / GAPProduction

A package of python code used in producing and managing GAP wildlife range maps and habitat models.

pysb times out and causes problems? #2

Closed nmtarr closed 7 years ago

nmtarr commented 7 years ago

The sciencebase.GetHabMapDOI function is very slow because it has to assess many sb IDs before finding a match. That function is also used in gapmetadata.ScienceBaseCSV, and the slow speed may be causing ScienceBaseCSV to fail when the sb session times out.

Should the DOIs and/or sb IDs be stored in a database rather than queried across the web, so that lookups are faster?
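A minimal sketch of what such a local store could look like, using Python's built-in sqlite3 module. The table name, column names, and the `fetch_remote` callback are all hypothetical; `fetch_remote` stands in for whatever slow web lookup is used today, and its signature (species code in, `(sb_id, doi)` tuple out) is an assumption, not the actual signature of sciencebase.GetHabMapDOI.

```python
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) a local table mapping species code -> sb ID and DOI."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS habmap_ids ("
        "species_code TEXT PRIMARY KEY, sb_id TEXT, doi TEXT)"
    )
    return conn


def get_doi(conn, species_code, fetch_remote):
    """Return the DOI for species_code, querying the web only on a cache miss.

    fetch_remote is a hypothetical callable: species_code -> (sb_id, doi).
    """
    row = conn.execute(
        "SELECT doi FROM habmap_ids WHERE species_code = ?", (species_code,)
    ).fetchone()
    if row is not None:
        return row[0]  # served locally, no web request
    sb_id, doi = fetch_remote(species_code)  # slow path, done once per code
    with conn:  # commits the insert on success
        conn.execute(
            "INSERT OR REPLACE INTO habmap_ids VALUES (?, ?, ?)",
            (species_code, sb_id, doi),
        )
    return doi
```

With a file path instead of ":memory:", the table would persist between runs, so each species code would cost at most one web query.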

In the meantime, I changed the ScienceBaseCSV code to use "???" to bypass this problem.

skybristol commented 7 years ago

We do see issues with ScienceBase performance and downtime that are routinely a pain. Unfortunately, I've come to expect that and have come up with ways around it. I think we use ScienceBase for what it's good for - a relatively stable and supported base repository for information and data that are appropriate for that platform - and build out other data infrastructures to support the more dynamic aspects of what we need to do. We have the DOIs that we include in citations point to ScienceBase because it's the responsible repository that USGS is maintaining. But to use the information (metadata and data) effectively, we need to spin up various forms of the content that are optimized for use.

For instance, here's a code snippet from the in-process notebook to mint DOIs for the range items. It puts together a couple of different data structures in memory pretty quickly by looping the range maps collection through the ScienceBase API: a dictionary providing rapid lookup of GAP code to ScienceBase ID, and a list of dicts holding selected information from all 1719 docs. At some point, one of us can generalize this into a function in the bis package for broader use.

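# Assumes `sb` is an open pysb SbSession and `_rangeMapRoot` holds the
# ScienceBase ID of the range maps collection (presumably set earlier in the notebook).
rangeMaps = {}     # GAP species code -> ScienceBase item ID
rangeMapDocs = []  # selected fields from every range map item
rangeMapSearchResults = sb.find_items("parentId="+_rangeMapRoot+"&fields=title,body,contacts,identifiers&max=100")
while rangeMapSearchResults is not None:
    for rangeItem in rangeMapSearchResults["items"]:
        # Index the item by its GAP species code; skip the index entry if it has none
        gapCode = next((i for i in rangeItem["identifiers"] if i["type"] == "GAP_SpeciesCode"), None)
        if gapCode is not None:
            rangeMaps[gapCode["key"]] = rangeItem["id"]
        thisRangeMapDoc = {}
        thisRangeMapDoc["id"] = rangeItem["id"]
        thisRangeMapDoc["identifiers"] = rangeItem["identifiers"]
        thisRangeMapDoc["title"] = rangeItem["title"]
        thisRangeMapDoc["body"] = rangeItem["body"]
        thisRangeMapDoc["contacts"] = rangeItem["contacts"]
        rangeMapDocs.append(thisRangeMapDoc)
    # Fetch the next page of results; None once there are no more pages
    rangeMapSearchResults = sb.next(rangeMapSearchResults)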
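
One possible shape for the generalized function mentioned above, offered as a rough sketch rather than anything that exists in the bis package. The name `build_lookup` and its parameters are hypothetical; the only things assumed about `sb` are the `find_items` and `next` calls already used in the snippet.

```python
def build_lookup(sb, parent_id, id_type="GAP_SpeciesCode",
                 fields=("title", "body", "contacts", "identifiers"),
                 page_size=100):
    """Page through the children of parent_id in ScienceBase.

    Returns (lookup, docs): lookup maps identifier key -> ScienceBase item ID,
    docs is a list of dicts with the item ID plus the requested fields.
    """
    lookup, docs = {}, []
    query = "parentId={}&fields={}&max={}".format(
        parent_id, ",".join(fields), page_size)
    results = sb.find_items(query)
    while results is not None:
        for item in results.get("items", []):
            # First identifier of the requested type, or None if absent
            match = next((i for i in item.get("identifiers", [])
                          if i.get("type") == id_type), None)
            if match is not None:
                lookup[match["key"]] = item["id"]
            doc = {"id": item["id"]}
            for field in fields:
                doc[field] = item.get(field)  # None when the field is missing
            docs.append(doc)
        results = sb.next(results)  # None when there are no more pages
    return lookup, docs
```

Called as `rangeMaps, rangeMapDocs = build_lookup(sb, _rangeMapRoot)`, this would build the same two structures as the snippet, and passing a different `id_type` or `fields` would let it cover other collections.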

nmtarr commented 7 years ago

Thanks for the example. I ended up with a workaround for the recent tasks but look forward to getting some code in place to retrieve such info in the future. Let us know when you get a solid version of the bis package and we can work on getting it installed in our environment.