andrewxhill / MOL

The Map of Life
mol.colorado.edu/

GBIF Point Data Workflow #37

Closed: andrewxhill closed this issue 13 years ago

andrewxhill commented 13 years ago

Aaron and I just had a chat to lay out the steps for point data. We are going to begin developing methods to display GBIF points on maps. Here are the steps for phase one:

1. Client sends request to Frontend (GAE)
2. Frontend checks cache
  2a. If in cache, returns
  2b. If not in cache, redirects the client to GBIF to make a JSONP call itself
    2b i. Frontend tells Remote to queue a job to gather the dataset from GBIF
    2b ii. When job is complete, Remote sends dataset to Frontend to be cached
3. Client receives the dataset as multiple pages of JSON
  3a. Client parses the JSON and displays the data on the map (updating the map after each page)
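A minimal Python sketch of the phase-one frontend logic above, assuming an in-memory dict as the cache and a deque as the Remote's job queue. The function names, the `fetch_dataset` callable, and the GBIF URL are hypothetical placeholders, not the project's actual code.

```python
from collections import deque
from urllib.parse import urlencode

# Hypothetical placeholder; the real GBIF endpoint is not given in this thread.
GBIF_ENDPOINT = "https://gbif.example/occurrences"


def handle_points_request(key, cache, job_queue):
    """Frontend (step 2): serve from cache, or redirect the client and queue a job."""
    if key in cache:
        # 2a. Cache hit: return the stored dataset directly.
        return {"status": "cached", "data": cache[key]}
    # 2b i. Ask the Remote to gather the full dataset, avoiding duplicate jobs.
    if key not in job_queue:
        job_queue.append(key)
    # 2b. Meanwhile, point the client at GBIF for its own JSONP call.
    return {"status": "redirect", "url": GBIF_ENDPOINT + "?" + urlencode({"name": key})}


def complete_job(job_queue, cache, fetch_dataset):
    """Remote (step 2b ii): run the oldest job and push its result into the cache."""
    key = job_queue.popleft()
    cache[key] = fetch_dataset(key)
    return key


# Usage: the first request redirects, the second is served from the cache.
cache, jobs = {}, deque()
first = handle_points_request("puma_concolor", cache, jobs)
complete_job(jobs, cache, lambda key: [{"lat": 1.0, "lon": 2.0}])
second = handle_points_request("puma_concolor", cache, jobs)
```

The redirect keeps the first request fast for the user, while the queued job means every later request for the same species can be answered from the cache.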
andrewxhill commented 13 years ago

A first-draft API method is available now: http://prototype.mol-lab.appspot.com/api/points/gbif/animalia/species/puma_concolor
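The example URL suggests a path layout of `/api/points/<source>/<kingdom>/<rank>/<name>`. A small Python sketch that splits such a URL, assuming that layout and that underscores stand in for spaces in the scientific name; both are inferred from this one example URL, not from a documented spec, and `parse_points_url` is a hypothetical name.

```python
from urllib.parse import urlparse


def parse_points_url(url):
    """Split a points API URL into (source, kingdom, rank, scientific name).

    Assumes the layout /api/points/<source>/<kingdom>/<rank>/<name>.
    Any query string (e.g. ?skipcache) is ignored because only the path is used.
    """
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) != 6 or parts[:2] != ["api", "points"]:
        raise ValueError("unexpected points URL: " + url)
    source, kingdom, rank, name = parts[2:]
    # Assumption: underscores replace spaces, e.g. puma_concolor -> Puma concolor.
    return source, kingdom, rank, name.replace("_", " ").capitalize()
```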

If you use Chrome, I suggest installing the 'JSONView for Chrome' extension.

andrewxhill commented 13 years ago

Right now the GbifDataHandler method replaces any uncertainty=0 with uncertainty=null (i.e. unknown) in the JSON response. That makes sense to me; let me know if this is a bad idea.
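A minimal Python sketch of the zero-to-null rule, assuming records are plain dicts with an `uncertainty` key (the record shape and the `normalize_uncertainty` name are hypothetical). Python's `None` serializes as JSON `null`.

```python
import json


def normalize_uncertainty(record):
    """Return a copy of record where uncertainty == 0 becomes None (JSON null).

    A zero uncertainty is read as 'unknown'; records without the key are untouched.
    """
    cleaned = dict(record)
    if cleaned.get("uncertainty") == 0:
        cleaned["uncertainty"] = None
    return cleaned


print(json.dumps(normalize_uncertainty({"lat": 1.0, "lon": 2.0, "uncertainty": 0})))
```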

eightysteele commented 13 years ago

John, does uncertainty = 0 mean something different from no uncertainty at all?

tucotuco commented 13 years ago

That GbifDataHandler behavior is the correct treatment. Uncertainty cannot be zero: any value given to GBIF as zero is an error (ha) and really means that the uncertainty is unknown, so it is better represented by null, unless the client understands zero to mean something special.

andrewxhill commented 13 years ago

I went ahead and changed the GbifDataHandler to reflect a few comments from today's talk. Primarily, the default method now returns a 'small' dataset: at most 1000 records, only records with coordinates, and a minimal set of additional fields per record. When the time comes, I will code up a 'full' method in GbifDataHandler that parses the complete set regardless of size and returns a more complete representation of each record.
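A sketch of the 'small' projection in Python, assuming dict records with `lat`/`lon` keys. The field names, the minimal field set, and `small_dataset` are hypothetical; the 1000-record cap and the coordinates-only filter come from the comment above.

```python
from itertools import islice

DEFAULT_LIMIT = 1000  # cap described in this comment
SMALL_FIELDS = ("lat", "lon", "uncertainty")  # hypothetical minimal field set


def small_dataset(records, limit=DEFAULT_LIMIT, fields=SMALL_FIELDS):
    """Keep only records with coordinates, cap the count, and trim each record."""
    with_coords = (
        r for r in records
        if r.get("lat") is not None and r.get("lon") is not None
    )
    # islice stops after `limit` matching records without scanning the rest.
    return [{f: r.get(f) for f in fields} for r in islice(with_coords, limit)]
```

Filtering lazily before the cap means the limit counts only plottable records, so a page is never shortened by records that lack coordinates.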

I also added a 'skipcache' variable to the URL. If it is included, the handler skips the cache check, queries GBIF, and rebuilds the cache regardless. This is useful for testing, and later it will let the client code force the server to fetch the newest GBIF data.
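A Python sketch of the 'skipcache' check, assuming the cache is a dict keyed by URL path and that the mere presence of the variable (with or without a value) triggers a refetch. `should_skip_cache`, `get_points`, and `fetch` are hypothetical names. Note that `parse_qs` drops valueless parameters such as `?skipcache` unless `keep_blank_values=True` is passed.

```python
from urllib.parse import parse_qs, urlparse


def should_skip_cache(url):
    """True if 'skipcache' appears in the query string at all."""
    query = parse_qs(urlparse(url).query, keep_blank_values=True)
    return "skipcache" in query


def get_points(url, cache, fetch_from_gbif):
    """Serve from cache unless skipcache is present; always refresh on a fetch."""
    key = urlparse(url).path
    if not should_skip_cache(url) and key in cache:
        return cache[key]
    data = fetch_from_gbif(key)
    cache[key] = data  # rebuild the cache entry with the newest data
    return data


# Usage: a repeat request hits the cache; ?skipcache forces a new fetch.
cache, calls = {}, []


def fetch(key):
    calls.append(key)
    return {"source": "gbif", "key": key}


url = "http://prototype.mol-lab.appspot.com/api/points/gbif/animalia/species/puma_concolor"
get_points(url, cache, fetch)
get_points(url, cache, fetch)
get_points(url + "?skipcache", cache, fetch)
```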

eightysteele commented 13 years ago

+1 all around.

andrewxhill commented 13 years ago

Big change to the JSON standard and harvest method: we are now getting records from GBIF as XML, not KML. http://prototype.mol-lab.appspot.com/api/points/gbif/animalia/species/puma_concolor I've reduced the default maximum to 200 records for now.
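A sketch of the XML harvest with the 200-record default, using Python's standard `xml.etree.ElementTree`. The element and attribute names (`occurrence`, `latitude`, `longitude`, `uncertainty`) and the sample document are hypothetical, since the actual GBIF XML schema is not shown here; the zero-uncertainty-to-null rule from earlier in the thread is applied while parsing.

```python
import xml.etree.ElementTree as ET

DEFAULT_MAX_RECORDS = 200  # default described in this comment

# Hypothetical, illustrative response; not the real GBIF schema.
SAMPLE = """<occurrences>
  <occurrence latitude="-33.5" longitude="-70.6" uncertainty="0"/>
  <occurrence latitude="19.4" longitude="-99.1" uncertainty="1000"/>
  <occurrence uncertainty="50"/>
</occurrences>"""


def parse_occurrences(xml_text, max_records=DEFAULT_MAX_RECORDS):
    """Turn occurrence elements into JSON-ready dicts, up to max_records."""
    root = ET.fromstring(xml_text)
    records = []
    for occ in root.iter("occurrence"):
        if len(records) >= max_records:
            break
        lat, lon = occ.get("latitude"), occ.get("longitude")
        if lat is None or lon is None:
            continue  # no coordinates, nothing to plot
        unc = float(occ.get("uncertainty", "0"))
        records.append({
            "lat": float(lat),
            "lon": float(lon),
            "uncertainty": unc or None,  # zero means unknown, so emit null
        })
    return records
```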

andrewxhill commented 13 years ago

The workflow is in place and has been tested during UI development. I think we are in a good place to close this now. We can open specific issues as they arise.