andrewxhill / MOL

The Map of Life
mol.colorado.edu/

Bulkloading 'LayerIndex' fails: bulkload* files need update? #109

Closed gaurav closed 12 years ago

gaurav commented 13 years ago

Bulkloading to my local App Engine instance is working fine for the 'Layer' data, but not the 'LayerIndex' data. Additionally, I get the following error when running loader.py: https://gist.github.com/1179412

I suspect this is because bulkload.yaml and bulkload_helper.py haven't been properly updated for the new field list entering our system. I'd be willing to trust metagen.py's new version of bulkload.yaml (see https://github.com/andrewxhill/MOL/blob/master/workflow/mol-data/bulkload.yaml.autogen) but not metagen.py's new version of bulkload_helper.py (see https://github.com/andrewxhill/MOL/blob/master/workflow/mol-data/bulkload_helper.py.autogen), which is very different from the current bulkload_helper.py.

Do either of you have any idea why I'm getting this error, and whether there's an easy fix for it? If not, I'll start slogging into bulkload_helper.py and try to figure out what's going on/going wrong.

eightysteele commented 13 years ago

> Bulkloading to my local App Engine instance is working fine for the 'Layer' data, but not the 'LayerIndex' data.

From the gist:

File "/Users/vaidyagi/Development/mol/workflow/mol-data/bulkload_helper.py", line 91, in wrapper
    ('LayerIndex', 'polygonid'))(value, bulkload_state)

So the bulkloader is either not finding a polygonid column in the CSV file or the value for the polygonid column is the empty string. Can you confirm that polygonid is actually in the CSV file? If not, we'll have to dig into loader.py to see what's up.
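
One quick way to check both cases (column missing, or column present but empty) is a small script like the one below. It is only a sketch: `check_column` is a hypothetical helper, not part of loader.py or the bulkloader, and the inline sample stands in for the real CSV file.

```python
import csv
import io


def check_column(csv_file, column="polygonid"):
    """Return (column_present, data-row numbers where the value is empty)."""
    reader = csv.DictReader(csv_file)
    if column not in (reader.fieldnames or []):
        return False, []
    empty_rows = [
        number
        for number, row in enumerate(reader, start=1)
        if not (row[column] or "").strip()
    ]
    return True, empty_rows


# Made-up sample data; the second data row has an empty polygonid.
sample = io.StringIO("areaid,polygonid\n1,10\n1,\n")
print(check_column(sample))
```

For a real file, pass an open file object instead, for example `open(path, newline="")`.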

eightysteele commented 13 years ago

Sooooo, ha, is polygonid no longer a required DBF field? If not, and if it's not mapped in config.yaml, then it won't show up in the CSV file, and you'll get the error. If it's no longer required by the user, then we'll need loader.py to auto-gen it since it is part of the LayerIndex key. Hopefully that makes sense?
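
A rough sketch of what auto-generating the field could look like, assuming rows are handled as dictionaries before the CSV is written. `fill_polygonid` is a hypothetical name, and using the 1-based row position as the id is an assumption for illustration, not how loader.py actually works.

```python
def fill_polygonid(rows, key="polygonid"):
    """Yield copies of rows, filling in an id where polygonid is missing or empty.

    Assumption: the 1-based row position is an acceptable id. This can
    collide with ids that are already present, so a real implementation
    would need a scheme that guarantees uniqueness.
    """
    for index, row in enumerate(rows, start=1):
        filled = dict(row)
        if not str(filled.get(key) or "").strip():
            filled[key] = str(index)
        yield filled


rows = [
    {"areaid": "1"},                     # polygonid missing
    {"areaid": "1", "polygonid": "7"},   # polygonid present, kept as-is
    {"areaid": "2", "polygonid": ""},    # polygonid empty
]
for row in fill_polygonid(rows):
    print(row)
```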

gaurav commented 13 years ago

That was it! @0055e6c fixes this problem for now. Two quick questions before I close this issue:

  1. Why do we need both an 'areaid' and a 'polygonid' field? (I checked: the bulkloader needs both fields to be defined for it to work). What is the distinction between the two?
  2. Do we expect the 'polygonid/areaid' fields to be populated automatically by ArcGIS or another GIS software? If so, why are they missing in two of the three sample files on GitHub (only IUCN Amphibians has an 'OBJECTID' field)? If they are likely to be missing in real input for the same reason, we should probably come up with some kind of workaround.

eightysteele commented 13 years ago

> Why do we need both an 'areaid' and a 'polygonid' field? (I checked: the bulkloader needs both fields to be defined for it to work). What is the distinction between the two?

The areaid identifies an area that contains multiple polygons, each identified by a polygonid. So it's a way of relating all the polygons that belong to an area. Does that make sense?
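
To illustrate the relationship, here is a minimal sketch that groups polygon ids under their area id. The row values are made up and `polygons_by_area` is a hypothetical helper, not code from this repo.

```python
from collections import defaultdict


def polygons_by_area(rows):
    """Map each areaid to the list of polygonids that belong to it."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["areaid"]].append(row["polygonid"])
    return dict(grouped)


# Made-up rows: area "a1" has two polygons, area "a2" has one.
rows = [
    {"areaid": "a1", "polygonid": "p1"},
    {"areaid": "a1", "polygonid": "p2"},
    {"areaid": "a2", "polygonid": "p3"},
]
print(polygons_by_area(rows))
```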

> Do we expect the 'polygonid/areaid' fields to be populated automatically by ArcGIS or another GIS software?

Not sure. @robgur? @walterj?

gaurav commented 13 years ago

Okay, the areaid/polygonid distinction makes sense to me now. I'll turn polygonid back into a required field then.

eightysteele commented 13 years ago

+1

gaurav commented 12 years ago

I'm closing this issue now; I'm not sure we know enough to answer my question about how to work with non-shapefile inputs. We can cross that bridge once we get to it.