fluby / neonetods

Data and associated tools for the NEON Existing Terrestrial Organism Data Survey

Spatial scale is not being loaded into the Sources table #4

Closed: ethanwhite closed this issue 11 years ago

ethanwhite commented 12 years ago

The spatial scale tags (e.g., 'site', 'offsite', 'supersite', etc.) are not being loaded into the Sources table.

ethanwhite commented 11 years ago

This data is present in the tags field. It just isn't being parsed. Looks like this was never implemented.

ethanwhite commented 11 years ago

@kblevins Is it possible to get a list of all of the tags currently in use in the Mendeley group? I need to make sure that I cover all of the possible spellings of things like 'off site' when parsing the tags column (sadly, I suspect there are several). If not, I can pull them out of the data we're pulling down; it's just more time not spent on the core issues.

ethanwhite commented 11 years ago

@kblevins Never mind, I just pulled them out. The full list is:

[u'ABBY', u'BART', u'BLAN', u'BONA', u'CAPL', u'CHOC', u'CPER', u'California', u'D1', u'D10', u'D11', u'D12', u'D13', u'D14', u'D15', u'D16', u'D17', u'D18', u'D19', u'D2', u'D20', u'D3', u'D4', u'D5', u'D6', u'D7', u'D8', u'D9', u'DCFS', u'DEJU', u'DELA', u'DSNY', u'FEXF', u'GRSM', u'GUAN', u'HARV', u'HEAL', u'JERC', u'JORN', u'LAJA', u'MLBS', u'MOAB', u'NIWO', u'NOGP', u'OAES', u'OLAA', u'ONAQ', u'ORNL', u'OSBS', u'PLUM', u'POKE', u'PONC', u'PUWI', u'PUWU', u'RBUT', u'RMNP', u'SCBI', u'SERC', u'SRER', u'STEI', u'STER', u'TALL', u'THAY', u'TREE', u'UNDE', u'UNDERC', u'UOBS', u'WNV', u'WOOD', u'WREF', u'aboveground', u'abundance', u'agriculture', u'all', u'alpine', u'beetle', u'beetles', u'belowground', u'biodiversity', u'biodivesity', u'biomass', u'birds', u'book', u'boreal', u'coniferous', u'county', u'd1', u'database', u'dataset', u'deciduous', u'desert', u'ecoregion', u'endangered species', u'endangered species info', u'forest', u'grassland', u'herp', u'herps', u'invertebrates', u'jouranl article', u'journal article', u'later', u'mammal', u'mammals', u'microbes', u'mixed', u'mosquito', u'mosquitoes', u'mosquitos', u'off site', u'off-site', u'old', u'peatland', u'phenology', u'plant', u'plants', u'region', u'regional', u'riparian', u'savanna', u'shrubland', u'site', u'site details', u'soil', u'spatextent=11736', u'spatextent=42.5', u'species list', u'species lists', u'stand data', u'state', u'status', u'super site', u'supersite', u'taiga', u'taxonomy', u'thesis', u'ticks', u'tropical', u'tundra', u'urban', u'website', u'wetland', u'woodland']
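
For illustration, one way to cope with the spelling variants in this list ('off site' / 'off-site', 'super site' / 'supersite') is to collapse whitespace and hyphens before comparing. This is only a rough sketch, not the code in the repository: `CANONICAL_SCALES` and `normalize_scale_tag` are hypothetical names, and the set of scale names is assumed from the three mentioned in the issue description.

```python
import re

# Assumed canonical spatial scale names, taken from the issue description.
# The real project may recognize more (or different) scale tags.
CANONICAL_SCALES = {"site", "offsite", "supersite"}


def normalize_scale_tag(tag):
    """Return the canonical scale name for a tag, or None if it is not a scale tag."""
    # Lowercase, then drop spaces, hyphens and underscores so that
    # 'off site', 'off-site' and 'Offsite' all collapse to 'offsite'.
    collapsed = re.sub(r"[\s\-_]+", "", tag.strip().lower())
    return collapsed if collapsed in CANONICAL_SCALES else None


tags = ["D3", "off-site", "mammals", "super site", "site details"]
scales = [s for s in (normalize_scale_tag(t) for t in tags) if s is not None]
print(scales)
```

Note that a tag such as 'site details' collapses to 'sitedetails' and so is not mistaken for a scale tag.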

ethanwhite commented 11 years ago

Spatial scale is now working. NEON should note that in some cases the 'site' tag was not explicitly included during data collection. It has now been added wherever none of the other tags indicating a spatial scale were present.
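
The defaulting rule described here could be sketched roughly as follows. This is a guess at the logic, not the actual implementation: `OTHER_SCALE_TAGS` and `with_default_site` are hypothetical names, and the set of "other" scale tags is an assumption based on the spellings seen in the tag list.

```python
# Assumed spellings of the non-'site' spatial scale tags (hypothetical set).
OTHER_SCALE_TAGS = {"off site", "off-site", "offsite", "super site", "supersite"}


def with_default_site(tags, other_scale_tags=OTHER_SCALE_TAGS):
    """Return a copy of tags, adding 'site' if no spatial scale tag is present."""
    lowered = {t.lower() for t in tags}
    # Leave the tags alone if 'site' or any other scale tag is already there.
    if "site" in lowered or lowered & other_scale_tags:
        return list(tags)
    return list(tags) + ["site"]


print(with_default_site(["D3", "JERC", "mammals"]))
print(with_default_site(["D1", "off-site", "birds"]))
```

The input list is copied rather than modified in place, so the original tags remain available for comparison.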

Still running down something weird with the spatial extent...

ethanwhite commented 11 years ago

These records should all have spatial extents (tags and urls shown):

  1. [u'D3', u'JERC', u'journal article', u'mammals', u'site', u'spatextent=11736', u'woodland'] http://www.mendeley.com/c/4974603662/g/2058663/cochrane-2006-spatial-organization-of-adult-bobcats-in-a-longleaf-pine-wiregrass-ecosystem-in-southwestern-georgia/
  2. [u'D3', u'JERC', u'journal article', u'mammals', u'site', u'spatextent=11736', u'woodland'] http://www.mendeley.com/c/4974603722/g/2058663/howze-2009-predator-removal-and-white-tailed-deer-recruitment-in-southwestern-georgia/
  3. [u'D2', u'dataset', u'forest', u'plants', u'SCBI', u'SERC', u'site', u'spatextent=42.5'] http://www.mendeley.com/c/5000465602/g/2058663/parker-2012-smithsonian-big-tree-forest-map/
  4. [u'D3', u'journal article', u'later', u'OSBS', u'plants', u'site', u'spatextent=11736', u'species list'] http://www.mendeley.com/c/4974603672/g/2058663/cathey-2008-evaluation-of-native-legume-growth-and-phenology-under-shaded-conditions-for-restoration-of-longleaf-pine-wiregrass-ecosystem/

Neither 1 nor 4 is present in the table or the failed_sources list. The other two (2 and 3) are in the table but are missing their extent values.
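
For reference, pulling the extent value out of tags like the ones listed above might look something like this. It is a minimal sketch under assumptions, not the project's parser: `parse_spatial_extent` is a hypothetical helper, the `spatextent=` prefix is taken from the tags shown, and the units of the value are not stated in this thread.

```python
def parse_spatial_extent(tags, prefix="spatextent="):
    """Return the numeric value of the first 'spatextent=...' tag, or None."""
    for tag in tags:
        if tag.lower().startswith(prefix):
            try:
                return float(tag[len(prefix):])
            except ValueError:
                # Malformed value, e.g. 'spatextent=unknown'.
                return None
    return None


record_3 = ["D2", "dataset", "forest", "plants", "SCBI", "SERC",
            "site", "spatextent=42.5"]
print(parse_spatial_extent(record_3))
```

Returning None for both a missing and a malformed tag keeps the caller simple, at the cost of not distinguishing the two cases.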

ethanwhite commented 11 years ago

All 4 sources are being correctly parsed and returned by get_mendeley_data.get_source_data().

ethanwhite commented 11 years ago

OK, 1 and 4 are apparently missing because they are not in our species lists. It's odd that they appear in the sources data, with site tags, but not in the species lists. That's beyond the scope of this issue, though (possibly something that the NEON folks will want to take a look at).

So, it's just a question of why 2 and 3 aren't getting pushed into the database after they have been properly parsed.

ethanwhite commented 11 years ago

Fixed! (It was just a cached file that was holding things up.)