funginstitute / disambiguator

Missing geographical data for records with grant year 2011-2013 #4

Open AVermeij opened 11 years ago

AVermeij commented 11 years ago

For all records with grant year 2011 or 2012, the columns Street, City, State, Country, Zipcode, Longitude and Latitude contain no data at all. This holds for both the Full 2012 and the January 2013 disambiguations. Interestingly, about half of the records with grant year 2013 do contain this data. I checked whether the missing data affected the resulting disambiguations for the respective inventors, but it does not appear to: regardless of the missing data, the inventors are still properly disambiguated.

The first attached picture is a simple pivot table showing that about half of the 2013 records are missing geographical data (country taken as an example); the second picture shows some examples of missing 2011 data.

[Attached images: country, missing_data]
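For anyone who wants to reproduce the pivot-table check without a spreadsheet, here is a minimal sketch using only the Python standard library. It assumes the disambiguation output is available as a CSV with a grant-year column and a Country column; the column name `GrantYear`, the helper `missing_share_by_year`, and the sample rows are hypothetical and only illustrate the idea, so adjust them to the real file.

```python
import csv
import io
from collections import defaultdict


def missing_share_by_year(rows, year_col="GrantYear", field="Country"):
    """Return {year: fraction of rows whose `field` is empty or blank}.

    `year_col` and `field` are assumed column names, not confirmed ones.
    """
    total = defaultdict(int)
    missing = defaultdict(int)
    for row in rows:
        year = row[year_col]
        total[year] += 1
        # Treat absent, empty, and whitespace-only values as missing.
        if not (row.get(field) or "").strip():
            missing[year] += 1
    return {year: missing[year] / total[year] for year in total}


# Tiny made-up sample standing in for the real disambiguation file.
sample = io.StringIO(
    "GrantYear,Country\n"
    "2010,US\n"
    "2011,\n"
    "2012,\n"
    "2013,US\n"
    "2013,\n"
)

shares = missing_share_by_year(csv.DictReader(sample))
for year in sorted(shares):
    print(year, f"{shares[year]:.0%}")
```

To run it against a real export, replace `sample` with an open file handle, and pass a different `field` (for example `"Latitude"`) to check the other location columns.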

doolin commented 11 years ago

This is a known issue, and here is what we know.

The 2010 and earlier data is merged from previously disambiguated data also posted on DVN.

We introduced a bug when the NGA location schema changed, and didn't catch it until late last fall.

The complete 2012 disambiguation used 2011 and 2012 parses incorporating the totally broken location data.

The 2013 parse reflects a partial fix to locations that is now being incorporated.

We're working pretty hard to fix the location/geocoding functionality right now. Once we have it, we'll update the 2011 and 2012 parses to reincorporate locations.

Street addresses are a bonus; they are not often reported. We have 15% of them database-wide.

Here are the current stats: http://funginstitute.github.com/statistics/