WSWCWaterDataExchange / MappingStatesDataToWaDE2.0

Manage all code to map and import state's data into WaDE 2.0
BSD 3-Clause "New" or "Revised" License

Future Project: Refine and finish OwnerClassificationCV field for WaDE 2.0. #82

Open rwjam opened 3 years ago

rwjam commented 3 years ago

In addition to listing the owner of a water right, WSWC would also like to categorize the owners into groups. However, states do not track this information so it is up to us to create this.

Initial large-scale categories thus far include: Privately Owned, Commercial, Military, Government, and Natural Resources. Each large-scale category is broken up further into smaller, more defined groups (e.g., Bureau of Land Management). The initial design uses a keyword search that starts with a generic search, whose result is then replaced by a more specific search (see the for loop approach in the code).
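A minimal sketch of the generic-then-specific keyword search described above, for illustration only. The rule list, the keywords, the `classify_owner` name, and the "Privately Owned" fallback are all assumptions, not the contents of the actual script; the only idea taken from the description is that rules run in a for loop from generic to specific, and a later match replaces an earlier one.

```python
import re

# Hypothetical rules, ordered from generic to specific.
# A later (more specific) match replaces an earlier (more generic) one.
KEYWORD_RULES = [
    ("Government", ["state of", "county", "city of", "bureau", "department"]),
    ("Military", ["army", "navy", "air force"]),
    ("Bureau of Land Management", ["bureau of land management", "blm"]),
]


def classify_owner(owner_name, default="Privately Owned"):
    """Return the most specific category whose keyword appears in the owner name."""
    # Lowercase, turn punctuation into spaces, and pad with spaces so that
    # keywords only match whole words or whole phrases.
    text = " " + re.sub(r"[^a-z0-9]+", " ", owner_name.lower()).strip() + " "
    label = default  # assumed fallback when nothing matches
    for category, keywords in KEYWORD_RULES:
        if any(f" {kw} " in text for kw in keywords):
            label = category  # specific rules come later and overwrite
    return label


print(classify_owner("USDI Bureau of Land Management"))
print(classify_owner("John Smith"))
```

Because the loop never exits early, the order of `KEYWORD_RULES` is the whole design: moving a rule later in the list makes it win over anything before it.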

Work will most likely be assigned to a WSWC intern. Intern responsibilities would be as follows:

  1. Review the state's provided owner input information (by hand) for clarity and accuracy. If need be, the intern should produce a duplicate 'WaDE owner information input' field that scrubs, cleans, and standardizes the state's provided owner input information, which will help ensure correct assignment of OwnerClassificationCV tags.
  2. Run the WaDE OwnerClassificationCV scripts. Update them as needed, but only after approval by WSWC staff.
  3. Review (by hand) the resulting OwnerClassificationCV tags against the state's original owner input information to ensure accuracy.

A clean input would help reduce the number of errors and exceptions that need to be handled. Each state has a different method for listing owners (e.g., single owner, multiple owners and owner types, last name included but no first name, etc.), so cutting down the number of exceptions per state will improve accuracy. As long as the cleaned field is a duplicate of the original owner information, that should keep WaDE's promise not to override state-given data.
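As a rough illustration of the duplicate-field idea, the sketch below writes a cleaned copy next to the original value instead of overwriting it. The `clean_owner` function, its cleaning rules, and the `OwnerName` / `WaDEOwnerInput` keys are hypothetical names chosen for this example; real cleaning rules would need to be worked out per state.

```python
import re


def clean_owner(raw):
    """Return a standardized copy of a state-provided owner string."""
    text = raw.upper()
    text = text.replace(".", "")                 # "U.S.A." -> "USA"
    text = re.sub(r"[^A-Z0-9&;, ]+", " ", text)  # other punctuation -> space
    text = re.sub(r"\s+", " ", text).strip()     # collapse repeated whitespace
    return text


# Hypothetical records; the state's original value is never modified.
records = [{"OwnerName": "  u.s.a. - Bureau of  Land Mgmt "}]
for rec in records:
    rec["WaDEOwnerInput"] = clean_owner(rec["OwnerName"])

print(records[0])
```

Keeping both fields side by side also makes the by-hand review in step 3 easier, since the reviewer can compare the tag against what the state actually provided.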

OwnerClassificationCV Table located here: https://docs.google.com/spreadsheets/d/1tZ3DIYDx7J-dsldHfihQjOABibMpFoGKvmSe8ReKW10/edit?ts=5de93fa0#gid=1004421132

Initial work, script, and notes located here: https://github.com/WSWCWaterDataExchange/MappingStatesDataToWaDE2.0/tree/master/OwnerClassification

amabdallah commented 1 year ago

@rwjam Not sure if this is the best place to post about this issue, so feel free to move it where you see fit. Why does WestDAAT not have an owner classification value = "United States of America", as we have it under the CVs list in the Google sheet? https://westdaat.westernstateswater.org/

https://docs.google.com/spreadsheets/d/1tZ3DIYDx7J-dsldHfihQjOABibMpFoGKvmSe8ReKW10/edit?usp=sharing

I was looking for Idaho as an example where they said they do have owners as federal or tribal. The general federal owner is usually the United States of America.

For the tribal rights in Idaho, it appears that WestDAAT only shows 24 rights (see the pie chart). Based on what Jerry Rigby said, there should be more tribal rights. Could you look more into the classification in Idaho? https://westdaat.westernstateswater.org/?state=N4Ig7gTiBcoPZgHYFMIGEA2BDAzjglgGb4DGWALvnIjjANogByF%2BAbsgAQCCAtqqVkQgAugBoQOchWS1oDAJIAREeIAOcDAE8M%2BFPTEhEGACb4AYvgzlUiilhihslcgFdjyGIhcYM4jNQBzfFd3T29fEFMIZBJKalkAZnFjOwAFOF1yWQB2AF9c8R4sVQcQAC84OB4AGWR2DBgATgA6AA4ANgTG7PaAFgBWRtbWxt7WgCY-FhCPaF7x5oTs7I7e7IBGdv72xvX%2Bv0Dgt1mAWnXN5t726-bx1vXWq4SABlb8oA

rwjam commented 1 year ago

Why does WestDAAT not have an owner classification value = "United States of America", as we have it under the CVs list in the Google sheet?

Ran a quick SQL script. We have 41,450 water rights with an OwnerClassificationCV labeled as “United States of America.”

image

I was looking for Idaho as an example where they said they do have owners as federal or tribal. The general federal owner is usually the United States of America. For the tribal rights in Idaho, it appears that WestDAAT only shows 24 rights (see the pie chart). Based on what Jerry Rigby said, there should be more tribal rights. Could you look more into the classification in Idaho?

Currently we should be showing 69 water rights in Idaho with an OwnerClassificationCV labeled as “Native American” and another 29,474 water rights with an OwnerClassificationCV labeled as “United States of America.” It’s possible we are missing some, but that might come down to some values being truncated, as we only support the first value given in an owner list.

image

(see comments below) We use a string search based on similar words provided. For Native American, we look for the words ["tribe", "tribes", "nation", "nations", "indians"], and for United States of America we look for ["united states of america", "united states america", "usa"].
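The word lists above could be applied along these lines. This is a sketch under assumptions: the `match_label` name, the whole-word matching, and the order in which the two lists are checked are illustrative choices, not a description of the actual WaDE function.

```python
import re

# Word lists as quoted in the comment above.
NATIVE_AMERICAN_WORDS = ["tribe", "tribes", "nation", "nations", "indians"]
USA_WORDS = ["united states of america", "united states america", "usa"]


def match_label(owner_name):
    """Return a label if any listed word appears in the owner name, else None."""
    # Normalize to lowercase words separated by single spaces, padded so that
    # only whole words match (an assumption; "national" will not match "nation").
    text = " " + re.sub(r"[^a-z0-9]+", " ", owner_name.lower()).strip() + " "
    if any(f" {w} " in text for w in NATIVE_AMERICAN_WORDS):
        return "Native American"
    if any(f" {w} " in text for w in USA_WORDS):
        return "United States of America"
    return None


print(match_label("Shoshone-Bannock Tribes"))
print(match_label("UNITED STATES OF AMERICA ACTING THROUGH BLM"))
print(match_label("National Bank"))
```

A plain substring search would be simpler but would also tag names such as "National Bank", which is one reason a hand review of the resulting tags is still useful.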

As a heads up, those links you are sharing are for the production database / prod WestDAAT, which has not been updated and does not do a good job of reflecting the work we have done in the last several months. The UAT version(s) are showing slightly better results. This gets into difficulties we've had with copying data from UAT to prod, which it sounds like DPL might try to improve for us (based on 01/05/2023 meeting notes).

https://westdaatstaging.westernstateswater.org/?state=N4Ig7gTiBcoA4HsA2BPJBLAdgUxgbQF0AaETJAE3QDF0kAXbCAEQEM6WZQk306BXcrmiY%2BSJCSQJMAc14ChIsSUoRsAYzropAZxgBmZWxYAFBFjq7oAdgC%2BJbewaW8IAJJMQxEAjA4IAYW5tbXQAM3Q1Hh18EAA5HgA3bAACAEEAW0YIlkxPOxB0ljhOEAAvBAR0gBlsJKQYKwA6AFYAFgBGAAZmzoA2PSteqwAmKz09ZokefkEYVr1GzoAOQbGxrt7h4daJKVkZoQBadvbWlt72vU72pd6%2B5vX89MsQchYIAGtDhK6QGyA

rwjam commented 1 year ago

A quick comment on how we are currently approaching the OwnerClassificationCV value.

We use a custom function to assign OwnerClassificationCV based on a provided owner value. We store that function and its documentation on GitHub here: https://github.com/WSWCWaterDataExchange/MappingStatesDataToWaDE2.0/tree/master/5_CustomFunctions/OwnerClassification

Owner type is technically a one-to-many (1-M) relationship, with a single water right linked to multiple owners. However, we don’t support that relationship in WaDE. As of right now we assign OwnerClassificationCV based on an ordered list of values, with the exception that we treat anything we consider to be “In Review” as the least important (see code image). Then it’s just a simple match on the first word in the provided list.

image
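Since the code image is not reproduced here, the following is only a guess at the shape of that logic, not the actual function. It assumes the candidate values for one water right are sorted alphabetically, that "In Review" is pushed to the end of that order, and that the first entry is kept; `pick_classification` is a hypothetical name.

```python
def pick_classification(values, least_important="In Review"):
    """Pick one classification from several candidates for a single water right."""
    if not values:
        return least_important
    # Sort key: (is the value "In Review"?, the value itself).
    # False sorts before True, so "In Review" always lands last.
    ordered = sorted(set(values), key=lambda v: (v == least_important, v))
    return ordered[0]


print(pick_classification(["In Review", "Native American"]))
print(pick_classification(["In Review"]))
```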

This gets into issues similar to how we treat PrimaryBeneficialUseCategory. We use that field for categorization and labeling in the WestDAAT legend, as WestDAAT does not support coloring a single water right site with multiple beneficial uses. We would run into the same issue here with OwnerClassificationCV. We could take the same approach with OwnerClassificationCV as we do with PrimaryBeneficialUseCategory and use a hierarchy to determine which OwnerClassificationCV we consider to be more important; we just need to be ready to make the argument for how we made our selections.
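One way a hierarchy approach could look is sketched below. The ranking in `HIERARCHY` is a placeholder invented for this example, not a WSWC decision, and `primary_classification` is a hypothetical name; the point is only that an explicit, documented list makes the selection rule easy to explain and to change.

```python
# Placeholder ranking, most important first. The real order would need to be
# agreed on and documented by WSWC staff.
HIERARCHY = [
    "Native American",
    "United States of America",
    "Government",
    "Military",
    "Commercial",
    "Privately Owned",
    "In Review",
]
RANK = {name: position for position, name in enumerate(HIERARCHY)}


def primary_classification(values):
    """Return the highest-ranked known classification among the given values."""
    known = [v for v in values if v in RANK]
    if not known:
        return "In Review"  # assumed fallback for empty or unrecognized input
    return min(known, key=RANK.__getitem__)


print(primary_classification(["Privately Owned", "United States of America"]))
```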