PESTools / pestools

ObsInfo Enhancements #18

Closed echristi closed 9 years ago

echristi commented 9 years ago

An idea for subgroups: allow as many different subgroups as the user would like.

import pandas as pd
import numpy as np

df = pd.DataFrame({'ObName' : ['bflow1', 'head2', 'head3', 'head4',
                        'head5', 'bflow2', 'head1', 'bflow3'],
                 'X' : np.random.randn(8), 'Y' : np.random.randn(8),
                 'SubGroups' : ['baseflow',
                                'head good; aquifer A; head',
                                'head medium; aquifer A; head',
                                'head good; aquifer B; site data; head',
                                'head medium; head',
                                'baseflow; stream_A; baseflow',
                                'head',
                                'baseflow stream_A; baseflow']})

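# Convert each SubGroups string to a list of subgroup names.
# Assigning to `row` inside df.iterrows() only modifies a copy, so the
# column is assigned directly; whitespace around each name is stripped.
df['SubGroups'] = df['SubGroups'].apply(
    lambda s: [name.strip() for name in s.split(';')])

def filter_obs_info(subgroup):
    # Keep the rows whose subgroup list contains an exact match
    filtered = df[df['SubGroups'].apply(lambda names: subgroup in names)]
    return filtered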

head = filter_obs_info('head')
baseflow = filter_obs_info('baseflow')
head_good = filter_obs_info('head good')
site_data = filter_obs_info('site data')
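
A possible variation on the idea above, as a minimal sketch: once the semicolon-delimited strings are split into lists, `DataFrame.explode` (pandas 0.25 or later) gives one row per observation/subgroup pair, so any number of subgroups can be filtered or grouped with ordinary pandas operations. The `SubGroup` column name and the shortened example frame are assumptions for illustration only.

```python
import pandas as pd

obs = pd.DataFrame({
    'ObName': ['bflow1', 'head2', 'head4'],
    'SubGroups': ['baseflow',
                  'head good; aquifer A; head',
                  'head good; aquifer B; site data; head'],
})

# Split on ';' and strip whitespace so each subgroup name matches exactly
obs['SubGroups'] = obs['SubGroups'].apply(
    lambda s: [name.strip() for name in s.split(';')])

# One row per (observation, subgroup) pair
long_obs = obs.explode('SubGroups').rename(columns={'SubGroups': 'SubGroup'})

# Exact-match filtering: 'head' does not also pick up 'head good'
head_names = long_obs.loc[long_obs['SubGroup'] == 'head', 'ObName'].tolist()

# Observation names belonging to each subgroup
members = long_obs.groupby('SubGroup')['ObName'].apply(list).to_dict()
```

The long form trades a slightly larger table for the ability to use `groupby` directly on subgroup names.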

aleaf commented 9 years ago

Evan, how about this. It would keep things a little more organized.

See also the 'noodling' ipynb in my fork and the Res and Rei notebooks. I made the basic changes to the dataframes in those, but haven't cast them into any methods. At some level we may just want to provide the user with examples of pandas code that get the results they want. The plotting methods, whether pandas or our own, could be general enough to work with whatever they are given.

I also added arguments to the method that reads the obsinfo file: basename (to distinguish transient observations associated with a common measurement point), datetime, and group columns, which could be a list of any columns that one might want to group by.
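
To make the argument idea concrete, here is a rough sketch of what such a reader could look like. The function name `read_obsinfo`, its parameter names, and the comma-delimited file layout are all assumptions for illustration; this is not the actual PESTools signature.

```python
import io
import pandas as pd

def read_obsinfo(filepath_or_buffer, name_col='ObName', basename_col=None,
                 datetime_col=None, group_cols=None, **kwargs):
    """Hypothetical obsinfo reader (illustrative only, not the PESTools API)."""
    # Parse the datetime column only if one was named
    parse_dates = [datetime_col] if datetime_col else False
    df = pd.read_csv(filepath_or_buffer, parse_dates=parse_dates, **kwargs)

    # Keep the observation name plus whichever optional columns were requested
    keep = [name_col]
    for col in [basename_col, datetime_col] + list(group_cols or []):
        if col is not None and col not in keep:
            keep.append(col)

    missing = [c for c in keep if c not in df.columns]
    if missing:
        raise KeyError('columns not found in obsinfo file: {}'.format(missing))
    return df[keep].set_index(name_col)

# Small in-memory stand-in for an obsinfo file (made-up rows)
example_csv = io.StringIO(
    "ObName,basename,datetime,aquifer,source\n"
    "head3,head3,2014-01-03,St. Peter,Station\n"
    "head4,head3,2014-01-04,St. Peter,Station\n"
    "bflow1,bflow,2014-01-01,,baseflow\n")

obsinfo = read_obsinfo(example_csv, basename_col='basename',
                       datetime_col='datetime',
                       group_cols=['aquifer', 'source'])
```

Passing the group columns as a plain list keeps the reader agnostic about what the user wants to group by.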

import datetime as dt
import numpy as np
import pandas as pd

# One timestamp per observation, at daily intervals
times = [pd.to_datetime('2014-01-01') + dt.timedelta(days=i) for i in range(8)]

df = pd.DataFrame({'ObName' : ['bflow1', 'head2', 'head3', 'head4',
                               'head5', 'bflow2', 'head1', 'bflow3'],
                 'X' : np.random.randn(8), 'Y' : np.random.randn(8),
                 'datetime': times, 
                 'basename': ['bflow', 'head2', 'head3', 'head3',
                               'head3', 'bflow', 'head1', 'bflow'],
                 'aquifer': ['', 'Mt. Simon', 'St. Peter', 'St. Peter',
                              'St. Peter', '', 'Mt. Simon', ''],
                 'source': ['baseflow', 'WCR', 'Station', 'Station',
                            'Station', 'baseflow', 'WCR', 'Gage']})

df[df.source == 'WCR']
df[df.aquifer == 'St. Peter']
# Boolean indexing with a list comprehension can still be used to find
# additional text in a column if the user really wants:
df[['Peter' in aquifer for aquifer in df.aquifer]]
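
With separate columns like these, grouping comes down to plain `groupby` calls. The following is only a sketch on a shortened, made-up version of the frame above, showing how `basename` could tie transient observations to a common measurement point and how any list of columns could serve as groups.

```python
import pandas as pd

df = pd.DataFrame({
    'ObName': ['bflow1', 'head2', 'head3', 'head4', 'head5'],
    'datetime': pd.date_range('2014-01-01', periods=5, freq='D'),
    'basename': ['bflow', 'head2', 'head3', 'head3', 'head3'],
    'aquifer': ['', 'Mt. Simon', 'St. Peter', 'St. Peter', 'St. Peter'],
    'source': ['baseflow', 'WCR', 'Station', 'Station', 'Station'],
})

# Number of transient observations at each measurement point
n_obs = df.groupby('basename')['ObName'].count()

# First and last observation time at each measurement point
time_span = df.groupby('basename')['datetime'].agg(['min', 'max'])

# Vectorized substring match, equivalent to the list-comprehension mask
peter = df[df['aquifer'].str.contains('Peter')]

# Grouping on any list of columns
by_group = df.groupby(['aquifer', 'source']).size()
```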

echristi commented 9 years ago

I haven't had a chance to look at your notebooks but in general what you describe above makes sense. I didn't see a noodling notebook. Did that get pushed?

I think you are right about providing examples and letting the user do some work. Otherwise this will be a big rabbit hole.

aleaf commented 9 years ago

Forgot. Just pushed it.

Andy

On Dec 17, 2014, at 9:08 AM, echristi notifications@github.com wrote:

I haven't had a chance to look at your notebooks but in general what you describe above makes sense. I didn't see a noodling notebook. Did that get pushed?

I think you are right about providing examples and letting the user do some work. Otherwise this will be a big rabbit hole.
