Science-for-Nature-and-People / Midwest-Agriculture-Synthesis

Synthesizing and visualizing the impact of conservation agriculture

Filter mapped points to match data underpinning figure #35

Closed LesleyAtwood closed 5 years ago

LesleyAtwood commented 5 years ago

Use paper_list column in data set to filter mapped points.

swood-ecology commented 5 years ago

@nathanhwangbo do you have time to work on making the figure and map talk to each other? we need the selections for the figure to filter the map data and vice versa. ( @brunj7 )

nathanhwangbo commented 5 years ago

The crosstalk package should be able to get us what we want.

It looks like the first step will be to merge the map.data and summary_all data frames. To do this, we need a column we can use to match the two data frames, but I can't seem to find one. I took a look at the Paper_id column in map.data, but it's not obvious which column in summary_all it should match with. All three of the paper_list_id columns have very large values, which don't seem to match the formatting of Paper_id.

Any ideas?

LesleyAtwood commented 5 years ago

For some reason the column type for the paper_list_id columns changes from character to numeric when the CSV loads in global.R.

When these columns are built in main-data-formatting.R, each cell contains a comma-delimited list of paper_id numbers [e.g. 15, 45, 134]. I'm not sure why the column type changes, but the change appears to convert the list into a single large value.
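
A rough illustration of how a list could collapse into one large value. This is only a guess at the mechanism: it assumes the cell is stored without spaces and that the reader treats commas as grouping marks, neither of which has been checked against the app. Base R is used so the snippet runs on its own.

```r
# Assumption: one cell as written by main-data-formatting.R, stored without spaces.
cell <- "15,45,134"

# If commas are treated as thousands separators, the three ids fuse into one number.
# (Illustrative only; this is not the code path global.R actually uses.)
collapsed <- as.numeric(gsub(",", "", cell, fixed = TRUE))
print(collapsed)

# Kept as character, the individual ids can still be recovered.
ids <- as.integer(strsplit(cell, ",", fixed = TRUE)[[1]])
print(ids)
```

If this is what is happening, forcing the columns to be read as character should preserve the lists.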

swood-ecology commented 5 years ago

I believe the read functions have arguments that let you specify data types for specific variables when reading the data.

LesleyAtwood commented 5 years ago

Here's the code for specifying column types with read_csv: read_csv(data-for-app.csv, col_types = cols(c(paper_id_list1, paper_id_list2, paper_id_list3)=col_character())))

The embedded 'here' command, however, doesn't like this code.

swood-ecology commented 5 years ago

does it work better like this?

read_csv("data-for-app.csv", col_types = cols(paper_id_list1 = col_character(), paper_id_list2 = col_character()))

nathanhwangbo commented 5 years ago

This line of code turns the paper_id_list columns into characters, so it works as intended.

I am still not sure how to match the paper_id_list columns in summary_all to the Paper_id column in map.data, since the first contains lists, and the second only contains single numbers.
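
One possible way to bridge lists and single numbers is to split each list into one row per id and merge on that. The sketch below uses base R and small made-up data frames; only the column names paper_id_list1 and Paper_id come from the app, and the summary_row helper column is hypothetical.

```r
# Made-up stand-ins for the real data frames (values are illustrative only).
summary_all <- data.frame(
  paper_id_list1 = c("15, 45, 134", "45, 93"),
  stringsAsFactors = FALSE
)
map.data <- data.frame(Paper_id = c(15L, 45L, 93L, 134L, 72L))

# Split every cell on commas (with optional spaces) into a vector of ids.
id_list <- strsplit(summary_all$paper_id_list1, ",\\s*")

# Long lookup table: one row per (summary row, paper id) pair.
lookup <- data.frame(
  summary_row = rep(seq_along(id_list), lengths(id_list)),  # hypothetical helper column
  Paper_id    = as.integer(unlist(id_list))
)

# Inner join: map rows whose Paper_id appears in no list drop out.
matched <- merge(map.data, lookup, by = "Paper_id")
print(matched)
```

A paper that appears in several summary rows shows up once per row after the merge, so the result is many-to-many and not a clean one-to-one match.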

swood-ecology commented 5 years ago

what's going on is that summary_all is an aggregated data frame for generating a figure. each row of that data frame contains data from multiple studies, which is why there's a vector of ids. what needs to happen is that if the filters we allow the user to use remove any rows from summary_all, the points for those studies drop out of the map.

@LesleyAtwood I'm not sure how to go in the other direction, though. if someone selects a map unit that filters out an 'id' contained in a summary_all observation where there are multiple ids, would we have to go back and re-aggregate the data to not include that particular id, but to include the others? how do you want to handle that?

nathanhwangbo commented 5 years ago

Regarding the first direction (summary_all -> map). Should I use all three paper_id columns? For example, say our filters remove a row in summary_all with paper_id_list1 = [93,95], paper_id_list2 = [70,88], and paper_id_list3 = [84,88]. In this case, do I remove all rows in map.data where Paper_id is 93, 95, 70, 88, 84, or 88?

Additionally, summary_all has many duplicates in the paper_id_list# columns. It looks like it would be possible for our filters to filter out one row with, for example, paper_id_list1 = 137, but still have other rows (not filtered out) that also have paper_id_list1 = 137. In this case, do we still want our map to filter all observations with Paper_id = 137?

LesleyAtwood commented 5 years ago

@swood-ecology, that makes sense, but the code we currently rely on for the summary table is not really set up to recompute filtered data on the fly. We'd have to reorganize and write code that can filter based on user input.

@nathanhwangbo the three paper_id_lists are nested within one another with the columns ending in '1' being the coarsest level, '2' the middle, and '3' the most specific. That means there can be multiple '2's for every '1', and multiple '3's for every '2'. In essence, it's a 1 to many relationship.

Currently, the select input boxes in the app draw only from the columns ending in '1' [i.e. Legend_1, mean_per_change1, sem_per_change1]. For now it makes sense to use the list of papers included in the paper_id_list1 column only.

For your second question, I don't think we're quite at the point to make that decision. I thought our first goal was to get the mapped points and the reference list to match the data displayed in the figure. In my mind all we need to do this is grab the list of papers included in paper_id_list1 and filter the mapped points and reference list based on the paper_id included in the list. Am I missing something?
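
A minimal sketch of that idea in base R, with made-up data: the Legend_1 values "A" and "B" and the helper ids_in() are hypothetical, and the real app would take the filtered rows from the select inputs.

```r
# Hypothetical helper: collect the unique paper ids from a column of comma-delimited lists.
ids_in <- function(x) unique(as.integer(unlist(strsplit(x, ",\\s*"))))

# Made-up stand-ins for the real data frames.
summary_all <- data.frame(
  Legend_1       = c("A", "B", "B"),
  paper_id_list1 = c("93, 95", "137", "137, 15"),
  stringsAsFactors = FALSE
)
map.data <- data.frame(Paper_id = c(93L, 95L, 137L, 15L))

# Suppose the user's selection keeps only the "B" rows.
kept <- summary_all[summary_all$Legend_1 == "B", ]

# Keep the map points whose Paper_id appears in any remaining row.
keep_ids     <- ids_in(kept$paper_id_list1)
map_filtered <- map.data[map.data$Paper_id %in% keep_ids, , drop = FALSE]
print(map_filtered)
```

Because the map is filtered by the ids that remain, a paper stays on the map as long as at least one surviving row lists it. That is a property of this sketch, not a decision about how duplicates should be handled.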

nathanhwangbo commented 5 years ago

Going in one direction (plot filter -> map filter) is definitely doable using the method you describe, but the only way I see to go in both directions (map filter -> plot filter as well) is to use the crosstalk package I mentioned earlier, which requires us to merge the map.data and summary_all datasets into a single dataset. That's where things get a little messy, since it's hard to match the rows in one data frame with the rows in the other when they don't share a column.

LesleyAtwood commented 5 years ago

I see. We will have to figure out another format for the data set the app relies on. It's starting to look like the raw data, as opposed to a summary, is what we'll need.
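
To make the idea concrete, here is a rough base R sketch of recomputing the summary from raw rows after a paper is dropped. Everything about the raw layout is an assumption: the raw data frame, its per_change column, the function summarise_raw(), and the standard-error formula are all hypothetical and may not match main-data-formatting.R.

```r
# Made-up raw data: one row per extracted comparison (layout is an assumption).
raw <- data.frame(
  paper_id   = c(15L, 15L, 45L, 134L, 93L),
  Legend_1   = c("A", "A", "A", "A", "B"),
  per_change = c(10, 20, 30, 40, 5),
  stringsAsFactors = FALSE
)

# Hypothetical re-aggregation: mean, standard error, and paper list per Legend_1 group.
summarise_raw <- function(d) {
  values <- split(d$per_change, d$Legend_1)
  papers <- split(d$paper_id, d$Legend_1)
  data.frame(
    Legend_1         = names(values),
    mean_per_change1 = vapply(values, mean, numeric(1)),
    # Standard error of the mean; NA for groups with a single row.
    sem_per_change1  = vapply(values, function(v) sd(v) / sqrt(length(v)), numeric(1)),
    paper_id_list1   = vapply(papers, function(p) paste(sort(unique(p)), collapse = ", "),
                              character(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

full       <- summarise_raw(raw)
# Dropping paper 45 (for example, after a map selection) and re-aggregating.
without_45 <- summarise_raw(raw[raw$paper_id != 45L, ])
print(without_45)
```

With the raw rows available, removing one id only changes the groups that contained it, and the remaining ids in those groups are kept.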

swood-ecology commented 5 years ago

@LesleyAtwood do you want to make that decision now and figure it out, or hold off?

LesleyAtwood commented 5 years ago

I would like to add the (plot filter -> map) capability right now and hold off on the (map -> plot filter).

We need to think through how best to format the data not only for the (map -> plot filter) but also other future filtering functions so that we don't run into this issue again and again.

swood-ecology commented 5 years ago

ok. @nathanhwangbo do you want to work on filtering the map based on figure data? and I can remove the filtering capability on the map to not give the impression that a user should be able to filter by geography.

nathanhwangbo commented 5 years ago

I just pushed changes to the app which should filter the map based on the plot filters.

I noticed that 7 rows in map.data will never be plotted, since their Paper_id does not match any in summary_all. (These rows have Paper_id = 122, 123, or 72).
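
For reference, a check like this can be sketched in base R as below. The data are made up and much smaller than the real data frames; only the column names and the three Paper_id values come from the app.

```r
# Made-up stand-ins for the real data frames.
summary_all <- data.frame(
  paper_id_list1 = c("15, 45", "45"),
  stringsAsFactors = FALSE
)
map.data <- data.frame(Paper_id = c(15L, 72L, 72L, 122L, 123L, 45L))

# Every paper id that appears anywhere in paper_id_list1.
summary_ids <- unique(as.integer(unlist(strsplit(summary_all$paper_id_list1, ",\\s*"))))

# Map rows whose Paper_id never appears in summary_all, so they can never be plotted.
orphans <- map.data[!(map.data$Paper_id %in% summary_ids), , drop = FALSE]
print(table(orphans$Paper_id))
```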

swood-ecology commented 5 years ago

@LesleyAtwood do you know what's up with these 7 rows?

LesleyAtwood commented 5 years ago

This is correct. The comparisons made in those papers (and subsequently the data we extracted from those papers) do not conform to the comparisons we display in the figures.

72 is a Pest Management paper where the control always includes a fungicide. Our baseline (0) in the figure is 'no pesticide' so these comparisons were excluded from the summary_all data set.

122 is a Nutrient Management paper where the control is 'unfertilized'. Our baseline for this set of figures does not include 'unfertilized'.

123 is also a Nutrient Management paper where the data extracted compares two different fertilizer injection practices. Again this is not included in our figures.

LesleyAtwood commented 5 years ago

@nathanhwangbo Thanks for adding the filtering. I'm loving it!