LesleyAtwood closed this issue 5 years ago.
@nathanhwangbo, do you have time to work on making the figure and map talk to each other? We need the selections for the figure to filter the map data and vice versa. (@brunj7)
The crosstalk package should be able to get us what we want.
It looks like the first step will be to merge the map.data and summary_all data frames. To do this, we need a column we can use to match the two data frames, but I can't seem to find one. I looked at the Paper_id column in map.data, but it's not obvious which column in summary_all it should match. All three of the paper_id_list columns contain very large values, which doesn't match the format of Paper_id.
Any ideas?
For some reason, the paper_id_list columns change from character to numeric when the CSV is loaded in global.R.
When these columns are built in main-data-formatting.R, each cell holds a comma-delimited list of paper_id numbers (e.g. 15, 45, 134). I'm not sure why the column type changes, but the change appears to collapse each list into a single large value.
I believe the read-in functions have arguments that let you specify the data type of specific variables when reading the data.
(Reply sent by email on issue #35, "Filter mapped points to match data underpinning figure", in response to LesleyAtwood's comment of Thursday, March 7, 2019, 1:39 AM.)
Here's the code for specifying column types with read_csv: read_csv(data-for-app.csv, col_types = cols(c(paper_id_list1, paper_id_list2, paper_id_list3)=col_character())))
The embedded 'here' command, however, doesn't like this code.
Does it work better like this?
read_csv("data-for-app.csv", col_types = cols(paper_id_list1 = col_character(), paper_id_list2 = col_character(), paper_id_list3 = col_character()))
This line of code turns the paper_id_list columns into characters, so it works as intended.
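A minimal readr sketch of the fix, run against a hypothetical one-row stand-in for data-for-app.csv (the Legend_1 value and the ids are made up). One plausible cause of the large values, which the thread doesn't confirm, is that readr guesses a cell such as 15,45,134 to be a number with ',' as a thousands separator. Declaring the columns with col_character() keeps the strings intact, and strsplit() then recovers the individual ids:

```r
library(readr)

# Hypothetical stand-in for data-for-app.csv; the real file has more columns.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Legend_1,paper_id_list1,paper_id_list2,paper_id_list3",
  'Cover crop,"15,45,134","15,45","45"'
), tmp)

# Force all three id-list columns to stay character so the commas survive.
summary_all <- read_csv(
  tmp,
  col_types = cols(
    paper_id_list1 = col_character(),
    paper_id_list2 = col_character(),
    paper_id_list3 = col_character()
  )
)

# Split each comma-delimited string into an integer vector of paper ids.
ids_list1 <- lapply(strsplit(summary_all$paper_id_list1, ","),
                    function(x) as.integer(trimws(x)))
ids_list1[[1]]
```

Keeping the lists as strings in the CSV and splitting them after loading avoids depending on readr's type guessing.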
I am still not sure how to match the paper_id_list columns in summary_all to the Paper_id column in map.data, since the former contain lists and the latter contains only single numbers.
What's going on is that summary_all is an aggregated data frame used to generate a figure, so each row of that data frame contains data from multiple studies, which is why there's a vector of ids. What needs to happen is that if any rows of summary_all are removed by the filters we let the user apply, the corresponding points need to drop out of the map.
@LesleyAtwood I'm not sure how to go in the other direction, though. If someone selects a map unit that filters out an id contained in a summary_all observation with multiple ids, would we have to go back and re-aggregate the data to exclude that particular id but keep the others? How do you want to handle that?
Regarding the first direction (summary_all -> map): should I use all three paper_id_list columns?
For example, say our filters remove a row in summary_all with paper_id_list1 = [93, 95], paper_id_list2 = [70, 88], and paper_id_list3 = [84, 88].
In this case, do I remove all rows in map.data where Paper_id is 93, 95, 70, 88, or 84?
Additionally, summary_all has many duplicates in the paper_id_list# columns. Our filters could remove one row with, for example, paper_id_list1 = 137, while other rows that are not filtered out also have paper_id_list1 = 137.
In this case, do we still want our map to filter out all observations with Paper_id = 137?
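The duplicate case depends on whether the map drops ids found in removed rows or keeps ids found in retained rows. A minimal base-R sketch with made-up data (the three summary rows, the Legend_1 labels, and ids 140 and 150 are hypothetical) shows that the two rules disagree for a shared id like 137:

```r
# Hypothetical summary rows: rows "A" and "B" both include paper 137.
summary_all <- data.frame(
  Legend_1 = c("A", "B", "C"),
  paper_id_list1 = c("137", "137,140", "150"),
  stringsAsFactors = FALSE
)
map.data <- data.frame(Paper_id = c(137, 140, 150))

# Turn a character vector of comma-delimited lists into one integer vector.
split_ids <- function(x) as.integer(trimws(unlist(strsplit(x, ","))))

# Suppose the user's filters remove only row "A".
kept    <- summary_all[summary_all$Legend_1 != "A", ]
removed <- summary_all[summary_all$Legend_1 == "A", ]

# Rule 1: drop every paper that appears in a removed row (137 disappears).
drop_ids <- split_ids(removed$paper_id_list1)
map_drop <- map.data[!map.data$Paper_id %in% drop_ids, , drop = FALSE]

# Rule 2: keep every paper that appears in a retained row (137 stays).
keep_ids <- split_ids(kept$paper_id_list1)
map_keep <- map.data[map.data$Paper_id %in% keep_ids, , drop = FALSE]
```

Rule 2 shows a point as long as at least one row still displayed in the figure depends on that paper, which sidesteps the duplicate problem.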
@swood-ecology, that makes sense, but the code we currently rely on for the summary table is not really set up to recompute filtered data on the fly. We'd have to reorganize it and write code that can filter based on user input.
@nathanhwangbo, the three paper_id_list columns are nested within one another, with the columns ending in '1' being the coarsest level, '2' the middle, and '3' the most specific. That means there can be multiple '2's for every '1', and multiple '3's for every '2'. In essence, it's a one-to-many relationship.
Currently, the select input boxes in the app draw only from the columns ending in '1' (i.e. Legend_1, mean_per_change1, sem_per_change1). For now it makes sense to use only the list of papers in the paper_id_list1 column.
As for your second question, I don't think we're quite at the point of making that decision. I thought our first goal was to get the mapped points and the reference list to match the data displayed in the figure. As I see it, all we need to do is grab the list of papers in paper_id_list1 and filter the mapped points and reference list on the Paper_id values in that list. Am I missing something?
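A minimal base-R sketch of this plot -> map filter, assuming paper_id_list1 has been read as character and that the reference list also carries a Paper_id column. The helper filter_to_figure(), the references data frame, and all values are hypothetical:

```r
# Hypothetical helper: keep rows of `df` (map points or reference list) whose
# Paper_id appears in paper_id_list1 of the summary rows still shown.
filter_to_figure <- function(summary_filtered, df) {
  ids <- unlist(strsplit(summary_filtered$paper_id_list1, ","))
  ids <- unique(as.integer(trimws(ids)))
  df[df$Paper_id %in% ids, , drop = FALSE]
}

# Toy data: the figure currently shows two summary rows.
summary_filtered <- data.frame(
  paper_id_list1 = c("15,45", "45,134"),
  stringsAsFactors = FALSE
)
map.data   <- data.frame(Paper_id = c(15, 45, 72, 134))
references <- data.frame(Paper_id = c(15, 45, 72, 134),
                         citation = c("P15", "P45", "P72", "P134"),
                         stringsAsFactors = FALSE)

map_shown  <- filter_to_figure(summary_filtered, map.data)
refs_shown <- filter_to_figure(summary_filtered, references)
```

In the Shiny app, the same call would presumably sit inside a reactive() that depends on the filtered summary, so the map and reference list update whenever the figure's inputs change.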
Going in the direction (plot filter -> map filter) is definitely doable using the method you describe, but the only way I see to go in both directions (map filter -> plot filter as well) is to use the crosstalk package I mentioned earlier, which requires us to merge the map.data and summary_all datasets into a single dataset. That's where things get messy, since it's hard to match the rows of one data frame with the rows of the other when they don't share a column.
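One way to create a shared column is to unnest summary_all into long format, one row per (summary row, paper id) pair, and then join on Paper_id. A base-R sketch with made-up data follows; the site column is a hypothetical stand-in for the map's location fields, and the sketch only builds the merged table, not the crosstalk wiring:

```r
# Toy summary_all: each row aggregates several papers (values are made up).
summary_all <- data.frame(
  Legend_1 = c("Cover crop", "Tillage"),
  mean_per_change1 = c(12.5, -3.0),
  paper_id_list1 = c("15,45", "45,134"),
  stringsAsFactors = FALSE
)
# Toy map.data; `site` is a hypothetical stand-in for location columns.
map.data <- data.frame(Paper_id = c(15L, 45L, 134L),
                       site = c("a", "b", "c"),
                       stringsAsFactors = FALSE)

# Unnest: repeat each summary row once per paper id it contains.
ids <- strsplit(summary_all$paper_id_list1, ",")
summary_long <- summary_all[rep(seq_len(nrow(summary_all)), lengths(ids)), ]
summary_long$Paper_id <- as.integer(trimws(unlist(ids)))
rownames(summary_long) <- NULL

# Paper_id is now a shared key, so the two tables can be joined.
merged <- merge(summary_long, map.data, by = "Paper_id")
```

Paper 45 appears once for each figure row that includes it, which is the same ambiguity raised earlier for the map -> plot direction: selecting that single map point touches two aggregated rows.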
I see. We will have to figure out another format for the dataset the app relies on. It's starting to look like the raw data, as opposed to a summary, is what we'll need.
@LesleyAtwood do you want to make that decision now and figure it out, or hold off?
I would like to add the (plot filter -> map) capability right now and hold off on the (map -> plot filter).
We need to think through how best to format the data not only for the (map -> plot filter) but also other future filtering functions so that we don't run into this issue again and again.
OK. @nathanhwangbo, do you want to work on filtering the map based on the figure data? I can remove the filtering capability on the map so it doesn't give the impression that users should be able to filter by geography.
I just pushed changes to the app which should filter the map based on the plot filters.
I noticed that 7 rows in map.data will never be plotted, since their Paper_id does not match any id in summary_all. (These rows have Paper_id = 122, 123, or 72.)
@LesleyAtwood do you know what's up with these 7 rows?
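An anti-join is one way to run this kind of check. Below is a base-R sketch. It assumes the id lists are read as character, as in the read_csv fix earlier in the thread, and it uses toy data built so that a few ids never appear in summary_all. It is not necessarily how these rows were found:

```r
# Toy data: the summary ids never include 72, 122, or 123.
summary_all <- data.frame(
  paper_id_list1 = c("15,45", "134"),
  paper_id_list2 = c("15", "134"),
  paper_id_list3 = c("45", "134"),
  stringsAsFactors = FALSE
)
map.data <- data.frame(Paper_id = c(15, 45, 72, 122, 123, 134))

# Collect every id mentioned in any of the three list columns.
list_cols <- c("paper_id_list1", "paper_id_list2", "paper_id_list3")
summary_ids <- unique(as.integer(trimws(
  unlist(strsplit(unlist(summary_all[list_cols]), ","))
)))

# Rows of map.data whose Paper_id never appears in summary_all.
unmatched <- map.data[!map.data$Paper_id %in% summary_ids, , drop = FALSE]
sort(unique(unmatched$Paper_id))
```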
This is correct. The comparisons made in those papers (and subsequently the data we extracted from those papers) do not conform to the comparisons we display in the figures.
@nathanhwangbo Thanks for adding the filtering. I'm loving it!
Use paper_list column in data set to filter mapped points.