brianstock / MixSIAR

A framework for Bayesian mixing models in R:
http://brianstock.github.io/MixSIAR/

Loading Source Data by a Factor #305

Open linkx168 opened 2 years ago

linkx168 commented 2 years ago

Hello,

I am running into an error when trying to load raw source data by a factor. I have a consumer (a fish) with two factors: "Lake" nested within "Invasion". That is, each lake falls under either the presence or the absence of the invasive. We collected our source data by lake. Invasion is either "ZM" or "uninvaded", and there are 13 lakes. There are two sources for every lake, "Offshore" and "Littoral".

I am able to aggregate the source data and load them as "means" but would like to use the power of raw data. Here is the error message I am getting:

Error in list.sources.bylev[[lev]] <- SOURCE[SOURCE[, source_factors] == : attempt to select less than one element in integerOneIndex

I have included the mixture and source data/code.

Mixture data:

head(mix.file)
           Lake      d13C     d15N Invasion
1 Big Cormorant -22.07582 12.52028       ZM
2 Big Cormorant -21.74226 11.77720       ZM
3 Big Cormorant -22.14540 11.12436       ZM
4 Big Cormorant -21.32025 12.00592       ZM
5 Big Cormorant -21.95395 12.85126       ZM
6 Big Cormorant -21.94363 12.66749       ZM

Mixture code:

mix = load_mix_data(filename="MixRework.csv", # actual location of file
                    iso_names=c("d13C","d15N"),
                    factors=c("Invasion","Lake"),
                    fac_random=c(TRUE,TRUE),
                    fac_nested=c(FALSE,TRUE),
                    cont_effects=NULL)

Source data:

head(source.filename)
   Sources  Lake      d13C      d15N
1 Offshore Belle -24.43435 10.368648
2 Offshore Belle -23.87765  9.971567
3 Offshore Belle -23.41851  9.233677
4 Offshore Belle -23.67300 12.234493
5 Offshore Belle -23.86245 12.530859
6 Offshore Belle -23.67416 12.723043

Source code:

source = load_source_data(filename="SourceGroupedRaw.csv",
                          source_factors="Lake",
                          conc_dep=FALSE,
                          data_type="raw",
                          mix)

After researching the issue, I think the problem stems from the number of sources, but I can't quite figure it out. I also see a few other people have the same issue. Thanks for the help!

isotopedroughtplantnerd commented 2 years ago

I'm having this same issue. In case it's helpful, I've tried reducing my input data to only two sources grouped by a factor that only has two categories, and I still have the same problem. This makes me think it isn't related to the number of sources. I figured I wasn't formatting the source data file correctly. Has anyone seen instructions on how to format "raw" source data that is also grouped by a factor? I don't see this in any of the vignettes or other support files (sorry if I've overlooked something), so I've tried multiple formats trying to match with that of inputting "mean" source values with a factor, but I always get the same error message. I would love to see this question answered! Thanks so much for asking.

brianstock commented 2 years ago

Hi, have you looked at the manual, pages 62-63? Does the covariate column label in the source file match exactly that from the consumer file? https://github.com/brianstock/MixSIAR/blob/master/inst/mixsiar_manual_small.pdf.

isotopedroughtplantnerd commented 2 years ago

Hi Brian,

Thank you for taking the time to respond! I have looked at the manual, pages 62-63. There is no example of how to include a covariate while using raw source data. There is an example of the formatting of raw source data without a covariate (Lake example) and an example of using mean source values with a covariate (the Wolves example), but no example with both raw data and a covariate. When I follow the Wolves formatting example to use mean source data with a covariate, everything runs fine. So, I tried to blend the formatting examples for raw data without a covariate and mean data with a covariate, which led me to format the source data (with a covariate) as follows: column 1 is the source, column 2 is the covariate, and column 3 is the raw isotope values (see below). The column label for column 2 (the covariate) matches exactly the covariate column label in the consumer file. The column label for column 3 (isotope data) also matches exactly the isotope data column label in the consumer file. Please let me know if anything comes to mind as a possible solution or explanation for the error message. Thank you for the help.

I've included the first few rows of my consumer and source data frames below as well as the error message I get when I try to load the source data.

consumer (mixture) data frame:

mix[,c("species", "site", "O")]
  species site     O
1       1    3 -2.36
2       1    1 -4.50
3       1    2 -5.79
4       1    3 -2.66
5       1    1 -3.69
6       1    2 -5.42
7       2    3 -6.39
8       2    1 -6.47
9       2    2 -6.34

Source data frame:

source.full
     source site     O
1 deep well    1 -7.33
2 deep well    3 -7.72
3 deep well    2 -7.11
4 deep well    1 -7.50
5 deep well    3 -7.56
6 deep well    2 -7.14
7 deep well    1 -7.57
8 deep well    3 -7.91
9 deep well    2 -7.50

Load the source data:

source_mod <- load_source_data(filename="source_mod.csv",
                               source_factors="site", # matches exactly consumer label
                               conc_dep=FALSE,
                               data_type="raw",
                               mix_mod)

Error in list.sources.bylev[[lev]] <- SOURCE[SOURCE[, source_factors] == : attempt to select less than one element in integerOneIndex

Kind regards, Jared
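
For anyone debugging the same error, here is a minimal base-R sketch of the label check Brian asked about, using the nine rows posted above. `check_source_factor` is a hypothetical helper, not part of MixSIAR. It only confirms that the factor column exists in both data frames, that both contain the same levels, and how many raw rows each source has per level. It does not reproduce or explain the `load_source_data` error.

```r
# First nine rows of each data frame, copied from the comment above.
mix_df <- data.frame(
  species = c(1, 1, 1, 1, 1, 1, 2, 2, 2),
  site    = c(3, 1, 2, 3, 1, 2, 3, 1, 2),
  O       = c(-2.36, -4.50, -5.79, -2.66, -3.69, -5.42, -6.39, -6.47, -6.34)
)
source_df <- data.frame(
  source = rep("deep well", 9),
  site   = c(1, 3, 2, 1, 3, 2, 1, 3, 2),
  O      = c(-7.33, -7.72, -7.11, -7.50, -7.56, -7.14, -7.57, -7.91, -7.50)
)

# Hypothetical helper (not a MixSIAR function): compare one factor column
# between the consumer (mix) data and the raw source data.
check_source_factor <- function(mix_df, source_df, factor_name, source_col) {
  in_mix    <- factor_name %in% names(mix_df)
  in_source <- factor_name %in% names(source_df)
  res <- list(label_in_mix = in_mix, label_in_source = in_source)
  if (in_mix && in_source) {
    mix_lev <- unique(as.character(mix_df[[factor_name]]))
    src_lev <- unique(as.character(source_df[[factor_name]]))
    # Levels present in one data frame but absent from the other
    res$missing_from_source <- setdiff(mix_lev, src_lev)
    res$missing_from_mix    <- setdiff(src_lev, mix_lev)
    # Number of raw source rows for each source x factor-level cell
    res$n_per_cell <- table(source_df[[source_col]], source_df[[factor_name]])
  }
  res
}

res <- check_source_factor(mix_df, source_df, "site", "source")
print(res)
```

If both `missing_from_*` entries are empty and no cell in `n_per_cell` is zero, the labels and levels agree, which would point away from a simple naming mismatch.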

linkx168 commented 2 years ago

Hi Jared, Thank you for writing. It seems like we are both experiencing a very similar (if not the same) issue. I have also seen the issue mentioned in this thread: https://github.com/brianstock/MixSIAR/issues/150.

I am curious about how you proceeded with your modeling without being able to use the raw data by a covariate. For now, I have used the aggregation method. However, I am reaching a point where loading raw data into the model would be a clear advantage in source separation.

Thanks again for your explanation and I look forward to a solution!

isotopedroughtplantnerd commented 2 years ago

Hi linkx168,

Thank you for letting me know you're running into the same problem. I have not found a solution for this issue and have similarly been forced to use mean/sd data as a substitute for raw data, even though the raw data would be more ideal for my application. I'm hoping this issue will be resolved at some point soon, especially considering we have a manuscript to submit in the next couple months. Please let me know if you find a solution.

Kind regards, Jared

CassiarCaribou commented 1 year ago

Hey folks,

I am having the same issue when trying to use my raw data. Such a shame. I worked so hard to acquire those data, and I want to benefit from the power of using raw data. My models work with mean source data, but I really want to use my raw source data.

Has anyone come up with a solution? I have attempted to contact Brian Stock but I have not heard from him.

-Oliver

AndrewLJackson commented 1 year ago

Hi. I haven't checked, but my guess is that, for the nested factor design, you can't run it with interactions between fixed factors like that. Instead, I suspect you will need to create a new column that has a unique value for e.g. Big_Cormorant, Belle_Cormorant etc. My thinking is that the consumer data is expecting there to be n*m sources but is only finding n (where n is the number of lakes and m the number of invasions).

I'm not sure if that's the same setup as for the linear covariate issue or not.
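
One way to read this suggestion is sketched below in base R, with the caveat that it is untested against MixSIAR and may not resolve the error. The lake names and isotope values are placeholders, not data from this thread. Only the column names (`Lake`, `Invasion`, `Sources`) and the invasion levels come from the original post. The combined column name `InvLake` is a hypothetical choice.

```r
# Toy data: lake names and isotope values are placeholders for illustration.
mix_df <- data.frame(
  Lake     = c("LakeA", "LakeA", "LakeB", "LakeB"),
  Invasion = c("ZM", "ZM", "uninvaded", "uninvaded"),
  d13C     = c(-22.1, -21.7, -24.0, -23.5),
  d15N     = c(12.5, 11.8, 10.2, 9.9),
  stringsAsFactors = FALSE
)
source_df <- data.frame(
  Sources = c("Offshore", "Littoral", "Offshore", "Littoral"),
  Lake    = c("LakeA", "LakeA", "LakeB", "LakeB"),
  d13C    = c(-24.4, -20.1, -25.0, -19.8),
  d15N    = c(10.4, 8.9, 9.7, 8.1),
  stringsAsFactors = FALSE
)

# The source data only records Lake, so look up each lake's invasion status
# from the consumer data. Nesting implies one status per lake.
lake_key <- unique(mix_df[, c("Lake", "Invasion")])
stopifnot(!anyDuplicated(lake_key$Lake))
source_df$Invasion <- lake_key$Invasion[match(source_df$Lake, lake_key$Lake)]

# Build one categorical column with a unique value per Invasion x Lake
# combination, identically in both data frames.
mix_df$InvLake    <- paste(mix_df$Invasion, mix_df$Lake, sep = "_")
source_df$InvLake <- paste(source_df$Invasion, source_df$Lake, sep = "_")

print(unique(mix_df$InvLake))
print(source_df)
```

Both data frames would then be written back to CSV, and the combined column used as the single factor in `load_mix_data` (`factors`) and `load_source_data` (`source_factors`) in place of the two separate columns.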

isotopedroughtplantnerd commented 1 year ago

Hi everyone,

I have not heard anything from Brian, but as a collaborator, I'm hoping Andrew might be able to help. My most basic question is how to structure my source data frame when I want to use both raw source data and a covariate (without a nested factor). As I mentioned in a previous comment, the vignettes provide an example of the formatting of raw source data without a covariate (Lake example) and an example of using mean source values with a covariate (the Wolves example), but not an example with both raw data and a covariate. Thank you for the help!

CassiarCaribou commented 1 year ago

Jared, I am having the exact same issue as you. Let me know if you get in touch with Andrew and/or Brian.

AndrewLJackson commented 1 year ago

I don't know immediately and to be honest I don't have time to investigate this in detail. My suggestion is to revert to using mean and sd which is not a major loss of information since ultimately the model is estimating a mean and sd from the data. Also have you tried my suggestion above to create a categorical factor that is unique for each combination when you have two interacting / nested factors?

The manual also has the Storm Petrel example which features 1 fixed effect (Region) and raw source data. The manual also includes more details on file formatting in general. https://github.com/brianstock/MixSIAR/blob/master/inst/mixsiar_manual_small.pdf
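
As a stopgap, the fallback to mean and SD can be sketched in base R as follows, using the raw source rows Jared posted earlier in the thread. `summarise_sources` is a hypothetical helper, not a MixSIAR function. The output column names (`Mean` and `SD` prefixed to the tracer name, plus `n`) follow the means-format source files as I recall them from the Wolves example, so treat them as an assumption and check them against the manual.

```r
# Raw source rows copied from the earlier comment in this thread.
source_df <- data.frame(
  source = rep("deep well", 9),
  site   = c(1, 3, 2, 1, 3, 2, 1, 3, 2),
  O      = c(-7.33, -7.72, -7.11, -7.50, -7.56, -7.14, -7.57, -7.91, -7.50)
)

# Hypothetical helper: collapse raw source data to mean, SD and sample size
# for every combination of the grouping columns (source and factor).
summarise_sources <- function(df, group_cols, iso_cols) {
  groups <- df[group_cols]
  means <- aggregate(df[iso_cols], by = groups, FUN = mean)
  sds   <- aggregate(df[iso_cols], by = groups, FUN = sd)
  ns    <- aggregate(df[iso_cols[1]], by = groups, FUN = length)
  out <- means[group_cols]
  for (iso in iso_cols) {
    out[[paste0("Mean", iso)]] <- means[[iso]]  # assumed naming convention
    out[[paste0("SD", iso)]]   <- sds[[iso]]    # assumed naming convention
  }
  out$n <- ns[[iso_cols[1]]]
  out
}

agg <- summarise_sources(source_df, group_cols = c("source", "site"),
                         iso_cols = "O")
print(agg)
```

The result has one row per source and site, and can be written out with `write.csv(agg, "source_means.csv", row.names = FALSE)` (file name hypothetical) and loaded with `data_type="means"`.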

isotopedroughtplantnerd commented 1 year ago

Hi Andrew,

Thank you for taking the time to respond. I understand you may not have time to dig into things further. It is good to know that you don't see much of a loss in using mean/sd as opposed to raw data, since this is what I have suggested to collaborators.

I still wanted to follow up regarding your suggestion of creating a categorical factor that is unique for each combination of factors, since I ran into an issue and want to make sure I understand your suggestion. I need my Mix data categorized by Species and Season (such as in the Wolves example, where Mix data is categorized by Pack and Region, though my factors are not nested). My Source data needs to be categorized by Season, so that Source data collected in spring is not compared with Mix data collected in summer. However, when creating a new categorical factor for the Source data that is unique for each combination of factors (i.e., combining the Source and Season columns into a single variable: Source.x.Season), the Mix data can no longer be categorized by Season, since this column name no longer exists in the Source data frame. Please let me know if I misunderstood your suggestion. Thank you!