cozygene / FEAST

Fast expectation maximization for microbial source tracking
Other
115 stars 60 forks source link

Could not find function "Infer.SourceContribution" #45

Closed kbazany closed 1 year ago

kbazany commented 1 year ago

Hi there,

I am trying to run FEAST for the first time and I got this error message:

Error in Infer.SourceContribution(source = sources, sinks = t(sinks), : could not find function "Infer.SourceContribution"

How do I go about getting this function? Thank you so much!

liashenhav commented 1 year ago

Can you share the code you're running?

kbazany commented 1 year ago

Thank you for getting back to me so quickly! I tried with some of my own data and with the demo data and got the same error message.

Thank you!


From: liashenhav @.> Sent: Thursday, November 3, 2022 12:37 PM To: cozygene/FEAST @.> Cc: Bazany,Kate @.>; Author @.> Subject: Re: [cozygene/FEAST] Could not find function "Infer.SourceContribution" (Issue #45)

Caution: EXTERNAL Sender

Can you share the code you're running?

— Reply to this email directly, view it on GitHubhttps://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fcozygene%2FFEAST%2Fissues%2F45%23issuecomment-1302521126&data=05%7C01%7CKate.Bazany%40colostate.edu%7Ce8b33a35164d4ee7257308dabdca71b7%7Cafb58802ff7a4bb1ab21367ff2ecfc8b%7C0%7C0%7C638030974430181097%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=wDMBZ%2F2pSJBhm7K%2FoL%2BZyCiruJPin%2BSk4Ss6AXKJ36g%3D&reserved=0, or unsubscribehttps://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAVQ5H6VT5S64RO5Z4UEXMELWGQA6DANCNFSM6AAAAAARWNZMBY&data=05%7C01%7CKate.Bazany%40colostate.edu%7Ce8b33a35164d4ee7257308dabdca71b7%7Cafb58802ff7a4bb1ab21367ff2ecfc8b%7C0%7C0%7C638030974430181097%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=VsfMbfCWJgmIJcKfE0InImk5LHdRPpjDaj70fsCH1kw%3D&reserved=0. You are receiving this because you authored the thread.Message ID: @.***>

liashenhav commented 1 year ago

Happy to help, but can only do that using the code you're running. Please paste it here so we could try to recapitulate your error.

Thanks!

kbazany commented 1 year ago

Sorry about that! It didn't attach!

FEAST Source Tracker Analysis on 16S amplicon data

for sugarbeet drought project

Installing Packages

Packages <- c("Rcpp", "RcppArmadillo", "vegan", "dplyr", "reshape2", "gridExtra", "ggplot2", "ggthemes") install.packages(Packages) lapply(Packages, library, character.only = TRUE)

Loading packages

library(Rcpp) library(RcppArmadillo) library(vegan) library(dplyr) library(reshape2) library(gridExtra) library(ggplot2) library(ggthemes)

library(tidyverse) library(mctoolsr) library(cozygene) library(FEAST)

Installing FEAST

devtools::install_github("cozygene/FEAST")

library(FEAST)

Preparing Data

otutabtax <- read_delim("~/Desktop/R/MiCROPe/zotutab_16S_merged.txt", delim = "\t")

remove host reads (consider sample type and amplicon)

otutabtax$host <- 0 otuhost <- otutabtax %>% mutate(host = case_when( grepl("Rickettsiales", taxonomy) ~ "remove", grepl("Chloroplast", taxonomy) ~ "remove", TRUE ~ "keep")) otufilt <- filter(.data = otuhost, host == "keep")

write

mcotu2 <- otufilt %>% select(-host)

write files

write_delim(mcotu2, file = "mcotu2.txt", delim = "\t")

Input Data

input <- load_taxa_table("mcotu2.txt", "~/Desktop/R/MiCROPe/16S_Meta.txt")

rarefy

input_rar = single_rarefy(input, 5000)

filter out by treatment

input_rar <- filter_data(input_rar, "Treatment", filter_vals = "Drought") input_rar <- filter_data(input_rar, "Species", filter_vals = "Sugarbeet")

o16S_meta <- input_rar$map_loaded o16S_otu <- input_rar$data_loaded

put metadata and otu table in FEAST format

dim(o16S_meta) dim(o16S_otu)

otu_feast <- o16S_otu %>% rownames_to_column()

meta_feast <- o16S_meta %>% rownames_to_column(var = "SampleID") %>% mutate(SourceSink = case_when( endsWith(Sample, "Roots") ~ "Sink", endsWith(Sample, "Leaves") ~ "Sink", endsWith(Sample, "Soil") ~ "Source", endsWith(Sample, "Bulk") ~ "Source")) %>% mutate(id = case_when( endsWith(Site, "CO1") ~ 1, endsWith(Site, "CO3") ~ 2, endsWith(Site, "CO8") ~ 3, endsWith(Site, "CO10") ~ 4, endsWith(Site, "NE4") ~ 5, endsWith(Site, "NE7") ~ 6, endsWith(Site, "MTA") ~ 7, endsWith(Site, "MTB") ~ 8)) %>% mutate(Env = paste(Sample, id)) %>% select(SampleID, Env, SourceSink, id)

Note -- there may be an issue as it seems that there can only be one sink

per id. This isn't possible because I don't have enough bulk samples

I may need to do plant by plant

write files

write_delim(meta_feast, file =
"~/Desktop/R/MiCROPe/feast_results/meta_feast_16S_irrigated_corn_bulksoil_rootsleaves.txt", delim = "\t") write_delim(otu_feast, file = "~/Desktop/R/MiCROPe/feast_results/otu_feast_16S_irrigated_corn_bulksoil_rootsleaves.txt", delim = "\t")

go back in excel and delete the col heading for the OTUs

Read files

metadata <- Load_metadata(metadata_path = "~/Desktop/R/MiCROPe/feast_results/meta_feast_16S_irrigated_corn_bulksoil_rootsleaves.txt") otus <- Load_CountMatrix(CountMatrix_path = "~/Desktop/R/MiCROPe/feast_results/otu_feast_16S_irrigated_corn_bulksoil_rootsleaves.txt")

dir_path <- "~/Desktop/R/MiCROPe/feast_results/" outfile <- "16S_irrigated_corn_bulksoil_rootsleaves"

Run

FEAST_output <- FEAST(C = otus, metadata = metadata, different_sources_flag = 1, dir_path = dir_path, outfile = outfile)

FEAST <- function(C, metadata, EM_iterations = 1000, COVERAGE = NULL ,different_sources_flag, dir_path, outfile){

1. Parse metadata and check it has the correct hearer (i.e., Env, SourceSink, id)

if(sum(colnames(metadata)=='Env')==0) stop("The metadata file must contain an 'Env' column naming the source environment for each sample.") if(sum(colnames(metadata)=='SourceSink')==0) stop("The metadata file must contain a 'SourceSink' column indicating 'source' or 'sink' for each sample.") if(sum(colnames(metadata)=='id')==0) stop("The metadata file must contain an 'id' column matching the source environments for each sink sample.") sink_ids <- grep("sink",as.character(metadata$SourceSink),ignore.case=TRUE,value=FALSE) source_ids <- grep("source",as.character(metadata$SourceSink),ignore.case=TRUE,value=FALSE) metadata$SourceSink[sink_ids] = "Sink" metadata$SourceSink[source_ids] = "Source"

2. Enforce integer data

if(sum(as.integer(C) != as.numeric(C)) > 0){ stop('Data must be integral. Consider using "ceiling(datatable)" or ceiling(1000*datatable) to convert floating-point data to integers.') }

3. Extract only those samples in common between the two tables

common.sample.ids <- intersect(rownames(metadata), rownames(C)) C <- C[common.sample.ids,] metadata <- metadata[common.sample.ids,]

Double-check that the metadata file and otu table

had overlapping samples

if(length(common.sample.ids) <= 1) { message <- paste(sprintf('Error: there are %d sample ids in common '), 'between the metadata file and data table') stop(message) }

4. Extract number of sources and sinks from metadata

all_sinks_sampleID <- unique(rownames(metadata)[which(metadata$SourceSink == 'Sink')]) all_sources_sampleID <- unique(rownames(metadata)[which(metadata$SourceSink == 'Source')]) num_sinks <- length(all_sinks_sampleID) num_sources <- length(all_sources_sampleID) Ids <- na.omit(unique(metadata$id))

5. Extract environment information from metadata

envs <- paste0(metadata$Env, "_", rownames(metadata)) envs_sources <- paste0(all_sourcessampleID, "", metadata$Env[rownames(metadata) %in% all_sources_sampleID]) envs_sink <- paste0(all_sinkssampleID, "", metadata$Env[rownames(metadata) %in% all_sinks_sampleID])

proportions_mat <- matrix(NA, ncol = (num_sources+1), nrow = num_sinks) idx.unknown <- num_sources+1

Quantify the sources' contribution for each sink sample separately

for(it in 1:num_sinks){

if(it%%10==0 || it == num_sinks)
  print(paste0("Calculating mixinig proportions for sink ", it))

###6. Assign ids to sinks and their corresponding sources - for every sink (test.id) match its sources (train.ix)
if(different_sources_flag == 1){
  train.ix <- which(metadata$SourceSink=='Source' & metadata$id == Ids[it])
  test.ix <- which(metadata$SourceSink=='Sink' & metadata$id == Ids[it])
} else {
  train.ix <- which(metadata$SourceSink=='Source')
  test.ix <- which(metadata$SourceSink=='Sink' & metadata$id == Ids[it])
}

###7. Set the number of sources per sink
num_sources <- length(train.ix)

###8. Set defualt COVERAGE
if(is.null(COVERAGE))
  COVERAGE <-  min(rowSums(C[c(train.ix, test.ix),]))

if(COVERAGE > 0){

  ###9. Define sources and sinks
  if(length(train.ix) == 1)
    sources <- as.matrix(FEAST_rarefy(t(as.matrix(C[train.ix,])), COVERAGE))

  if(length(train.ix) > 1)
    sources <- as.matrix(FEAST_rarefy(C[train.ix,], COVERAGE))

  sinks <- as.matrix(FEAST_rarefy(t(as.matrix(C[test.ix,])), COVERAGE))

  ###10. Estimate source proportions for each sink
  FEAST_output<-Infer.SourceContribution(source=sources, sinks = t(sinks), env = envs[train.ix], em_itr = EM_iterations, COVERAGE = COVERAGE)

  idx.sources <- which(all_sources_sampleID %in% rownames(metadata)[train.ix])
  proportions_mat[it,c(idx.sources, idx.unknown)] <- FEAST_output$data_prop[,1]

} else{

  cat(sprintf('Error: the sequencing depth in one sink or source sample in Id %d is zero. No proportions were generated\n', it))

}

}

proportions_mat <- data.frame(proportions_mat) colnames(proportions_mat) <- c(envs_sources, "Unknown") rownames(proportions_mat) <- envs_sink

proportions_mat[is.na(proportions_mat)] <- 999

setwd(dir_path) write.table(proportions_mat, file = paste0(outfile,"_source_contributions_matrix.txt"), sep = "\t") return(list = c(proportions_mat, FEAST_output))

}

Running with my data

FEAST_output <- FEAST(C = otus, metadata = metadata, different_sources_flag = 1, dir_path = dir_path, outfile = outfile)

Trying with Demo Data

metadata <- Load_metadata(metadata_path = "https://raw.githubusercontent.com/cozygene/FEAST/master/Data_files/metadata_example_multi.txt") otus <- Load_CountMatrix(CountMatrix_path = "https://raw.githubusercontent.com/cozygene/FEAST/master/Data_files/otu_example_multi.txt")

FEAST_output <- FEAST(C = otus, metadata = metadata, different_sources_flag = 1, dir_path = "~/FEAST/Data_files/", outfile="demo")

PlotSourceContribution(SinkNames = rownames(FEAST_output)[c(5:8)], SourceNames = colnames(FEAST_output), dir_path = "~/FEAST/Data_files/", mixing_proportions = FEAST_output, Plottitle = "Test",Same_sources_flag = 0, N = 4)

kbazany commented 1 year ago

Hi, Sorry about that but I think I found it! I reran all of the function codes and now the demo data seems to be running. Thank you!

liashenhav commented 1 year ago

It looks like you're not running the FEAST function and it's parameters. You are running the code the composes this function (as you pasted above, you divide it to 8 sections)... That's the reason you're getting this error.... did you try running FEAST(C = otus, metadata = metadata, different_sources_flag = 1, dir_path = "~/FEAST/Data_files/", outfile="demo")?