Closed mattoslmp closed 1 year ago
Dear Leandro, thanks for using our tool. How best to summarize KEMET output into a single table depends on the format you'd like the summary to have.
I'd personally do it with a combination of bash commands that extract the columns of interest from the .tsv report files. For example, I quickly tried these commands:
# move to the KEMET report folder
cd KEMET/reports_tsv
# create first column of summary file
echo samples > modules.start
# add modules ID in summary file
# replace [NAME] w/ any single .tsv filename
cut -f1 [NAME] >> modules.start
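# extract module completeness for each genome into a tmp file
# ${f:10:-4} drops the first 10 and last 4 characters of the filename,
# e.g. a "reportKMC_" prefix and the ".tsv" suffix (negative length needs bash >= 4.2)
for f in *.tsv; do echo "${f:10:-4}" > "$f.tmp"; cut -f3 "$f" >> "$f.tmp"; done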
# create new folder for result
mkdir -p summary
# unite modules ID and result per each genome
paste modules.start *.tmp > summary/summarized_table.tsv
# clean from tmp files
rm *.tmp modules.start
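Note that paste joins files line by line, so the summarized table is only correct if every report lists the same modules in the same order. A minimal sketch of a sanity check to run before the paste step, assuming bash and the standard cut and cmp utilities; check_module_order is a hypothetical helper name, not part of KEMET:

```shell
# Hypothetical helper (not part of KEMET): verify that column 1 (module IDs)
# is identical, line by line, across all .tsv reports in a directory.
check_module_order() {
    local dir="${1:-.}" ref="" status=0 f ref_ids
    ref_ids=$(mktemp) || return 2
    for f in "$dir"/*.tsv; do
        if [ ! -e "$f" ]; then
            echo "no .tsv files found in $dir" >&2
            status=2
            break
        fi
        if [ -z "$ref" ]; then
            # the first report serves as the reference
            ref="$f"
            cut -f1 "$f" > "$ref_ids"
        elif ! cut -f1 "$f" | cmp -s - "$ref_ids"; then
            # cmp -s is silent and exits non-zero when the inputs differ
            echo "module column of $f differs from $ref" >&2
            status=1
        fi
    done
    rm -f "$ref_ids"
    return "$status"
}
```

Called as check_module_order KEMET/reports_tsv, it exits with status 0 when all reports agree and prints the offending files otherwise.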
Do you have anything specific in mind?
Best, Matteo
Dear Matteo, thank you for your attention and help. Your script worked perfectly; it was exactly what I needed.
I ended up writing a similar parser in R. I'll post it below in case anyone needs a second solution:
rm(list=ls())
library(dplyr)    # added: provides full_join() and select() used below
library(purrr)
library(readr)
library(ggpubr)
library(stringr)
setwd ("D:/ITV/KEMET_resultados/reports_tsv_KASS")
# path: the directory containing the KEMET results
data_join <-list.files(path="D:/ITV/KEMET_resultados/reports_tsv_KASS/", pattern="*.tsv", full.names=TRUE) %>%
lapply(read_tsv) %>%
reduce(full_join, by = "Module_id") %>% unique()
modules_id <- data_join$Module_id # colname: module_id
modules_names <- data_join$Module_name.x # colname: module_name
df <- data_join %>% select(matches("(Completeness)"))
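# My filename pattern for KEMET results: reportKMC_Ga0541012_bin.tsv
myfilenames <- list.files(path="D:/ITV/KEMET_resultados/reports_tsv_KASS/", pattern="*.tsv", full.names=TRUE)
# keep the part of each path after "reportKMC", then drop the extension
name_files <- sapply(strsplit(myfilenames, split='reportKMC', fixed=TRUE), function(x) (x[2]))
name_files <- str_remove(name_files, pattern = "\\.tsv$")
df2 <- data.frame(modules_id, modules_names, df)
# the second column holds the module names, so it is labelled Module_name
colnames(df2) <- c("Module_id", "Module_name", name_files)
# write a real tab-separated file, without row names or quotes
write.table(df2, "Res_KEMET.tsv", sep = "\t", quote = FALSE, row.names = FALSE)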
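For anyone who prefers to stay in the shell, the join-on-Module_id idea from the R snippet can be approximated with awk, which avoids relying on the row order of the individual reports. This is only a sketch under stated assumptions: merge_by_module is a hypothetical helper, and it assumes each report has a header row, the module ID in column 1, and the completeness value in column 3 (as in the cut commands earlier in this thread). Modules missing from a report are filled with NA.

```shell
# Hypothetical helper (not part of KEMET): merge column 3 of every report in a
# directory into one table keyed on column 1.
merge_by_module() {
    local dir="${1:-.}"
    awk -F'\t' -v OFS='\t' '
        FNR == 1 {                        # first line of each file
            nfiles++
            name = FILENAME
            sub(/^.*\//, "", name)        # strip the directory part
            sub(/\.tsv$/, "", name)       # strip the extension
            sample[nfiles] = name
            next                          # assumption: line 1 is a header row
        }
        {
            # remember each module ID in order of first appearance
            if (!($1 in seen)) { seen[$1] = 1; order[++nmod] = $1 }
            value[$1 SUBSEP nfiles] = $3
        }
        END {
            line = "Module_id"
            for (j = 1; j <= nfiles; j++) line = line OFS sample[j]
            print line
            for (i = 1; i <= nmod; i++) {
                line = order[i]
                for (j = 1; j <= nfiles; j++) {
                    key = order[i] SUBSEP j
                    line = line OFS ((key in value) ? value[key] : "NA")
                }
                print line
            }
        }
    ' "$dir"/*.tsv
}
```

Example call: merge_by_module KEMET/reports_tsv > ~/summarized_by_id.tsv. The output is best written outside the reports folder, otherwise the new .tsv would be picked up as an input on the next run.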
Best regards, Leandro.
Hello, I ran KEMET on several samples. Can you give me some tips on how to merge the resulting tables into one? Best regards, Leandro.