Merge multiple KEMET results

mattoslmp commented 1 year ago

Hello, I ran KEMET on several samples. Could you give me some tips on how to merge the resulting tables into one? Best regards, Leandro.

Matteopaluh commented 1 year ago

Dear Leandro, thanks for using our tool. How best to summarize KEMET output into a single table depends on the format you'd like the summary to have.

I'd personally do that using a combination of bash commands to extract the columns of interest from the .tsv table files. For example I quickly tried these commands:

# move to the KEMET report folder
cd KEMET/reports_tsv

# create first column of summary file
echo samples > modules.start

# add modules ID in summary file
# replace [NAME] w/ any single .tsv filename
cut -f1 [NAME] >> modules.start

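# extract module completeness for each genome as a tmp file
# ${f:10:-4} drops the first 10 characters and the ".tsv" extension,
# e.g. reportKMC_Ga0541012_bin.tsv -> Ga0541012_bin
for f in *.tsv; do echo "${f:10:-4}" > "$f.tmp"; cut -f3 "$f" >> "$f.tmp"; done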

# create new folder for result
mkdir summary
# unite modules ID and result per each genome
paste modules.start *.tmp > summary/summarized_table.tsv

# clean from tmp files
rm *.tmp modules.start
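
One caveat: paste joins the files line by line, so the commands above only give a correct table if every report lists the same modules in the same order. A minimal sketch of a check you could run before the paste step follows. The helper name check_module_order is hypothetical (it is not part of KEMET), and it assumes the module ID sits in the first column, as in the cut -f1 call above.

```shell
# Hypothetical helper, not part of KEMET: succeeds only if the first column
# of every given file is identical to the first column of the first file.
check_module_order() {
    ref_file="$1"
    ref_ids="$(cut -f1 "$ref_file")"
    shift
    mismatches=0
    for f in "$@"; do
        # compare the whole first column as a single string
        if [ "$(cut -f1 "$f")" != "$ref_ids" ]; then
            echo "module column differs from $ref_file: $f" >&2
            mismatches=$((mismatches + 1))
        fi
    done
    # exit status is 0 only when no file differed
    [ "$mismatches" -eq 0 ]
}

# usage, from inside the reports folder:
#   check_module_order *.tsv && echo "safe to paste"
```

If the check fails, a key-based merge (for example a join on the module ID) would be a safer option than paste.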

Do you have anything specific in mind?

Best, Matteo

mattoslmp commented 1 year ago

Dear Matteo, thank you for your attention and help, your script worked perfectly. It was exactly what I needed.

I ended up writing a similar parser in R. I'll post it below in case anyone needs a second solution:

rm(list = ls())
library(purrr)
library(readr)
library(dplyr)    # needed for full_join() and select()
library(stringr)

setwd("D:/ITV/KEMET_resultados/reports_tsv_KASS")

# path: directory containing the KEMET results
myfilenames <- list.files(path = "D:/ITV/KEMET_resultados/reports_tsv_KASS/", pattern = "\\.tsv$", full.names = TRUE)

# read every report and join them on the module ID
data_join <- myfilenames %>%
  lapply(read_tsv) %>%
  reduce(full_join, by = "Module_id") %>%
  unique()

modules_id <- data_join$Module_id          # colname: module_id
modules_names <- data_join$Module_name.x   # colname: module_name
df <- data_join %>% select(matches("(Completeness)"))

# My filename pattern for KEMET results: reportKMC_Ga0541012_bin.tsv
name_files <- sapply(strsplit(myfilenames, split = "reportKMC", fixed = TRUE), function(x) x[2])
name_files <- str_remove(name_files, pattern = fixed(".tsv"))

df2 <- data.frame(modules_id, modules_names, df)
# the second column holds the module names, so it is labelled accordingly
colnames(df2) <- c("Module_id", "Module_name", name_files)
# write a real tab-separated file (write.table defaults to spaces and quotes)
write.table(df2, "Res_KEMET.tsv", sep = "\t", quote = FALSE, row.names = FALSE)

Best regards, Leandro.

Matteopaluh / KEMET

Merge multiple KEMET results #11