Closed: @methornton closed this issue 1 year ago.
Hello @methornton and thank you very much for your effort. Could you add this:
dat_path <- file.path(my_basename, str_glue("DEA_{my_exp}_{genome_input}_{todate}"),
my_tools) %>% set_names(my_tools)
and rerun it? It seems that the elements of dat_path don't have the proper names.
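For context: `file.path()` returns an unnamed character vector, so `dat_path[['salmon']]` fails with "subscript out of bounds" until names are attached. A minimal base-R sketch; the values of `my_basename`, `my_exp`, `genome_input`, `todate` and `my_tools` below are placeholders (the real ones are not in the thread), and `paste0()`/`setNames()` stand in for `str_glue()`/`set_names()`:

```r
# Placeholder inputs (hypothetical); the real values come from the user's script
my_basename  <- "analysis"
my_exp       <- "exp1"
genome_input <- "rheMac10"
todate       <- "2021-01-01"
my_tools     <- c("salmon", "fc")  # assumed tool names

# file.path() recycles over my_tools but returns an unnamed vector
dat_path <- file.path(my_basename,
                      paste0("DEA_", my_exp, "_", genome_input, "_", todate),
                      my_tools)

# Lookup by name fails while the vector has no names
lookup <- tryCatch(dat_path[["salmon"]], error = conditionMessage)

# Base-R equivalent of purrr::set_names(my_tools)
dat_path <- setNames(dat_path, my_tools)
salmon_dir <- dat_path[["salmon"]]  # now resolves to the salmon output folder
```

After naming, `dat_path[['salmon']]` returns `"analysis/DEA_exp1_rheMac10_2021-01-01/salmon"` for these placeholder values.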
Hello @ConYel! This addition helped. I have four conditions in con_mat; the program makes a histogram for the first one, but then hangs without erroring out. A Venn diagram is made only for the first condition, in the root DGA folder: venn_diagram_DE_salmon_fC_edgeR_CtrvAce.pdf. After my salmon filter only 204 sncRNAs remained, and the Venn diagram shows no significant genes (it lists 204 up, 204 down); there may be some for the other conditions. When I check the DE tsv files for that condition for salmon and fC, there are indeed no significant genes. I can remove the condition, but it would be nice to keep it so I can show the PI that there were no significant sncRNAs.
The script proceeds to generate these files (for featureCounts as well) and then hangs:
DE_salmon_edgR_TMM_AceTrnvAceExp.txt
DE_salmon_edgR_TMM_CtrvAce.txt
DE_salmon_edgR_TMM_CtrvLyral.txt
DE_salmon_edgR_TMM_LyralTrnvLyralExp.txt
DE_salmon_vm_QW_Q_AceTrnvAceExp.txt
DE_salmon_vm_QW_Q_CtrvAce.txt
DE_salmon_vm_QW_Q_CtrvLyral.txt
DE_salmon_vm_QW_Q_LyralTrnvLyralExp.txt
hist_edger_p_value_CtrvAce.pdf
hist_limma_p_value_CtrvAce.pdf
venn_diagram_DE_salmon_fC_edgeR_CtrvAce.pdf
DE_fc_edgR_TMMCtrvAce.txt
DE_salmon_edgR_TMM_CtrvAce.txt
Hello @methornton, the code for the Venn diagram and the histogram was written for only one condition at the time, just for exploration. To produce them for each condition we need to change the function a bit; I will check it with some toy data and update you. Regarding the resulting txt files: you can use the significant sncRNAs from them for any kind of downstream analysis you prefer. For miRNA targets you can use an online tool.
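The single-condition histogram can be extended to every contrast by looping over the names of the DE list instead of taking `[[1]]`. A minimal sketch, assuming the results are a named list of data frames with a `PValue` column (as `topTags()` produces); the toy p-values and the use of `tempdir()` in place of `dat_path[['salmon']]` are hypothetical:

```r
# Hypothetical stand-in for DE_salmon_edgR: one table per contrast
set.seed(1)
DE_list <- list(
  CtrvAce   = data.frame(PValue = runif(200)),
  CtrvLyral = data.frame(PValue = runif(200))
)
out_dir <- tempdir()  # stands in for dat_path[["salmon"]]

hist_files <- character(0)
for (contrast in names(DE_list)) {
  pdf_file <- file.path(out_dir, paste0("hist_edger_p_value_", contrast, ".pdf"))
  pdf(pdf_file)
  # Same settings as the original single-contrast histogram
  hist(DE_list[[contrast]]$PValue, breaks = 0:20/20,
       col = "grey50", border = "white",
       main = contrast, xlab = "PValue")
  dev.off()
  hist_files <- c(hist_files, pdf_file)
}
```

Because the histogram only uses `PValue`, a contrast with no significant sncRNAs still gets its own PDF.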
Hello! I removed the condition that had zero significant genes and the script still hangs at the same spot. Perhaps I made a mistake in transcription?
# design ----
colnames(design) # check the names and make the contrasts
con_mat <- makeContrasts(
  AceTrnvAceExp     = EXP.Ace - TRN.Ace,
  LyralTrnvLyralExp = EXP.Lyral - TRN.Lyral,
  CtrvLyral         = ((TRN.Lyral + EXP.Lyral)/2) - CTR.Control,
  levels = design
)
## salmon ----
# estimate dispersions, then fit the quasi-likelihood GLM
salmon_edgR_TMM <- estimateDisp(salmon_edgR_TMM, design = design, robust = TRUE)
salmon_edgR_TMM <- glmQLFit(salmon_edgR_TMM, design, robust = TRUE)
# one QL F-test per contrast; each full table is written to a tsv and kept in a named list
DE_salmon_edgR <- con_mat %>% colnames() %>% set_names() %>%
  map(~glmQLFTest(salmon_edgR_TMM, contrast = con_mat[,.x]) %>%
        topTags(n = nrow(.), adjust.method = "BH", sort.by = "PValue", p.value = 1) %>%
        .$table %>%
        as_tibble(rownames = "smallRNA") %>%
        write_tsv(file.path(dat_path[['salmon']], str_c("DE_salmon_edgR_TMM_", .x, ".txt")))
  )
# p-value histogram for the first contrast only
pdf(file.path(dat_path[['salmon']], str_c("hist_edger_p_value_", names(DE_salmon_edgR[1]), ".pdf")))
hist(DE_salmon_edgR[[1]]$PValue, breaks = 0:20/20, col = "grey50", border = "white")
dev.off()
# classify the first contrast: 0 = not significant (FDR >= 0.05), 1 = up, -1 = down
salmon_edgeR_TMM_p <- DE_salmon_edgR[[1]] %>%
  mutate(salmon_edgeR = if_else(
    FDR >= 0.05, 0, if_else(
      logFC > 0, 1, -1
    )
  )) %>%
  select(smallRNA, salmon_edgeR)
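The `salmon_edgeR_TMM_p` step above only classifies `DE_salmon_edgR[[1]]`. The same rule (FDR ≥ 0.05 gives 0, otherwise the sign of logFC) can be applied to every contrast with `lapply()`. A base-R sketch using toy tables in place of the real `DE_salmon_edgR` list; the helper name `classify_de` is hypothetical:

```r
# Toy stand-in for DE_salmon_edgR: named list of topTags() tables
DE_salmon_edgR <- list(
  CtrvAce   = data.frame(smallRNA = c("a", "b", "c"),
                         logFC    = c(2.1, -1.5, 0.3),
                         FDR      = c(0.01, 0.02, 0.60)),
  CtrvLyral = data.frame(smallRNA = c("a", "b", "c"),
                         logFC    = c(-0.8, 1.2, 3.0),
                         FDR      = c(0.30, 0.04, 0.001))
)

# Same rule as the nested if_else(): not significant -> 0, else sign of logFC
classify_de <- function(tab, label) {
  status <- ifelse(tab$FDR >= 0.05, 0, ifelse(tab$logFC > 0, 1, -1))
  out <- data.frame(smallRNA = tab$smallRNA, status)
  names(out)[2] <- label
  out
}

# One classified table per contrast; lapply() keeps the contrast names
salmon_calls <- lapply(DE_salmon_edgR, classify_de, label = "salmon_edgeR")
salmon_calls$CtrvAce$salmon_edgeR  # c(1, -1, 0)
```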
Something interesting: for salmon an '_' is inserted between TMM and the con_mat contrast name, but not for featureCounts. In the code above, it would seem like there would not be a '_' inserted.
DE_fc_edgR_TMMAceTrnvAceExp.txt
DE_fc_edgR_TMMCtrvLyral.txt
DE_fc_edgR_TMMLyralTrnvLyralExp.txt
DE_fc_vm_QW_Q_AceTrnvAceExp.txt
DE_fc_vm_QW_Q_CtrvLyral.txt
DE_fc_vm_QW_Q_LyralTrnvLyralExp.txt
hist_edger_p_value_AceTrnvAceExp.pdf
hist_limma_p_value_AceTrnvAceExp.pdf
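The underscore difference comes from the literal prefix passed to `str_c()`, not from the contrast name: `str_c()` joins its arguments with no separator by default, so the salmon prefix `"DE_salmon_edgR_TMM_"` yields `TMM_`, while the featureCounts file names suggest its prefix was `"DE_fc_edgR_TMM"` without the trailing `_` (the fC code is not shown, so that prefix is inferred). A base-R sketch with `paste0()`, which behaves the same way:

```r
contrast <- "CtrvAce"

# Prefix with a trailing underscore, as in the salmon code
salmon_file <- paste0("DE_salmon_edgR_TMM_", contrast, ".txt")

# Prefix inferred from the featureCounts output names (no trailing underscore)
fc_file <- paste0("DE_fc_edgR_TMM", contrast, ".txt")

# Making the separator explicit keeps both tools' file names consistent
fc_file_fixed <- paste0(paste("DE_fc_edgR_TMM", contrast, sep = "_"), ".txt")
```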
Sure, OK then, thanks. So I set the script to run one condition at a time; I can run it for each condition, no problem. I really appreciate your help, and I can get what I need from those spreadsheets. The annotation forging was very important. It would be nice to get all the graphs; it is a lot of work.
Hello @methornton , could you let me know where the script hangs? Is it the issue on the histogram only or before?
Hello @methornton, have you solved the issues?
Hi @ConYel, I can get most of the EDA script to work. I am getting an intermittent error, probably with pkgconfig. Functionally, I can get the count data out and then process it normally, so there are no deal breakers. Thank you!! It would be nice to be able to use contrast matrices for multiple comparisons; in practice that would improve adoption.
Hello @methornton, yes, it should have been changed to run all the comparisons in one go. I had some issues implementing it for the Venn diagram, because the names of the comparisons change each time, so I didn't dive into it much. Mostly I focused on getting the DE results from all comparisons and then creating any other plots I needed individually.
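One way around the changing comparison names is to key everything by contrast name and build the salmon vs featureCounts overlap inside a loop. A minimal base-R sketch, assuming each tool's results are already classified into -1/0/1 per smallRNA as in `salmon_edgeR_TMM_p`; the toy tables and the `overlap_counts` helper are hypothetical, and the resulting counts would still need to be passed to whatever Venn plotting function is used:

```r
# Toy per-contrast calls (-1 = down, 0 = not significant, 1 = up)
salmon_calls <- list(
  CtrvAce   = data.frame(smallRNA = c("a", "b", "c", "d"), salmon_edgeR = c(1, -1, 0, 1)),
  CtrvLyral = data.frame(smallRNA = c("a", "b", "c", "d"), salmon_edgeR = c(0, 0, 0, 0))
)
fc_calls <- list(
  CtrvAce   = data.frame(smallRNA = c("a", "b", "c", "d"), fc_edgeR = c(1, 0, 0, -1)),
  CtrvLyral = data.frame(smallRNA = c("a", "b", "c", "d"), fc_edgeR = c(0, 0, 0, 0))
)

# Hypothetical helper: cross-tabulate salmon vs fC calls for one contrast
overlap_counts <- function(contrast) {
  both <- merge(salmon_calls[[contrast]], fc_calls[[contrast]], by = "smallRNA")
  lv <- c(-1, 0, 1)  # fixed levels so empty categories still appear
  table(salmon = factor(both$salmon_edgeR, levels = lv),
        fc     = factor(both$fc_edgeR,     levels = lv))
}

# Iterating over contrast names (not a fixed [[1]]) covers every comparison;
# a contrast with no significant smallRNAs puts all counts in the 0/0 cell
contrasts <- names(salmon_calls)
venn_tables <- lapply(setNames(contrasts, contrasts), overlap_counts)
venn_tables$CtrvAce["1", "1"]  # smallRNAs called up by both tools
```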
Hello!
Even though I am still stuck on the heatmap function, I have the normalization files "list_norm_dgls" and I was able to make a "piRNA_predicted_Targets_rheMac10.txt". I'm partial to the voomQW_Quantile normalization, so I figured I would plunge ahead with the DGA analysis. I am used to using edgeR and making the contrast matrix. I have five groups and five comparisons. When I put that information in, it seems to work until the "map" step, where it gives a "subscript out of bounds" error.
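For reference, `limma::makeContrasts()` returns a plain numeric matrix with one row per design level and one named column per contrast, and the `map()` step iterates over `colnames(con_mat)`. Indexing that matrix (or any named object, such as an unnamed `dat_path`) with a name that does not exist raises exactly "subscript out of bounds". A base-R sketch that builds by hand the three contrasts from the salmon script's `makeContrasts()` call; the level order here is an assumption and the real design may have more columns:

```r
# Levels referenced in the makeContrasts() call; order is assumed
lv <- c("CTR.Control", "EXP.Ace", "EXP.Lyral", "TRN.Ace", "TRN.Lyral")
con_mat <- matrix(0, nrow = length(lv), ncol = 3,
                  dimnames = list(Levels = lv,
                                  Contrasts = c("AceTrnvAceExp", "LyralTrnvLyralExp", "CtrvLyral")))

# AceTrnvAceExp = EXP.Ace - TRN.Ace
con_mat[c("EXP.Ace", "TRN.Ace"), "AceTrnvAceExp"] <- c(1, -1)
# LyralTrnvLyralExp = EXP.Lyral - TRN.Lyral
con_mat[c("EXP.Lyral", "TRN.Lyral"), "LyralTrnvLyralExp"] <- c(1, -1)
# CtrvLyral = ((TRN.Lyral + EXP.Lyral)/2) - CTR.Control
con_mat[c("TRN.Lyral", "EXP.Lyral", "CTR.Control"), "CtrvLyral"] <- c(0.5, 0.5, -1)

# Iterating over colnames(con_mat) only ever requests existing columns
weights <- lapply(setNames(colnames(con_mat), colnames(con_mat)),
                  function(x) con_mat[, x])

# A column name that is not in the matrix raises the reported error
err <- tryCatch(con_mat[, "CtrvAce"], error = conditionMessage)
```

Each column sums to zero, as a contrast should, and `err` holds `"subscript out of bounds"`.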
Here is the top half of my DGA script, which is mostly identical to the TCGA_BRCA script.
Here is the error; it also occurs with featureCounts data:
Getting closer! I really appreciate your help. Thank you!