Closed ute-hoffmann closed 7 months ago
yes that's very simple to do. can def be added as soon as I find time
If you're okay with it, I'd try to add it as soon as the issues with the plotting are resolved, I already have an idea.
@m-jahn - I tried to implement what I wanted/needed in the .Rmd script (see commit in fork), but did not manage to adjust the Nextflow pipeline/modules in a way that the output files are saved in the correct output directory. They are saved in the work directories, though. I am pretty sure that this thread gives the answer, but am not able to implement this.
I can have a look at that but I will anyway update the entire pipeline to the latest nextflow version and modules. So this might be incompatible with your local commit.
@ute-hoffmann I made a major update to the pipeline, now it runs all fitness analysis and R markdown notebooks in its own singularity container. Should solve all troubles with missing local R packages, wrong versions etc. I will look into this issue now.
Thanks, amazing! Adding another feature request immediately - including another heat map with clustered samples might also be of interest. This clustering helped me understand what was going on with one of my replicates (nothing good, unfortunately...). Used this code:
```r
library(pheatmap)
p <- pheatmap(df_correlation, scale="row")
p
ggsave("correlation_samples_clustering.pdf", plot=p, width=35, height=35)
```
Do you have an example? I don't know what `df_correlation` is or what this plot will look like.
otherwise I'm almost done exporting all plots as png and svg
Ah, sorry - used df_correlation as the data frame to collect all the correlations, maybe this code snippet helps to explain:
```{r, fig.width = 7.5, fig.height = 7, warning = FALSE}
df_correlation <- df_counts %>%
  tidyr::pivot_wider(names_from = "sample", values_from = "n_reads") %>%
  dplyr::select(-c(1:2)) %>%
  cor()
write.csv(df_correlation, "correlation_samples.csv")

p_correlation <- df_correlation %>%
  dplyr::as_tibble() %>%
  dplyr::mutate(sample1 = colnames(.)) %>%
  tidyr::pivot_longer(
    cols = !sample1,
    names_to = "sample2", values_to = "cor_coef"
  ) %>%
  ggplot(aes(x = sample1, y = sample2, fill = cor_coef)) +
  geom_tile() +
  geom_text(color = grey(0.4), aes(label = round(cor_coef, 2))) +
  custom_theme() +
  labs(title = "", x = "", y = "") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  scale_fill_gradientn(
    colours = c(custom_colors[1], grey(0.9), custom_colors[2]),
    limits = c(-1, 1)
  )
p_correlation
ggsave("correlation_samples.pdf", plot = p_correlation, width = 35, height = 35)
```
And that's the correlation plot with clustering. Haven't figured out yet how to make it easier to read, but that's probably too many samples: correlation_samples_clustering.pdf
I see. I don't have `pheatmap` in this container, so it won't be possible to plot the same heatmap as you did. But you can do this yourself once the table is exported, so exporting the table is what I will do.
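For anyone who wants the clustered view without `pheatmap`, here is a minimal sketch using only base R (`hclust` and `heatmap`). It is not what the pipeline does. It assumes the table was written with `write.csv()` as in the snippet above, so the first column holds the sample names. The toy counts, the sample names `s1` to `s4`, the helper `cluster_correlation()`, and the choice of `1 - correlation` as distance with average linkage are all assumptions made for illustration.

```r
# Hypothetical helper: reorder a correlation matrix by hierarchical clustering.
cluster_correlation <- function(cor_mat) {
  # Use 1 - correlation as a distance, so highly correlated samples are close
  d <- as.dist(1 - cor_mat)
  hc <- hclust(d, method = "average")
  # Apply the dendrogram order to rows and columns alike
  cor_mat[hc$order, hc$order]
}

# Toy data standing in for the exported table (made-up sample names)
set.seed(1)
counts <- matrix(
  rpois(400, lambda = 50), ncol = 4,
  dimnames = list(NULL, c("s1", "s2", "s3", "s4"))
)
csv_path <- file.path(tempdir(), "correlation_samples.csv")
write.csv(cor(counts), csv_path)

# Read the table back; row.names = 1 restores the sample names
cor_in <- as.matrix(read.csv(csv_path, row.names = 1, check.names = FALSE))
cor_clustered <- cluster_correlation(cor_in)

# Draw a clustered heatmap with dendrograms using base graphics
out_pdf <- file.path(tempdir(), "correlation_samples_clustering.pdf")
pdf(out_pdf, width = 7, height = 7)
heatmap(
  cor_in, symm = TRUE, scale = "none",
  distfun = function(x) as.dist(1 - x),
  hclustfun = function(d) hclust(d, method = "average")
)
dev.off()
```

With many samples, increasing `width` and `height` in `pdf()` or dropping the cell labels tends to help readability more than changing the clustering method.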
Yup, also a solution :+1:
will push the changes tomorrow.
fixed with latest commit https://github.com/MPUSP/nf-core-crispriscreen/commit/5e3bb29e1f7e57048ba79bd7dd996cb7972c3a17
Will be merged in master with next version 1.2.0
> @m-jahn - I tried to implement what I wanted/needed in the .Rmd script (see commit in fork), but did not manage to adjust the Nextflow pipeline/modules in a way that the output files are saved in the correct output directory. They are saved in the work directories, though. I am pretty sure that this thread gives the answer, but am not able to implement this.
BTW, to implement this feature one needs to change not only the R script but also the `<module>.nf` file, which defines the expected output files, and the `modules.config` file, which defines where output files end up. Now all exported figures will get their own directory by file extension (svg, pdf, png).
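In the pipeline this per-extension layout comes from the module's output declaration and `modules.config`, not from R. For saving figures outside Nextflow, the same layout can be sketched directly in R as below. The helper `export_plot()`, its arguments, and the example plot are hypothetical and not part of the pipeline. Note that `ggsave()` needs the `svglite` package for svg output, so the sketch skips svg when that package is missing.

```r
library(ggplot2)

# Hypothetical helper: save one plot to <outdir>/<ext>/<name>.<ext>
# for each requested file extension and return the written paths.
export_plot <- function(plot, name, outdir = ".",
                        exts = c("svg", "pdf", "png"),
                        width = 7, height = 7) {
  # ggsave() relies on svglite for svg files; drop svg if it is unavailable
  if (!requireNamespace("svglite", quietly = TRUE)) {
    exts <- setdiff(exts, "svg")
  }
  paths <- character(0)
  for (ext in exts) {
    # One subdirectory per extension, created on demand
    dir.create(file.path(outdir, ext), recursive = TRUE, showWarnings = FALSE)
    path <- file.path(outdir, ext, paste0(name, ".", ext))
    # ggsave() picks the graphics device from the file extension
    ggsave(path, plot = plot, width = width, height = height)
    paths <- c(paths, path)
  }
  paths
}

# Example with a built-in data set
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()
saved <- export_plot(p, "example_plot", outdir = tempdir())
```

Here `saved` holds one path per written file, which makes it easy to check that each format ended up in its own directory.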
Description of feature
It would be amazing if count_summary.Rmd would export the created files (e.g. the correlation file or the PCA) as .pdf, so that they can be used to create Figures for a manuscript. Also, it would be nice to get the correlation matrix as separate .tsv/csv file. When a large data set with many different samples is analyzed, it is pretty difficult to read the depicted correlation plot & PCA in the created .html.