MPUSP / nf-core-crispriscreen

Process next generation sequencing data obtained from CRISPRi repression library screenings
MIT License

Exporting created images as .pdf, used data as .csv? #29

Closed ute-hoffmann closed 7 months ago

ute-hoffmann commented 7 months ago

Description of feature

It would be amazing if count_summary.Rmd would export the created files (e.g. the correlation file or the PCA) as .pdf, so that they can be used to create Figures for a manuscript. Also, it would be nice to get the correlation matrix as separate .tsv/csv file. When a large data set with many different samples is analyzed, it is pretty difficult to read the depicted correlation plot & PCA in the created .html.

m-jahn commented 7 months ago

yes that's very simple to do. can def be added as soon as I find time

ute-hoffmann commented 7 months ago

If you're okay with it, I'd try to add it as soon as the issues with the plotting are resolved, I already have an idea.

ute-hoffmann commented 7 months ago

@m-jahn - I tried to implement what I wanted/needed in the .Rmd script (see commit in fork), but did not manage to adjust the Nextflow pipeline/modules in a way that the output files are saved in the correct output directory. They are saved in the work directories, though. I am pretty sure that this thread gives the answer, but am not able to implement this.

m-jahn commented 7 months ago

I can have a look at that but I will anyway update the entire pipeline to the latest nextflow version and modules. So this might be incompatible with your local commit.

m-jahn commented 7 months ago

@ute-hoffmann I made a major update to the pipeline, now it runs all fitness analysis and R markdown notebooks in its own singularity container. Should solve all troubles with missing local R packages, wrong versions etc. I will look into this issue now.

ute-hoffmann commented 7 months ago

Thanks, amazing! Adding another feature request immediately - including another heat map with clustered samples might also be of interest. This clustering helped me understand what was going on with one of my replicates (nothing good, unfortunately...). Used this code:

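```r
library(pheatmap)
library(ggplot2)  # provides ggsave()

# clustered heat map of the sample correlation matrix
p <- pheatmap(df_correlation, scale = "row")
p
ggsave("correlation_samples_clustering.pdf", plot = p, width = 35, height = 35)
```

m-jahn commented 7 months ago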

do you have an example? I don't know what df_correlation is, or what this plot will look like.

m-jahn commented 7 months ago

otherwise I'm almost done exporting all plots as png and svg

ute-hoffmann commented 7 months ago

Ah, sorry - used df_correlation as the data frame to collect all the correlations, maybe this code snippet helps to explain:


```{r, fig.width = 7.5, fig.height = 7, warning = FALSE}
df_correlation <- df_counts %>%
    tidyr::pivot_wider(names_from = "sample", values_from = "n_reads") %>%
    dplyr::select(-c(1:2)) %>%
    cor()
write.csv(df_correlation, "correlation_samples.csv")
p_correlation <- df_correlation %>%
    dplyr::as_tibble() %>%
    dplyr::mutate(sample1 = colnames(.)) %>%
    tidyr::pivot_longer(
        cols = !sample1,
        names_to = "sample2", values_to = "cor_coef"
    ) %>%
    ggplot(aes(x = sample1, y = sample2, fill = cor_coef)) +
    geom_tile() +
    geom_text(color = grey(0.4), aes(label = round(cor_coef, 2))) +
    custom_theme() +
    labs(title = "", x = "", y = "") +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
    scale_fill_gradientn(
        colours = c(custom_colors[1], grey(0.9), custom_colors[2]),
        limits = c(-1, 1)
    )
p_correlation
ggsave("correlation_samples.pdf", plot = p_correlation, width = 35, height = 35)
```

ute-hoffmann commented 7 months ago

And that's the correlation plot with clustering: correlation_samples_clustering.pdf. Haven't figured out yet how to make it easier to read, but that's probably just too many samples.

m-jahn commented 7 months ago

I see. I don't have pheatmap in this container, so it won't be possible to plot the same heatmap as you did. But you can do this yourself once the table is exported, so exporting the table is what I will do.
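
For readers without pheatmap, here is a minimal base-R sketch of how a clustered heat map could be built from an exported correlation table. It is an illustration, not the pipeline's code. The simulated counts, the temporary file paths and the choice of average linkage are assumptions, and the values are left unscaled (`scale = "none"`) rather than row-scaled as in the pheatmap call above.

```r
# Simulated read counts stand in for real data (assumption for illustration).
set.seed(1)
counts <- matrix(rpois(600, lambda = 50), ncol = 6,
                 dimnames = list(NULL, paste0("sample", 1:6)))

# Mimic the exported table: write the correlation matrix, then read it back.
csv_file <- file.path(tempdir(), "correlation_samples.csv")
write.csv(cor(counts), csv_file)
cor_mat <- as.matrix(read.csv(csv_file, row.names = 1, check.names = FALSE))

# Convert correlation to a distance (highly correlated samples are close)
# and cluster the samples hierarchically.
sample_clust <- hclust(as.dist(1 - cor_mat), method = "average")

# Reorder rows and columns so that similar samples sit next to each other.
ord <- sample_clust$order
cor_clustered <- cor_mat[ord, ord]

# Draw the clustered heat map with base graphics and save it as PDF.
out_file <- file.path(tempdir(), "correlation_samples_clustering.pdf")
pdf(out_file, width = 7, height = 7)
heatmap(cor_mat,
        Rowv = as.dendrogram(sample_clust), Colv = "Rowv",
        symm = TRUE, scale = "none", margins = c(8, 8))
dev.off()
```

With many samples, the reordered matrix `cor_clustered` can also be written out with `write.csv()` and inspected as a table, which may be easier to read than a dense plot.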

ute-hoffmann commented 7 months ago

Yup, also a solution :+1:

m-jahn commented 7 months ago

will push the changes tomorrow.

m-jahn commented 7 months ago

fixed with latest commit https://github.com/MPUSP/nf-core-crispriscreen/commit/5e3bb29e1f7e57048ba79bd7dd996cb7972c3a17

Will be merged in master with next version 1.2.0

m-jahn commented 7 months ago

> @m-jahn - I tried to implement what I wanted/needed in the .Rmd script (see commit in fork), but did not manage to adjust the Nextflow pipeline/modules in a way that the output files are saved in the correct output directory. They are saved in the work directories, though. I am pretty sure that this thread gives the answer, but am not able to implement this.

BTW, to implement this feature one needs to change not only the R script but also the <module>.nf file, which defines the expected output files, and the modules.config file, which defines where output files end up. Now all exported figures will get their own dir by type of extension (svg, pdf, png).
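
When running the notebook outside Nextflow, a similar one-directory-per-extension layout could be produced directly in R. The sketch below is a standalone illustration and not how the pipeline does it (there the sorting happens in the module and config files). The helper `export_plot()`, its arguments and the output location are hypothetical, and svg/png are only attempted when the R build reports cairo support.

```r
# Hypothetical helper: draw the same plot once per format and store each
# file in a sub-directory named after its extension (pdf/, svg/, png/).
export_plot <- function(draw, name, formats = "pdf", out_dir = ".",
                        width = 7, height = 7) {
  paths <- character(0)
  for (ext in formats) {
    dir.create(file.path(out_dir, ext), recursive = TRUE, showWarnings = FALSE)
    path <- file.path(out_dir, ext, paste0(name, ".", ext))
    switch(ext,
      pdf = grDevices::pdf(path, width = width, height = height),
      svg = grDevices::svg(path, width = width, height = height),
      png = grDevices::png(path, width = width, height = height,
                           units = "in", res = 300),
      stop("unsupported format: ", ext)
    )
    # Always close the graphics device, even if drawing fails.
    tryCatch(draw(), finally = grDevices::dev.off())
    paths <- c(paths, path)
  }
  paths
}

# Usage: pdf always, svg and png only if cairo is available in this R build.
formats <- c("pdf", if (capabilities("cairo")) c("svg", "png"))
out_dir <- file.path(tempdir(), "figures")
paths <- export_plot(function() plot(1:10, main = "example"),
                     "example_plot", formats, out_dir)
```

Passing the plot as a function keeps the helper independent of the plotting system; for a ggplot object one would pass `function() print(p)`.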