Hoohm / dropSeqPipe

A SingleCell RNASeq pre-processing snakemake workflow
Creative Commons Attribution Share Alike 4.0 International

Creating "violin", "count vs gene" and "count vs UMI" plots plus export Seurat objects #39

Closed seb-mueller closed 6 years ago

seb-mueller commented 6 years ago

Hi Patrick,

I've ported code from another branch (seb_plots) into a new branch on my fork (seurat_violine_etc) in order to open a pull request. I'm not sure how to proceed from here, since you might not want to merge it into master (I couldn't find a way to request a merge into a different branch).

This PR adds a few more diagnostic plots to assess mitochondrial, ribosomal, etc. fractions (see below).

It seems to run on my data, but I was curious whether you like it and whether it works for you. Compared with the earlier version (seb_plots branch), I've taken out a few plots to clean it up. It basically adds 3 plots, as demonstrated below on the Macosko data.

[Image: violinplots_comparison_umi]

If it is too experimental, it is also conceivable to add another rule (so one can choose to not run it)

Hoohm commented 6 years ago

The new ggplot2 needs a Fortran lib, and that breaks dropSeqPipe. I have to either update the packages to fit that or force the older ggplot2 version.

seb-mueller commented 6 years ago

True, the same error has also crept up for me:

Error: package or namespace load failed for ‘plyr’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '~/mysamples/.snakemake/conda/2dc0b04c/lib/R/library/Rcpp/libs/Rcpp.so':
  libgfortran.so.4: cannot open shared object file: No such file or directory

At first I thought it was due to my setup, but I suspect that Rcpp (which is required by ggplot2) needs libgfortran version 4, which does not seem to be available on conda (https://anaconda.org/search?q=libgfortran). Since ggplot2 has been used before, shouldn't it have broken anyway (without this PR)?
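
One way to check this suspicion is to look at which libgfortran shared objects the conda environment actually ships. The following is only a minimal sketch: the function names `find_shared_libs` and `has_soname` are made up for illustration, and it assumes a Linux-style environment layout with libraries under `<env prefix>/lib`.

```python
from pathlib import Path


def find_shared_libs(env_prefix, stem="libgfortran"):
    """List shared objects named <stem>.so* under <env_prefix>/lib (assumed layout)."""
    lib_dir = Path(env_prefix) / "lib"
    if not lib_dir.is_dir():
        # No lib directory at all: nothing to report.
        return []
    return sorted(p.name for p in lib_dir.glob(f"{stem}.so*"))


def has_soname(env_prefix, soname="libgfortran.so.4"):
    """True if the exact soname requested by the loader is present in the environment."""
    return soname in find_shared_libs(env_prefix)
```

For the environment in the error message above, one could call `has_soname(Path("~/mysamples/.snakemake/conda/2dc0b04c").expanduser())`. If it returns `False` while `find_shared_libs` lists other libgfortran versions, that would be consistent with a version mismatch rather than a problem with the local setup.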

Hoohm commented 6 years ago

Yeah, pinning ggplot2 to 2.2.1 didn't fix the issue. Gonna try to fix Rcpp.

seb-mueller commented 6 years ago

Seems like your last fix has worked. I've adapted this PR to integrate the changes, maybe you can have another go? Thanks for your feedback :)

seb-mueller commented 6 years ago

Indeed, some of them can be removed, but I wanted your feedback first on which output is best to generate (pdf, html or both), since most of those lines just need uncommenting to enable the corresponding file type.

Are you happy with this on a more general level? Since it requires more package dependencies (e.g. Seurat), it could also be implemented as a separate rule, i.e. on top of the map rule, adding a map_extended or plot rule or similar, which can then be used for more experimental stuff (e.g. tSNE plots, which I already have the code for) and won't break the flow, since it can be opted out of.
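
To make the opt-out idea concrete, here is a rough sketch of how the extra plots could be requested only when a switch is set. It is plain Python, which a Snakefile can contain as-is. The config key `run_summary_plots`, the function name `summary_targets` and the `plots/` output paths are hypothetical and not taken from the actual workflow.

```python
def summary_targets(config, samples):
    """Return the extra plot files to request, or [] when the step is switched off."""
    # "run_summary_plots" is a made-up config key for this sketch.
    if not config.get("run_summary_plots", False):
        return []
    # Hypothetical output naming: one pdf per sample.
    return [f"plots/{sample}_violinplots_comparison_umi.pdf" for sample in samples]


# Example: with the switch off nothing extra is requested.
print(summary_targets({}, ["sample1", "sample2"]))
print(summary_targets({"run_summary_plots": True}, ["sample1", "sample2"]))
```

The returned list could then be appended to the inputs of the workflow's target rule, so that the Seurat-based plots (and their dependencies) are only built when someone asks for them.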

Hoohm commented 6 years ago

I think that for now, pdf should be used. Html files should be used for either interactive purposes or a markdown report.

Pdfs are nice and easy to import into Illustrator if you want to modify the plot for a publication, whereas html won't be accessible in that way.

What do you think?

seb-mueller commented 6 years ago

Good point. I've adjusted the code accordingly, but left the html bits in as comments so they could be easily activated. Did you have a think about the map rule suggestion above?

Hoohm commented 6 years ago

Sorry, I didn't understand your question last time.

I thought you wanted to have a separate rule for those plots, which is already the case. You suggest a separate file for this step, right?

I think you're correct here. This would be cleaner and I expect we might want to do this maybe even after some filtering in the future. Definitely put it in another file.

I wouldn't call it map_extended.smk though. Maybe summary.smk, and we might actually move some of the ones in extract_single.smk into it.

There is one thing that has been bothering me for a long time and I might just change this once we merge this.

More subfolders, separate samples completely in /data, split logfiles by step or software. Just some clearing up.

What do you think?