Hoohm / dropSeqPipe

A SingleCell RNASeq pre-processing snakemake workflow
Creative Commons Attribution Share Alike 4.0 International

Add new parameters, fix plotting #13

Closed martinholub closed 6 years ago

martinholub commented 6 years ago

Hello @Hoohm,

I have used dropSeqPipe for DroNc-seq data analysis and found it a great help. I like that it chains the commands into a reproducible workflow. Thumbs up!

I made small adjustments to the pipeline, mainly because I needed access to parameters beyond those exposed in the configuration file. I am opening this pull request because you may benefit from looking at the changes and considering whether other users would need them as well.

I also had to make small changes to knee_plot.R and disable the BCDrop plot, as neither ran out of the box. I would further suggest adding a note to the wiki stating that the index must be generated with the same --sjdbOverhang value that is later used for alignment.
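To illustrate the --sjdbOverhang point, here is a minimal Python sketch that derives the value once and passes it to both the index-generation and the alignment command. The helper names (sjdb_overhang, star_index_cmd, star_align_cmd) and the file paths are hypothetical and not part of dropSeqPipe; the STAR flags are the standard command-line ones, and read length minus one is the value the STAR manual recommends.

```python
# Hypothetical helpers for illustration only; not taken from dropSeqPipe.

def sjdb_overhang(read_length):
    """Return the --sjdbOverhang value recommended by STAR: read length - 1."""
    if read_length < 2:
        raise ValueError("read length must be at least 2")
    return read_length - 1


def star_index_cmd(genome_dir, fasta, gtf, overhang, threads=1):
    """Build the argument list for STAR index generation."""
    return [
        "STAR", "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
        "--sjdbGTFfile", gtf,
        "--sjdbOverhang", str(overhang),
    ]


def star_align_cmd(genome_dir, reads, overhang, threads=1):
    """Build the argument list for alignment, reusing the same overhang."""
    return [
        "STAR", "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", reads,
        "--sjdbOverhang", str(overhang),
    ]


# Example with placeholder paths: compute the value once, use it in both steps.
overhang = sjdb_overhang(read_length=60)
index_cmd = star_index_cmd("star_index", "genome.fa", "annotation.gtf", overhang)
align_cmd = star_align_cmd("star_index", "sample_R2.fastq", overhang)
```

Computing the value in one place and handing it to both steps is one way to keep the two from drifting apart when the read length changes.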

Hoohm commented 6 years ago

Hello @martinholub, I'm glad the pipeline is useful to you 👍

I'm currently working on a new release. I'm getting rid of the Python package style and going back to snakemake workflows. This will also give easier access to parameters that are not exposed right now. I hope to release it in a week or two.

I've never been familiar with Java options, and I see you added -XX:ParallelGCThreads={CORES}. How would this work on local computers as opposed to clusters? Does it speed up the processes? I always thought the bottleneck here was I/O access.
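For context, -XX:ParallelGCThreads is a HotSpot JVM option that sets the number of garbage-collector threads; it does not change how many threads the tool itself uses. The sketch below shows one way such an option could be passed so that the same configuration behaves sensibly on a laptop and on a cluster node, by capping the requested value at the locally available CPU count. The function name java_cmd and the jar path are hypothetical and not taken from the pull request.

```python
import os


def java_cmd(jar_path, tool_args, requested_cores, available_cores=None):
    """Build a java command line with a capped -XX:ParallelGCThreads value.

    available_cores defaults to the CPU count of the current machine, so a
    value tuned for a cluster node is reduced automatically on a laptop.
    """
    if available_cores is None:
        available_cores = os.cpu_count() or 1
    gc_threads = max(1, min(requested_cores, available_cores))
    return ["java", f"-XX:ParallelGCThreads={gc_threads}", "-jar", jar_path, *tool_args]


# Example: 16 cores requested in the config, but only 4 available locally.
cmd = java_cmd("tool.jar", ["INPUT=reads.bam"], requested_cores=16, available_cores=4)
```

Whether this speeds anything up depends on how much time the JVM actually spends in garbage collection, so it is worth measuring before exposing it as a parameter.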

Could you tell me what wasn't working out of the box for BCDrop? I really like the plot and I would like to keep it.

The STAR index generation will now be included in the pipeline as a prerequisite (as would any files necessary for the mapping steps). This will make it simpler to use.

Wish you a happy new year

martinholub commented 6 years ago

Hello @Hoohm,

Excuse my delayed response. Regarding your questions:

martinholub commented 6 years ago

Hello @Hoohm, here is the error message I get:

Error in x[[jj]][iseq] <- vjj : replacement has length zero
Calls: plotBCDrop -> [<- -> [<-.data.frame

Hoohm commented 6 years ago

Is it possible that some of your CELL_barcode.txt and UMI_barcode.txt log files are empty? I've run into this issue when the filters are too stringent: the plotting fails because there are no lines to read.
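A quick way to rule this out before plotting is to scan the input files for ones that are missing or contain no data lines. The following is a minimal Python sketch under that assumption; the function name find_empty_inputs and the example paths are hypothetical and not part of dropSeqPipe.

```python
from pathlib import Path


def find_empty_inputs(paths):
    """Return the given paths that are missing or contain no non-blank lines."""
    empty = []
    for raw in paths:
        p = Path(raw)
        if not p.is_file():
            # A missing file is treated like an empty one: nothing to plot.
            empty.append(raw)
            continue
        with p.open() as fh:
            if not any(line.strip() for line in fh):
                empty.append(raw)
    return empty


# Example usage with placeholder paths:
# problems = find_empty_inputs(["logs/sample1_CELL_barcode.txt",
#                               "logs/sample1_UMI_barcode.txt"])
# if problems:
#     print("No data to plot in:", ", ".join(problems))
```

Running a check like this ahead of the plotting step turns a cryptic R indexing error into an explicit message naming the offending files.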

martinholub commented 6 years ago

None of these files is empty (see the attached barcodes.zip).

Hoohm commented 6 years ago

I might have found the issue. Did you run the fastqc step?

martinholub commented 6 years ago

You are correct, thanks for the tip.

Regarding fastqc: I have run it on only a couple of files, not on all the files I used in the downstream pipeline. I will rerun fastqc and generate-plots to see if that solves the issue.

martinholub commented 6 years ago

Update: generate-plots still fails with the same error, even with the fastqc output available.

Hoohm commented 6 years ago

Edit: I think it won't work if you run fastqc on only part of the files. This is fixed in the new BCDrop plot because it no longer relies on the summary fastqc.txt file.

On Sat, Jan 13, 2018, 21:24 Martin Holub notifications@github.com wrote:

update: generate-plots still errors the same way, even if it has output of fastqc available.

Hoohm commented 6 years ago

Hello @martinholub, this should be fixed in the new version. I'm not keen on exposing this option, since I'm not sure how it would be useful in this context. Could you tell me how you use it and for what purpose? Maybe we can even improve the pipeline based on this.