aryeelab / guideseq

Analysis pipeline for the GUIDE-seq assay.
GNU Affero General Public License v3.0

Too many files open during demultiplex #36

Open michael-weinstein opened 7 years ago

michael-weinstein commented 7 years ago

I am getting a "too many open files" error during demultiplexing. It looks like the demultiplexer creates a file for every barcode it sees above some frequency, rather than using my list of barcodes to either assign reads to known barcodes or mark them as unidentifiable.

Is there some preprocessing I should have done on the raw fastq file? Alternatively, I have some code of my own that might be able to help out with this problem by calling barcodes based on expected sequences.

Mike
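
For illustration, calling barcodes against a list of expected sequences could look roughly like the sketch below. This is a hypothetical example, not the code mentioned above and not anything from the guideseq or umi repositories: the function names, the `max_mismatches` parameter, and the "undetermined" bucket are all assumptions. The point is that the number of output buckets (and therefore open files) is bounded by the number of expected barcodes plus one.

```python
from collections import Counter


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def call_barcode(observed, expected, max_mismatches=1):
    """Return the expected barcode that `observed` maps to, or None.

    Hypothetical helper: an exact match wins outright; otherwise the read is
    assigned only if exactly one expected barcode is within `max_mismatches`.
    """
    if observed in expected:
        return observed
    hits = [
        bc for bc in expected
        if len(bc) == len(observed) and hamming(bc, observed) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


def assign_reads(read_barcodes, expected, max_mismatches=1):
    """Tally reads per expected barcode, plus one 'undetermined' bucket."""
    counts = Counter()
    for observed in read_barcodes:
        called = call_barcode(observed, expected, max_mismatches)
        counts[called if called is not None else "undetermined"] += 1
    return counts


if __name__ == "__main__":
    expected = ["ACGT", "TTGA"]  # made-up barcodes for the example
    print(assign_reads(["ACGT", "ACGA", "TTGA", "GGGG"], expected))
```

With this approach a run with N expected barcodes needs at most N + 1 output files, regardless of how many distinct barcode sequences appear in the raw FASTQ.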

martinaryee commented 7 years ago

You could try increasing the barcode frequency threshold that's used as a trigger to create a file. See line 47 of: https://github.com/aryeelab/umi/blob/3fef4c92becda4c2b4b6085555415f80c1dd858e/demultiplex.py (I can't remember off-hand if it's possible to set this from the command line)
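
As a rough sketch of why raising the threshold helps (this is not the code in `demultiplex.py`; the function name and the `min_reads` parameter are hypothetical), only barcodes seen at least `min_reads` times would get their own output file, so a higher threshold means fewer files open at once:

```python
from collections import Counter


def barcodes_above_threshold(read_barcodes, min_reads):
    """Return the set of barcodes observed at least `min_reads` times.

    Hypothetical helper: in a threshold-based demultiplexer, each barcode in
    this set would correspond to one open output file.
    """
    counts = Counter(read_barcodes)
    return {bc for bc, n in counts.items() if n >= min_reads}


if __name__ == "__main__":
    # Made-up observations: two frequent barcodes and two rare ones.
    observed = ["ACGT"] * 5 + ["TTGA"] * 3 + ["GGGG"] + ["CCCC"]
    for threshold in (1, 3, 5):
        kept = barcodes_above_threshold(observed, threshold)
        print(threshold, len(kept), sorted(kept))
```

Rare barcodes, which are often sequencing errors, fall below the threshold and no longer trigger file creation.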

michael-weinstein commented 7 years ago

I'll use that method. I was just wondering whether there is a better way to handle this (or whether you'd be interested in handling it another way), since raising the threshold seems like a sub-optimal fix.

JudoWill commented 6 years ago

I'm running into this same issue. I've made a fix in my fork of the repo. Do you have a contributor policy?

Zethson commented 5 years ago

@JudoWill How did you fix the demultiplexing step?