caleblareau / bap

Bead-based single-cell atac processing
http://caleblareau.github.io/bap
MIT License

bap-barcode v2.1 generates truncated FASTQ file #28

Closed mailcolm9 closed 4 years ago

mailcolm9 commented 4 years ago

Hi,

I was using bap-barcode to process the Bio-Rad data from your Nature Biotechnology paper. For most data sets, it generates correct FASTQ files whose line counts are multiples of 4. However, for data sets SRR8994134 and SRR8994139, bap-barcode generates files whose line counts are not multiples of 4. Could you help me figure out where the problem might come from? Thanks!

Zonghao

wc -l SRR8994139-c002_1.fastq.gz
539482 SRR8994139-c002_1.fastq.gz

wc -l SRR8994139-c002_2.fastq.gz
342438 SRR8994139-c002_2.fastq.gz

wc -l SRR8994134-c007_1.fastq.gz
699103 SRR8994134-c007_1.fastq.gz

wc -l SRR8994134-c007_2.fastq.gz
466513 SRR8994134-c007_2.fastq.gz

caleblareau commented 4 years ago

This has not been a known issue before. I’m not sure that you can run wc -l on a gzip compressed file...

https://superuser.com/questions/135329/count-lines-in-a-compressed-file

mailcolm9 commented 4 years ago

You are right. I should have used zcat <file> | wc -l; that command shows that the line counts of these files are multiples of 4. Also, for some reason, the issue went away when I ran the command without nohup, and it now generates the corresponding BAM file.
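
For anyone who runs into the same confusion: wc -l on a .gz file counts newline bytes in the compressed stream, so the number says nothing about the FASTQ records inside. Below is a minimal sketch of the check I should have run, assuming gzip and wc are on the PATH. The function name check_fastq_gz is my own illustrative helper and is not part of bap.

```shell
# Hypothetical helper (not part of bap): decompress a gzipped FASTQ file to
# stdout, count its lines, and report whether the count is a multiple of 4.
check_fastq_gz() {
    fq_file="$1"
    # gzip -cd writes the decompressed data to stdout; wc -l reads it from stdin.
    fq_lines=$(gzip -cd -- "$fq_file" | wc -l)
    # Normalize the count (some wc implementations pad it with spaces).
    fq_lines=$((fq_lines + 0))
    if [ $((fq_lines % 4)) -eq 0 ]; then
        echo "$fq_file: $fq_lines lines ($((fq_lines / 4)) records)"
    else
        echo "$fq_file: $fq_lines lines (NOT a multiple of 4)"
        return 1
    fi
}
```

Usage would be something like check_fastq_gz SRR8994139-c002_1.fastq.gz; the function returns non-zero when the decompressed line count is not a multiple of 4.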