RobertsLab / resources

https://robertslab.github.io/resources/

Get read counts for Caligus fq files #820

Closed sr320 closed 4 years ago

sr320 commented 4 years ago

The files

Sealice_F1_S20_L001_R1_001.fastq.gz
Sealice_F1_S20_L001_R2_001.fastq.gz
Sealice_F1_S20_L002_R1_001.fastq.gz
Sealice_F1_S20_L002_R2_001.fastq.gz
Sealice_F2_S22_L001_R1_001.fastq.gz
Sealice_F2_S22_L001_R2_001.fastq.gz
Sealice_F2_S22_L002_R1_001.fastq.gz
Sealice_F2_S22_L002_R2_001.fastq.gz

are located at:

/Volumes/web/nightingales/C_rogercresseyi/

This issue relates to a consistent error seen with both the raw and the trimmed files:

Created C -> T converted version of the FastQ file Sealice_F1_S20_L001_R1_001.fastq.gz (20565855 sequences in total)

Writing a G -> A converted version of the input file Sealice_F1_S20_L001_R2_001.fastq.gz to Sealice_F1_S20_L001_R2_001.fastq.gz_G_to_A.fastq

Created G -> A converted version of the FastQ file Sealice_F1_S20_L001_R2_001.fastq.gz (113474961 sequences in total)

[FATAL ERROR]:  Number of bisulfite transformed reads are not equal between Read 1 (#20565855) and Read 2 (#113474961).
Possible causes: file truncation, or as a result of specifying read pairs that do not belong to each other?! Please re-specify file names! Exiting...

kubu4 commented 4 years ago

I'm determining counts now, but based on file sizes (they are noticeably different between the R1 and R2 FastQ files of each pair), it's not terribly surprising that they don't have the same number of R1/R2 reads.

sr320 commented 4 years ago

Thanks! Makes sense. We should probably confirm that the files match what was provided by the core. (In reply to kubu4's comment of Jan 2, 2020, 12:42 PM -0800.)

kubu4 commented 4 years ago

Hmmm, well, I was wrong regarding file size being an indicator for equal/unequal read counts!

The R1/R2 files have equal numbers of reads:

Sealice_F1_S20_L001_R1_001.fastq.gz
113474961
Sealice_F1_S20_L001_R2_001.fastq.gz
113474961
Sealice_F1_S20_L002_R1_001.fastq.gz
112706276
Sealice_F1_S20_L002_R2_001.fastq.gz
112706276
Sealice_F2_S22_L001_R1_001.fastq.gz
82177525
Sealice_F2_S22_L001_R2_001.fastq.gz
82177525
Sealice_F2_S22_L002_R1_001.fastq.gz
81699034
Sealice_F2_S22_L002_R2_001.fastq.gz
81699034

20200102_001

EDITED: Re-ordered files so that proper pairs are listed directly above/below each other.
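
For anyone repeating this check, here is a minimal sketch of one way to get per-file read counts and compare each R1/R2 pair. It is not necessarily how the counts above were produced. It assumes standard four-line FASTQ records, gzip compression, and the `_R1_`/`_R2_` naming seen in these files; the function names `count_fastq_reads` and `paired_read_counts` are made up for illustration.

```python
import gzip
from pathlib import Path


def count_fastq_reads(path):
    """Count records in a gzipped FASTQ file, assuming 4 lines per record."""
    with gzip.open(path, "rb") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        # A partial record usually means the file is damaged or not plain FASTQ.
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def paired_read_counts(directory):
    """Map each R1 file name to (R1 count, R2 count) when its R2 mate exists."""
    counts = {}
    for r1 in sorted(Path(directory).glob("*_R1_*.fastq.gz")):
        # Assumption: the mate differs only by _R1_ vs _R2_ in the file name.
        r2 = r1.with_name(r1.name.replace("_R1_", "_R2_"))
        if r2.exists():
            counts[r1.name] = (count_fastq_reads(r1), count_fastq_reads(r2))
    return counts
```

Calling `paired_read_counts()` on the directory holding the FastQ files returns one entry per pair, and any pair whose two numbers differ is worth a closer look before it goes into Bismark.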

kubu4 commented 4 years ago

Looking at the error message you shared, the count outputs from Bismark, and my calculations, it looks like there's a problem with the copy of Sealice_F1_S20_L001_R1_001.fastq.gz that you used for input: Bismark found only 20565855 sequences in it, while the original file has 113474961. Maybe it didn't get fully transferred when you pulled it off of Owl?
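
If an incomplete transfer is the suspicion, one quick test is whether the gzip stream can be read all the way to its end-of-stream marker. The sketch below assumes the file is a regular gzip archive and Python 3.8 or newer; `gzip_is_complete` is a hypothetical helper name. It only detects truncation or corruption of the compressed stream, so it is not a substitute for comparing checksums against the source copy.

```python
import gzip
import zlib


def gzip_is_complete(path, chunk_size=1 << 20):
    """Return True if the whole gzip stream decompresses without error."""
    try:
        with gzip.open(path, "rb") as handle:
            # Read and discard the data; we only care whether reading succeeds.
            while handle.read(chunk_size):
                pass
    except (EOFError, zlib.error, gzip.BadGzipFile):
        # EOFError is what Python raises when the file ends before the
        # end-of-stream marker, which is typical of a cut-off transfer.
        return False
    return True
```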

sr320 commented 4 years ago

You are correct! I just ran md5...

Should have checked that first!
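
As a note for next time, here is a minimal sketch of the checksum comparison, assuming the expected MD5 value is available as a hex string (for example from a checksum file kept alongside the data). The names `md5sum` and `matches_md5` are illustrative only.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for multi-gigabyte files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path, expected_hex):
    """Return True if the file's MD5 equals the expected hex digest."""
    return md5sum(path) == expected_hex.strip().lower()
```

Running this on the local copy right after transfer, before any trimming or alignment, would have flagged the bad R1 file immediately.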