UCSC-Treehouse / pipelines

Makefiles to run dockerized pipelines used in Treehouse on a single sample
Apache License 2.0

Support BAMs as input to treeshop #8

Closed rcurrie closed 6 years ago

rcurrie commented 6 years ago

@hbeale I've added basic bam to fastq conversion:

                    docker run --rm \
                      -v /mnt/samples:/samples \
                      quay.io/ucsc_cgl/samtools:1.5--98b58ba05641ee98fa98414ed28b53ac3048bc09 \
                      fastq -1 /samples/{0}.R1.fq.gz -2 /samples/{0}.R2.fq.gz /samples/{1}

(Same method as used in cgl-rnaseq)

Treeshop will make the conversion, copy the resulting fastqs back to derived under archive for posterity, and then proceed with rnaseq etc.
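A minimal sketch of how the `{0}` and `{1}` placeholders in the command above might be filled in from Python. The function name `bam_to_fastq_command` and the `samples_dir` parameter are hypothetical and are not taken from Treeshop itself; only the image tag and the samtools arguments come from the command quoted above.

```python
import shlex

# Image tag copied from the command quoted in this thread.
SAMTOOLS_IMAGE = (
    "quay.io/ucsc_cgl/samtools:"
    "1.5--98b58ba05641ee98fa98414ed28b53ac3048bc09"
)


def bam_to_fastq_command(sample_id, bam_name, samples_dir="/mnt/samples"):
    """Build the docker argv that converts a BAM into R1/R2 fastq files.

    sample_id fills the {0} placeholder and bam_name fills {1}.
    Both this function and samples_dir are illustrative assumptions.
    """
    return [
        "docker", "run", "--rm",
        "-v", "{}:/samples".format(samples_dir),
        SAMTOOLS_IMAGE,
        "fastq",
        "-1", "/samples/{0}.R1.fq.gz".format(sample_id),
        "-2", "/samples/{0}.R2.fq.gz".format(sample_id),
        "/samples/{0}".format(bam_name),
    ]


if __name__ == "__main__":
    # Print a copy-pasteable command line; nothing is executed here.
    argv = bam_to_fastq_command("TEST", "TEST.bam")
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Building the command as a list rather than one string keeps sample names with unusual characters from being reinterpreted by a shell when the list is later passed to something like `subprocess.run`.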

hbeale commented 6 years ago

Cool. Has it worked?

On Tue, Jan 9, 2018 at 1:33 PM, Rob Currie notifications@github.com wrote:


rcurrie commented 6 years ago

Hmmm... the output isn't matching, but maybe it's my fastq -> bam:

docker run -it --rm -v $(pwd)/samples:/data broadinstitute/picard FastqToSam F1=/data/TEST_R1.fastq.gz F2=/data/TEST_R2.fastq.gz O=/data/TEST.bam SM=TEST001 RG=rg0000

Converting this bam back to fastq via samtools, then running it through rnaseq and umend, gives a readDist.txt that differs.

@hbeale is it reasonable that these should be identical:

fastqs -> rnaseq sorted.bam output -> umend

fastqs -> picard bam -> samtools to fastq -> rnaseq sorted.bam -> umend

?

< Total Reads                   3416
< Total Tags                    4133
< Total Assigned Tags           3922
---
> Total Reads                   1626
> Total Tags                    2050
> Total Assigned Tags           1978
6,15c6,15
< CDS_Exons           37671772            2792                0.07
< 5'UTR_Exons         18392664            219                 0.01
< 3'UTR_Exons         46333687            734                 0.02
< Introns             1419121300          155                 0.00
< TSS_up_1kb          26926674            2                   0.00
< TSS_up_5kb          121398195           9                   0.00
< TSS_up_10kb         221886368           18                  0.00
< TES_down_1kb        28738628            0                   0.00
< TES_down_5kb        125348902           2                   0.00
< TES_down_10kb       224262488           4                   0.00
---
> CDS_Exons           37671772            1230                0.03
> 5'UTR_Exons         18392664            58                  0.00
> 3'UTR_Exons         46333687            486                 0.01
> Introns             1419121300          167                 0.00
> TSS_up_1kb          26926674            3                   0.00
> TSS_up_5kb          121398195           3                   0.00
> TSS_up_10kb         221886368           3                   0.00
> TES_down_1kb        28738628            8                   0.00
> TES_down_5kb        125348902           30                  0.00
> TES_down_10kb       224262488           34                  0.00
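To compare two readDist.txt files without reading a raw diff, the counts can be parsed into dictionaries first. This is only a sketch: it assumes the layout visible in the diff above (summary lines starting with "Total", and feature rows with four columns where the third is the tag count), and the function names are hypothetical.

```python
def parse_read_distribution(text):
    """Parse readDist-style text into a {label: count} dictionary."""
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Total "):
            # Summary line, e.g. "Total Reads    3416".
            label, value = line.rsplit(None, 1)
            if value.isdigit():
                counts[label] = int(value)
            continue
        fields = line.split()
        # Feature row (assumed): name, total bases, tag count, tags per kb.
        if len(fields) == 4 and fields[1].isdigit() and fields[2].isdigit():
            counts[fields[0]] = int(fields[2])
    return counts


def differing_counts(a, b):
    """Return {label: (count_a, count_b)} for every label that differs."""
    labels = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in labels if a.get(k) != b.get(k)}


if __name__ == "__main__":
    # Two rows taken from each side of the diff in this thread.
    original = "Total Reads    3416\nCDS_Exons  37671772  2792  0.07\n"
    roundtrip = "Total Reads    1626\nCDS_Exons  37671772  1230  0.03\n"
    diffs = differing_counts(
        parse_read_distribution(original),
        parse_read_distribution(roundtrip),
    )
    for label, (left, right) in diffs.items():
        print(label, left, right)
```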

rcurrie commented 6 years ago

converted bam in the develop branch:

https://github.com/UCSC-Treehouse/pipelines/tree/develop/samples

hbeale commented 6 years ago

When you say "samtools to fastq -> umend", does "fastq -> umend" represent the rna-seq pipeline (using STAR) followed by the bam-umend-qc process?

Yes, I'd expect them to be identical, but I don't know where to go if they're not. I'd approach it by comparing the outputs of these two approaches:

bam -> btfv9 -> groomed fq -> umend

bam -> samtools to fastq -> un-groomed fq -> umend

On Tue, Jan 9, 2018 at 3:00 PM, Rob Currie notifications@github.com wrote:


rcurrie commented 6 years ago

Verified via notebook that the round trip fastq -> bam -> fastq matches at the read level. Also verified that bam -> fastq generates exactly the same secondary output as the original fastqs. Samtools in the cgl docker was used for the actual conversion. Will run a CHOC sample to finalize this improvement.
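A read-level round-trip check of this kind could look roughly like the sketch below. It is not the notebook mentioned above. It assumes plain 4-line FASTQ records, treats read order as irrelevant, and strips a trailing `/1` or `/2` and any header comment before comparing names. The function names are hypothetical.

```python
import gzip
from collections import Counter


def read_fastq(path):
    """Yield (name, sequence, quality) for each 4-line FASTQ record."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header:
                break  # end of file
            if not header.strip():
                continue  # skip stray blank lines
            seq = handle.readline().rstrip("\n")
            handle.readline()  # '+' separator line
            qual = handle.readline().rstrip("\n")
            # Keep only the read ID: drop '@', any comment, and /1 or /2.
            name = header[1:].split()[0]
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name, seq, qual


def same_reads(path_a, path_b):
    """True if both files hold the same multiset of reads, in any order."""
    return Counter(read_fastq(path_a)) == Counter(read_fastq(path_b))
```

Called as `same_reads("TEST_R1.fastq.gz", "roundtrip.R1.fq.gz")` (the second file name is illustrative), this reports whether names, bases, and qualities all survived the conversion. It holds every read in memory, so it suits small test samples rather than full-size runs.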