UCSC-Treehouse / pipelines

Makefiles to run dockerized pipelines used in Treehouse on a single sample
Apache License 2.0

Add experimental support for S3/CEPH and test on Jacob's files #14

Closed rcurrie closed 5 years ago

rcurrie commented 6 years ago

@jpfeil can you point to output from one of the samples that you have run by hand that I can use to verify that treeshop S3/CEPH is working?

jpfeil commented 6 years ago

@rcurrie Here is the tertiary output for a set of blood samples. You can use the rsem_genes.results file to validate the new run. I have used the cell line names here, but I can send you a mapping to the ids.

/scratch/jpfeil/ccle/ccle-blood-cancers/CCLE-blood-CDK4-2017-10-20

rcurrie commented 6 years ago

@jpfeil my code just uses the file names, so I need the mapping from those to what's in CEPH. For example, here's a pair mine picks up:

('G15512.HCC1954.5.btfv9.R1.fastq.gz', 'G15512.HCC1954.5.btfv9.R2.fastq.gz')

Can you map CCLE-blood-CDK4-2017-10-20 to whatever the files in CEPH will be?
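For readers following along, pairing by file name could look roughly like the sketch below. It is not the treeshop code: the `pair_fastqs` helper and the `<prefix>.R1.fastq.gz` / `<prefix>.R2.fastq.gz` convention are assumptions inferred from the example pair above.

```python
import re
from collections import defaultdict

# Assumed naming convention, inferred from the example pair in this thread:
# <prefix>.R1.fastq.gz and <prefix>.R2.fastq.gz
READ_PATTERN = re.compile(r"^(?P<prefix>.+)\.R(?P<read>[12])\.fastq\.gz$")


def pair_fastqs(names):
    """Return sorted (R1, R2) tuples for every prefix that has both reads."""
    reads = defaultdict(dict)
    for name in names:
        match = READ_PATTERN.match(name)
        if match:
            reads[match.group("prefix")][match.group("read")] = name
    return [
        (found["1"], found["2"])
        for _prefix, found in sorted(reads.items())
        if "1" in found and "2" in found
    ]


names = [
    "G15512.HCC1954.5.btfv9.R2.fastq.gz",
    "G15512.HCC1954.5.btfv9.R1.fastq.gz",
    "G26223.697.2.btfv9.R1.fastq.gz",  # no R2 in this list, so it is skipped
]
pairs = pair_fastqs(names)
print(pairs)
```

Grouping on the shared prefix means the order in which the storage listing returns the files does not matter, and unpaired reads are dropped rather than mismatched.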

jpfeil commented 6 years ago

G26223.697.2.btfv9.R1.fastq.gz
G27288.ALL-SIL.1.btfv9.R1.fastq.gz
G27213.AMO-1.1.btfv9.R1.fastq.gz
G41707.BDCM.5.btfv9.R1.fastq.gz
G27374.BL-41.1.btfv9.R1.fastq.gz
G27318.DEL.1.btfv9.R1.fastq.gz
G41717.EB1.5.btfv9.R1.fastq.gz
G27324.EM-2.1.btfv9.R1.fastq.gz
G26244.F-36P.2.btfv9.R1.fastq.gz
G27335.GRANTA-519.1.btfv9.R1.fastq.gz
G28867.HH.3.btfv9.R1.fastq.gz
G28080.JURKAT.1.btfv9.R1.fastq.gz
G28068.JVM-3.1.btfv9.R1.fastq.gz
G30567.KARPAS-620.1.btfv9.R1.fastq.gz
G41723.KE-97.5.btfv9.R1.fastq.gz
G28090.KHM-1B.1.btfv9.R1.fastq.gz
G20462.KMS-11.2.btfv9.R1.fastq.gz
G26249.KMS-26.2.btfv9.R1.fastq.gz
G26253.KMS-34.2.btfv9.R1.fastq.gz
G26221.L-363.2.btfv9.R1.fastq.gz
G26193.LP-1.2.btfv9.R1.fastq.gz
G28581.Mino.1.btfv9.R1.fastq.gz
G28560.MOLP-8.1.btfv9.R1.fastq.gz
G28565.MOLT-16.1.btfv9.R1.fastq.gz
G28577.NCO2.1.btfv9.R1.fastq.gz
G26238.OCI-AML2.2.btfv9.R1.fastq.gz
G28600.P12-ICHIKAWA.1.btfv9.R1.fastq.gz
G27474.RPMI-8402.2.btfv9.R1.fastq.gz
G27458.SIG-M5.2.btfv9.R1.fastq.gz
G27201.SK-MM-2.1.btfv9.R1.fastq.gz
G30609.SU-DHL-4.1.btfv9.R1.fastq.gz
G30556.SU-DHL-8.1.btfv9.R1.fastq.gz
G30555.Toledo.1.btfv9.R1.fastq.gz
G41727.U-937.5.btfv9.R1.fastq.gz
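A cell-line-to-ID lookup can be derived from file names like these. The sketch below assumes the first dot-separated field is the sample ID and the second is the cell line name; that reading is inferred from the names in this thread and is not confirmed anywhere in it. `parse_name` and `cell_line_to_id` are hypothetical helpers.

```python
def parse_name(name):
    """Split a FASTQ file name into (sample_id, cell_line).

    Assumed layout: <id>.<cell line>.<number>.btfv9.R<read>.fastq.gz
    """
    sample_id, cell_line = name.split(".")[:2]
    return sample_id, cell_line


def cell_line_to_id(names):
    """Build a {cell_line: sample_id} lookup from a list of file names."""
    return {cell_line: sample_id for sample_id, cell_line in map(parse_name, names)}


names = [
    "G26223.697.2.btfv9.R1.fastq.gz",
    "G27288.ALL-SIL.1.btfv9.R1.fastq.gz",
    "G28080.JURKAT.1.btfv9.R1.fastq.gz",
]
mapping = cell_line_to_id(names)
print(mapping["JURKAT"])
```

This only works while cell line names contain no dots, which holds for every name listed in this thread.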

jpfeil commented 6 years ago

@rcurrie, if you run a subset of the above files, I can check the concordance for the run I did. I do not have the secondary output of these samples anymore. I just have the tertiary. I still have the secondary output of the following samples:

G26182.KMS-12-BM.2.btfv9.R1.fastq.gz
G26187.MONO-MAC-1.2.btfv9.R1.fastq.gz
G26243.HT-1197.2.btfv9.R1.fastq.gz
G27263.OPM-2.1.btfv9.R1.fastq.gz
G27519.Raji.2.btfv9.R1.fastq.gz
G28013.KARPAS-422.1.btfv9.R1.fastq.gz
G28028.KG-1.1.btfv9.R1.fastq.gz
G28058.MC116.1.btfv9.R1.fastq.gz
G28825.HT-144.3.btfv9.R1.fastq.gz
G28835.HT.3.btfv9.R1.fastq.gz
G28863.HT-1080.3.btfv9.R1.fastq.gz
G28881.HT-1376.3.btfv9.R1.fastq.gz
G30561.SUP-T1.1.btfv9.R1.fastq.gz
G30573.SU-DHL-6.1.btfv9.R1.fastq.gz
G30579.JM1.1.btfv9.R1.fastq.gz
G41670.HT-29.5.btfv9.R1.fastq.gz
G41728.MEG-01.5.btfv9.R1.fastq.gz
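One way to check concordance between two rsem_genes.results files is a Pearson correlation of log2(TPM + 1) over the genes both files share. The sketch below is an assumption about how such a check could be done, not the method used in this thread. It relies on the standard RSEM `gene_id` and `TPM` column names, and the inline rows are made up for illustration (real files carry more columns, which the reader ignores).

```python
import csv
import io
import math


def read_tpm(handle):
    """Map gene_id -> log2(TPM + 1) from a tab-separated RSEM genes table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["gene_id"]: math.log2(float(row["TPM"]) + 1.0) for row in reader}


def pearson(xs, ys):
    """Pearson correlation; raises ZeroDivisionError if either side is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def concordance(handle_a, handle_b):
    """Correlate log2(TPM + 1) over the genes present in both tables."""
    a, b = read_tpm(handle_a), read_tpm(handle_b)
    shared = sorted(a.keys() & b.keys())
    return pearson([a[g] for g in shared], [b[g] for g in shared])


# Made-up rows for illustration only; not data from any run in this thread.
by_hand = "gene_id\tTPM\nA\t0.0\nB\t1.0\nC\t3.0\n"
new_run = "gene_id\tTPM\nA\t0.0\nB\t1.0\nC\t7.0\nD\t2.0\n"
r = concordance(io.StringIO(by_hand), io.StringIO(new_run))
print(round(r, 3))
```

With real output, the `io.StringIO` objects would be replaced by `open(path, newline="")` on the two result files.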

rcurrie commented 6 years ago

Running G26182 through, will report back when done. Results will appear in:

/pod/pstore/groups/treehouse/treeshop/jpfeil

@jpfeil do you need the BAMs (from RNASeq or UMEND)? I'm currently deleting them so I don't backhaul. Easy to leave though.

jpfeil commented 6 years ago

@rcurrie, I'd like to keep the BAMs from RNASeq at least, since there might be an issue with the mapping.

rcurrie commented 6 years ago

G26182 completed, take a look:

/pod/pstore/groups/treehouse/treeshop/jpfeil/downstream/G26182/secondary

Passed QC!

@jpfeil I've added basic methods (not in this run), mostly so we can get an idea of how long they take for projections. Let me know if this looks OK. If so, I'm going to crank up the next two in parallel to check for any race condition (can't imagine how it would happen, though), and then we'll have a projection for cranking up a ton and running through them all.

rcurrie commented 6 years ago

@jpfeil G26243 is out of the oven and it passed UMEND, so this seems to be working. Another is running in parallel and I'll report back when that is done.