Open jpfeil opened 5 years ago
Thanks, Jacob! Will you please also add the following genes to the gene list?
aggregatedCancerGenes_2018-01-04_12.20.15PM.txt
Details on the gene list are here: https://github.com/UCSC-Treehouse/analysis-methods/blob/master/gene_lists/gene_list_readme.md
@e-t-k I want to run the updated pipeline on a few samples. Can you please copy these fastq files to /scratch/jpfeil/fusion? Thanks!
TH01_0122_S01 TH01_0129_S01 TH01_0132_S01
@jpfeil per discussion, TH34_1455_S01 fastqs have been copied to:
/scratch/ekephart/fusion/TH34_1455_S01
on razzmatazz.prism
I did not have permissions to write directly to your /scratch/jpfeil/fusion dir, so you may mv my copy to your dir instead.
The latest version of the pipeline finds the TH34_1455_S01 EWSR1--PATZ1 fusion.
@e-t-k The digest for the latest version is sha256:9e5ce87104287205f3ece4773296b219c71974d48f1e0b92de2fc629168479a2
@jpfeil could you double-check that the SHA is correct? I don't see that https://hub.docker.com/r/ucsctreehouse/fusion has been updated recently; and I'm unable to pull the image by that sha:
$ docker run --rm ucsctreehouse/fusion@sha256:9e5ce87104287205f3ece4773296b219c71974d48f1e0b92de2fc629168479a2
Unable to find image 'ucsctreehouse/fusion@sha256:9e5ce87104287205f3ece4773296b219c71974d48f1e0b92de2fc629168479a2' locally
docker: Error response from daemon: manifest for ucsctreehouse/fusion@sha256:9e5ce87104287205f3ece4773296b219c71974d48f1e0b92de2fc629168479a2 not found.
See 'docker run --help'.
Sorry, @e-t-k I pushed to the wrong docker hub. Try this one:
docker run --rm ucsctreehouse/fusion@sha256:633adf491aac8c216df2855e47a2ffd55c9af6c5f646ae0944a4273f33caffe0
@jpfeil Thanks for the new SHA.
I've just done a test run on the pipelines' test FASTQs and it has errored out. The key line seems to be ERROR: didn't find at least 1000 BAM records properly ordered along a single scaffold. at /opt/trinityrnaseq-Trinity-v2.4.0/util/support_scripts/ensure_coord_sorted_sam.pl
and full log: fusion-log-error.txt
Note that previously, these test files didn't have any fusions that passed the gene list, so FusionInspector was skipped entirely.
Is this something I can resolve in the fab wrapper, or do you need to add a check to the script? And let me know if you need access to any of the intermediate files.
Thanks, @e-t-k. Are you using the --run_fusion_inspector flag? Try removing it.
@jpfeil Yes, I am using --run_fusion_inspector. But if I remove it from the pipelines Makefile, then we won't get FusionInspector results at all for any sample; are those actually not important for you?
Some more info:
The test sample has 1 fusion in star-fusion.fusion_candidates.final.in_genelist.abridged: BRD4--RFX1
Docker run command:
docker run --rm \
-v $(shell pwd)/outputs:/data/outputs \
-v $(shell pwd)/samples:/data/samples \
-v $(shell pwd)/references:/data/references \
ucsctreehouse/fusion@sha256:633adf491aac8c216df2855e47a2ffd55c9af6c5f646ae0944a4273f33caffe0 \
--left_fq $(R1) \
--right_fq $(R2) \
--output_dir outputs/fusions \
--CPU `nproc` \
--genome_lib_dir references/STARFusion-GRCh38gencode23 \
--run_fusion_inspector
@jpfeil After some thought, here's my proposal. What do you think of:
--clean arg so that we only get whatever whitelisted output has been created and all the intermediate output is removed. The downside of this is that if star or FusionInspector fail for a "legitimate" reason, it will be less obvious and will take more effort to debug; but I haven't seen much evidence of that happening in all the samples we've run previously.
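For reference, the whitelist-style cleanup described here could look roughly like the sketch below. It is not the pipeline's actual --clean implementation, which is not shown in this thread; the function name clean_outputs and the idea of matching on bare file names are assumptions made for illustration.

```python
from pathlib import Path


def clean_outputs(output_dir, keep):
    """Delete every file under output_dir whose name is not in `keep`.

    `keep` is a set of whitelisted file names (hypothetical here).
    Directories left empty by the cleanup are removed as well.
    Returns the removed files as sorted paths relative to output_dir.
    """
    root = Path(output_dir)
    removed = []
    # Reverse-sorted order visits children before their parent directory,
    # so a directory is only checked after its contents were handled.
    for path in sorted(root.rglob("*"), reverse=True):
        if path.is_file():
            if path.name not in keep:
                path.unlink()
                removed.append(path.relative_to(root).as_posix())
        elif path.is_dir() and not any(path.iterdir()):
            path.rmdir()
    return sorted(removed)
```

Calling clean_outputs("outputs/fusions", {"star-fusion.fusion_candidates.final.in_genelist.abridged"}) would keep only that file. As noted above, the trade-off is that intermediate files useful for debugging a real failure are gone afterwards.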
@e-t-k I think it's cleaner if the fusion pipeline fails gracefully instead of blowing up. I'll modify the code to write the error message to a log file.
@jpfeil Perfect; that way it will continue to be obvious if it does blow up. So just let me know whenever you have the new SHA and I'll take it from there :-)
@e-t-k I'm not able to reproduce the error, but I added code to save an error log instead of raising the error. Let me know if this version causes the same problem:
sha256:827aa24b9e3711d56544c9df11dc990c4cf9cd7fca7bd84cca481c0463ea7434
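The general pattern of saving an error log instead of raising can be sketched as follows. This is only an illustration of the idea, not the code in the image above; the function name run_step and the log format are assumptions.

```python
import subprocess


def run_step(cmd, error_log):
    """Run one pipeline step without raising when it fails.

    `cmd` is an argument list for subprocess.run and `error_log` is the
    path of a log file; both are placeholders for illustration.
    Returns True on success. On a non-zero exit status, the command line,
    the exit status and stderr are appended to error_log and False is
    returned so the caller can decide how to continue.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return True
    with open(error_log, "a") as handle:
        handle.write(f"Command failed (exit {result.returncode}): {' '.join(cmd)}\n")
        handle.write(result.stderr)
        handle.write("\n")
    return False
```

A wrapper can then check the return value and the presence of the log file, so a failing step stays visible without aborting the whole run.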
The pipeline filters out fusions where one of the genes is not in the gene list. This may miss some fusions of interest.
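A minimal sketch of that filtering rule, assuming fusion names use the LEFT--RIGHT form seen in this thread (e.g. EWSR1--PATZ1). The function names and the require_both switch are hypothetical and only illustrate the trade-off; they are not taken from the pipeline code.

```python
def split_fusion(name):
    """Split a fusion name such as 'EWSR1--PATZ1' into its two partner genes."""
    left, right = name.split("--")
    return left, right


def filter_fusions(fusions, gene_list, require_both=True):
    """Keep fusions whose partner genes appear in gene_list.

    require_both=True follows the rule described above: a fusion is
    dropped as soon as one partner is missing from the list.
    require_both=False is a hypothetical looser mode that keeps a fusion
    when at least one partner is listed.
    """
    genes = set(gene_list)
    kept = []
    for name in fusions:
        left, right = split_fusion(name)
        hits = (left in genes, right in genes)
        matched = all(hits) if require_both else any(hits)
        if matched:
            kept.append(name)
    return kept
```

With a made-up gene list of EWSR1, PATZ1 and GENEA, the strict mode keeps EWSR1--PATZ1 but drops a fusion named GENEA--GENEB, while the looser mode keeps both. That dropped case is the kind of fusion of interest the current rule may miss.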