cytcc123 commented 5 years ago

Hi,

I have tried running Clinker with FusionCatcher output. However, FusionCatcher only generates one .fq file per fusion gene, and Clinker reports the following:

Note: a pattern '%_*.fastq.gz' was provided, but did not match any of the files provided as input [/proj/snic2019-8-2/miniconda3/Clinker-master/out/genome/Genome].

Could you give me some suggestions to deal with that problem?

Thanks a lot, Yi

breons commented 5 years ago

Hi Yi,

You should input your original paired-end fastq files into Clinker (the ones that you would have used for FusionCatcher). FusionCatcher will have given you some output that tells you the locations of all the fusion breakpoints in your samples; this is what needs to be input into Clinker. You can use the "col" parameter to identify the columns in the FusionCatcher output with the chromosome:breakpoint information.

This should work. If not, just post an example of your FusionCatcher output (a single line with the breakpoints) and I will update Clinker to accept it and I will give you a snippet that you can use moving forward.

For the pattern above to work with your fastq files, you need to name them in a format like the one below. If you would rather not, open up clinker.pipe and modify the pattern (ask me how if you're not certain).

reads_1.fastq.gz reads_2.fastq.gz

Let me know how you go! Breon.

cytcc123 commented 5 years ago

Hi Breon,

Thank you for your help, but I still couldn't get it to work.

Here are the fusion breakpoints: 20:46307420:- | 8:144669022:-
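For inspecting a breakpoint pair like the one above, a minimal shell sketch follows. It assumes each side is formatted chr:position:strand, as in this line; the function name split_breakpoints is made up for illustration, and whether Clinker wants the strand dropped is not confirmed in this thread.

```shell
# Split a FusionCatcher-style breakpoint pair into chr:position fields.
# Assumption: each side looks like chr:position:strand (e.g. 20:46307420:-).
split_breakpoints() {
    # $1: string such as "20:46307420:- | 8:144669022:-"
    printf '%s\n' "$1" | awk -F'|' '{
        for (i = 1; i <= NF; i++) {
            gsub(/[ \t]/, "", $i)        # drop padding around the separator
            n = split($i, part, ":")     # part[1]=chr, part[2]=position, part[3]=strand
            if (n >= 2) print part[1] ":" part[2]
        }
    }'
}
```

Calling split_breakpoints "20:46307420:- | 8:144669022:-" prints one chr:position per line, which makes it easy to compare against the columns you point "col" at.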

FusionCatcher generated one paired-end fastq file, and I used seqtk seq -1 and seqtk seq -2 to split it into R1.fastq.gz and R2.fastq.gz. But Clinker did not match these two files. The files I have are: R1.fastq.gz, R2.fastq.gz, reads.fq.gz

Thank you so much for your help, Yi

breons commented 5 years ago

Hi Yi,

The input pattern is currently %_*.fastq.gz. Try changing your naming to reads_R1.fastq.gz and reads_R2.fastq.gz.

In an upcoming version I am adding a parameter to allow you to use any input format. But you can always change this in workflow/clinker.pipe if you want to tweak it yourself.

That should get you through the input stage. Let me know how you go!
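If you would rather not rename the originals, a small sketch like the one below links them under names the %_*.fastq.gz pattern should match. The function name link_for_clinker and the idea of using symlinks are assumptions for illustration, not part of Clinker; copy the files instead if symlinks are a problem on your filesystem.

```shell
# Link paired-end FASTQ files under names matching the pattern %_*.fastq.gz.
# Assumption: the part before the underscore ("reads" in this thread) is the
# sample name, and any prefix works as long as both mates share it.
link_for_clinker() {
    # $1: R1 file, $2: R2 file, $3: sample prefix, $4: target directory
    mkdir -p "$4" || return 1
    # Resolve absolute paths so the links stay valid from any working directory.
    r1="$(cd "$(dirname "$1")" && pwd)/$(basename "$1")"
    r2="$(cd "$(dirname "$2")" && pwd)/$(basename "$2")"
    ln -sf "$r1" "$4/${3}_R1.fastq.gz" &&
    ln -sf "$r2" "$4/${3}_R2.fastq.gz"
}
```

For example, link_for_clinker R1.fastq.gz R2.fastq.gz reads input would create input/reads_R1.fastq.gz and input/reads_R2.fastq.gz.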

Breon.

cytcc123 commented 5 years ago

Hi Breon,

It works perfectly after I changed the names.

However, when it gets to the plotting stage, it shows this:

Stage index_bams (reads) =====================================
Indexing Alignment (/crex/proj/snic2019-8-2/miniconda3/Clinker-master/out/alignment/reads/Aligned.sortedByCoord.out.bam)
Index Complete

==================================== Stage prepare_plot (reads) ====================================

SULF2:EEF1D

filtering BAM file for fusion of interest
filtering BAM file for reads with overhangs < 2 (noise reduction)
Creating ancillilary files
Index BAM files
[main_samview] region "SULF2:EEF1D" could not be parsed. Continue anyway.

==================================== Stage plot_fusion (reads) =====================================
[1] "Plotting: SULF2:EEF1D"
[1] "------------------------------------------------------"
[1] "Libraries and ancillary files loaded. Creating Tracks."
[1] "There are no fusion junctions found in this gene pair."
[1] "Terminating plotting for this fusion."
NULL
Error in read.table(locations$junctions) : no lines available in input
ERROR: Expected output file /proj/snic2019-8-2/miniconda3/Clinker-master/out/plots/reads/SULF2_EEF1D.pdf (storage type bpipe.storage.LocalFileSystemStorageLayer@68ad7fa1) in stage plot_fusion (reads) could not be found

========================================= Pipeline Failed ==========================================

One or more parallel stages aborted. The following messages were reported:

--------------------------------------- Unknown ( reads ) ----------------------------------------

Expected output file /proj/snic2019-8-2/miniconda3/Clinker-master/out/plots/reads/SULF2_EEF1D.pdf (storage type bpipe.storage.LocalFileSystemStorageLayer@68ad7fa1) in stage plot_fusion (reads) could not be found

Use 'bpipe errors' to see output from failed commands.

Could you tell me where the problem is?

Thank you so much for your help, Yi

breons commented 5 years ago

Hi Yi,

So! A couple steps to debug this. Generally this error means that Clinker could not find this fusion in the superTranscriptome.

1. Would you mind posting the bpipe snippet that you are running Clinker with?
2. In the Clinker output directory, could you please grep SULF2:EEF1D resource/fst_reference.fasta to make sure that fusion has been generated.
3. Lastly, can you give me the chr:breakpoints that FusionCatcher has used to identify those genes.

We will get there! Breon.

cytcc123 commented 5 years ago

Hi Breon,

Here are the bpipe snippets:

| Starting Pipeline at 2019-10-29 15:03 |

======================================== Stage generate_fst ========================================

==============================================================

    Fusion Super Transcript Generator

    A fusion visualiser.

==============================================================

Create fusion superTranscriptome:

Gene Symbols
Mapped: 1
Not Mapped: 0
Total: 1

==============================================================

Creating output directory at: /proj/snic2019-8-2/miniconda3/Clinker-master/out
Creating fused superTranscriptome and annotation files

...Success!

Use the plot_fst bpipe workflow or IGV to visualise your results.

==============================================================

====================================== Stage star_genome_gen =======================================
Oct 29 15:03:37 ..... started STAR run
Oct 29 15:03:37 ... starting to generate Genome files
Oct 29 15:04:42 ... starting to sort Suffix Array. This may take a long time...
Oct 29 15:05:31 ... sorting Suffix Array chunks and saving them to disk...
Oct 29 15:09:10 ... loading chunks from disk, packing SA...
Oct 29 15:09:44 ... finished generating suffix array
Oct 29 15:09:44 ... generating Suffix Array index
Oct 29 15:09:44 ... completed Suffix Array index
Oct 29 15:09:44 ... writing Genome to disk ...
Oct 29 15:10:00 ... writing Suffix Array to disk ...
Oct 29 15:10:02 ... writing SAindex to disk
Oct 29 15:10:02 ..... finished successfully

===================================== Stage star_align (reads) =====================================
Oct 29 15:10:06 ..... started STAR run
Oct 29 15:10:06 ..... loading genome
Oct 29 15:10:24 ..... started mapping
Oct 29 15:10:24 ..... finished mapping
Oct 29 15:10:25 ..... started sorting BAM
Oct 29 15:10:26 ..... started wiggle output
Oct 29 15:10:26 ..... finished successfully

===================================== Stage index_bams (reads) =====================================
Indexing Alignment (/crex/proj/snic2019-8-2/miniconda3/Clinker-master/out/alignment/reads/Aligned.sortedByCoord.out.bam)
Index Complete

==================================== Stage prepare_plot (reads) ====================================

SULF2:EEF1D2

filtering BAM file for fusion of interest
filtering BAM file for reads with overhangs < 2 (noise reduction)
Creating ancillilary files
[main_samview] region "SULF2:EEF1D2" could not be parsed. Continue anyway.
Index BAM files

==================================== Stage plot_fusion (reads) =====================================
[1] "Plotting: SULF2:EEF1D2"
[1] "------------------------------------------------------"
[1] "Libraries and ancillary files loaded. Creating Tracks."
[1] "There are no fusion junctions found in this gene pair."
[1] "Terminating plotting for this fusion."
NULL
Error in read.table(locations$junctions) : no lines available in input
ERROR: Expected output file /proj/snic2019-8-2/miniconda3/Clinker-master/out/plots/reads/SULF2_EEF1D2.pdf (storage type bpipe.storage.LocalFileSystemStorageLayer@34bd8d80) in stage plot_fusion (reads) could not be found

========================================= Pipeline Failed ==========================================

One or more parallel stages aborted. The following messages were reported:

--------------------------------------- Unknown ( reads ) ----------------------------------------

Expected output file /proj/snic2019-8-2/miniconda3/Clinker-master/out/plots/reads/SULF2_EEF1D2.pdf (storage type bpipe.storage.LocalFileSystemStorageLayer@34bd8d80) in stage plot_fusion (reads) could not be found

Use 'bpipe errors' to see output from failed commands.

When I ran the grep command, I got this:

[yiche@rackham1 out]$ grep SULF2:EEF1D resource/fst_reference.fasta
grep: resource/fst_reference.fasta: No such file or directory

Thank you so much for your help, Yi

Attachment: fusioncatcher result.xlsx
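In the star_align stage of the log, mapping starts and finishes at the same timestamp (15:10:24), which may mean very few reads were aligned. One way to check is to read the uniquely mapped count from STAR's Log.final.out. The sketch below assumes that file sits next to Aligned.sortedByCoord.out.bam in out/alignment/reads/ (not confirmed in this thread); the function name uniquely_mapped_reads is made up for illustration.

```shell
# Print the uniquely mapped read count from a STAR Log.final.out file.
# Assumption: the file uses STAR's usual "label |<tab>value" layout.
uniquely_mapped_reads() {
    # $1: path to Log.final.out
    awk -F'|' '/Uniquely mapped reads number/ {
        gsub(/[ \t]/, "", $2)    # strip the tab and spaces around the value
        print $2
    }' "$1"
}
```

A count at or near zero would be consistent with the "no fusion junctions found" message from the plotting stage.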

breons commented 5 years ago

Hi Yi,

Ah! Apologies, the command should have been grep SULF2:EEF1D reference/fst_reference.fasta from the clinker output directory. reference not resource.

But, given you are only interested in a single fusion judging from bpipe, could you just simply do head -1 reference/fst_reference.fasta when in the clinker output directory? If this isn't "SULF2:EEF1D" then something has gone wrong in the first stage.
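The check can also be wrapped in a small helper, sketched below. It assumes the FASTA headers contain the fusion name as GENE1:GENE2; the function names are made up for illustration. Note that it does a substring match, so SULF2:EEF1D also matches a header containing SULF2:EEF1D2; list the headers to see exactly which name was generated.

```shell
# List every header line in the fused superTranscriptome reference.
list_fusions() {
    # $1: path to fst_reference.fasta
    grep '^>' "$1"
}

# Succeed if the fusion name appears in any FASTA header (substring match).
# Assumption: headers contain the fusion as GENE1:GENE2.
fusion_in_reference() {
    # $1: fusion name, $2: path to fst_reference.fasta
    grep '^>' "$2" | grep -F -q -- "$1"
}
```

From the Clinker output directory, fusion_in_reference SULF2:EEF1D reference/fst_reference.fasta && echo found reports whether the first stage produced that fusion.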

If your FusionCatcher output was generated from an alignment to hg38, remember to change the genome parameter in the bpipe command, as the default is hg19, i.e. -p genome="38". Before re-running, you will need to either assign a new output directory or delete your current Clinker output so that all stages execute again.
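As an alternative to deleting the old output outright, a sketch like the following moves it aside under a timestamped name so the previous run can still be inspected. The function name archive_output is made up for illustration; the rest of the bpipe command is whatever you ran originally.

```shell
# Move an existing Clinker output directory aside so every stage runs again.
# Prints the backup location; does nothing if the directory does not exist.
archive_output() {
    # $1: existing Clinker output directory
    [ -d "$1" ] || return 0
    backup="${1%/}.$(date +%Y%m%d%H%M%S).bak"
    mv "$1" "$backup" && printf '%s\n' "$backup"
}
# Afterwards, re-run your original bpipe command with -p genome="38" added.
```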

Hoping this helps! Breon.

cytcc123 commented 4 years ago

Hi Breon,

Sorry for my late reply. I got it working after trying different fusion genes; some fusions may just be background.

Thank you again for your help!

Best, Yi

Oshlack / Clinker

Using Clinker with fusioncatcher output #17
