SchulzLab / Aeron

Alignment, quantification and fusion prediction from long RNA reads
MIT License

Example output files? #5

Closed stianlagstad closed 4 years ago

stianlagstad commented 4 years ago

Hi,

I would like to know if the output data that Aeron produces can be used with https://github.com/stianlagstad/chimeraviz. Do you have any example output files that you can share?

Thank you!

SchulzLab commented 4 years ago

Hi, currently the output files of Aeron are not in a format that chimeraviz can use directly. As it appears to be a useful tool, we are looking into adding chimeraviz-compatible files in the future.

Thanks for the suggestion, Marcel

stianlagstad commented 4 years ago

@SchulzLab : Thank you very much for the response. Do you have an example output file nonetheless, so that I can think about how to possibly implement support for it?

maickrau commented 4 years ago

Hi,

Here's an example of the fusion output. The predicted fusion transcripts are in the file "fusiontranscript..._.fa". Information about the fusion is included in the name:

>fusion_1_ENSG00000092010.14_1206bp_ENSG00000100908.13_1052bp_1reads

The format is fusion_{id}_{gene1}_{gene1 size}bp_{gene2}_{gene2 size}bp_{constructing reads}reads. id is a unique identifier per fusion, gene1 and gene2 are the Ensembl IDs of the two genes involved in the fusion, gene1/2 size is the approximate size in bp of the transcript on either side of the fusion breakpoint, and constructing reads is the number of reads used for building the predicted fusion transcript. The fusion sequence itself has an N at the fusion breakpoint location. The file does not include the genomic positions of the two genes; these have to be retrieved from elsewhere using the Ensembl IDs.
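For anyone parsing these names, here is a minimal Python sketch based only on the format described above and the single example header. The names `FusionHeader`, `parse_fusion_header` and `breakpoint_index` are hypothetical helpers, not part of Aeron or chimeraviz, and the regular expression assumes that every header follows the example exactly.

```python
import re
from typing import NamedTuple


class FusionHeader(NamedTuple):
    """Fields encoded in an Aeron fusion transcript name (hypothetical helper type)."""
    fusion_id: str
    gene1: str
    gene1_size: int
    gene2: str
    gene2_size: int
    constructing_reads: int


# Assumption: headers look like the example in this thread,
# fusion_{id}_{gene1}_{size}bp_{gene2}_{size}bp_{n}reads.
# The gene groups are non-greedy so that IDs containing underscores still match.
HEADER_RE = re.compile(
    r"^>?fusion_(?P<id>[^_]+)_(?P<gene1>.+?)_(?P<size1>\d+)bp_"
    r"(?P<gene2>.+?)_(?P<size2>\d+)bp_(?P<reads>\d+)reads$"
)


def parse_fusion_header(header: str) -> FusionHeader:
    """Split a FASTA header line (with or without the leading '>') into its fields."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError(f"not a recognised fusion header: {header!r}")
    return FusionHeader(
        fusion_id=match["id"],
        gene1=match["gene1"],
        gene1_size=int(match["size1"]),
        gene2=match["gene2"],
        gene2_size=int(match["size2"]),
        constructing_reads=int(match["reads"]),
    )


def breakpoint_index(sequence: str) -> int:
    """Return the 0-based index of the first N in the sequence, or -1 if there is none."""
    return sequence.upper().find("N")


if __name__ == "__main__":
    example = ">fusion_1_ENSG00000092010.14_1206bp_ENSG00000100908.13_1052bp_1reads"
    print(parse_fusion_header(example))
```

Since the sizes are described as approximate, locating the N in the sequence is probably a more reliable way to find the breakpoint than using the sizes from the name.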

In addition to this, there is the file "fusionsupport..._.txt", which contains a table of fusion names and the number of reads that support each fusion. A read supports a fusion if its primary alignment spans the fusion breakpoint and 150 bp on both sides of it. This number can be, and usually is, different from the number of reads used for building the predicted fusion transcript.
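A rough sketch of how the support table could be read in Python. The exact layout of the file is not spelled out in this thread, so the code assumes one fusion per line, whitespace-separated, with the fusion name in the first column and the read count in the last; `read_fusion_support` is a hypothetical helper name. Check these assumptions against the actual file before relying on it.

```python
from typing import Dict, Iterable


def read_fusion_support(lines: Iterable[str]) -> Dict[str, int]:
    """Map fusion name -> number of supporting reads.

    Assumed layout (not confirmed in this thread): whitespace-separated
    columns, fusion name first, supporting read count last.
    """
    support: Dict[str, int] = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or malformed line
        try:
            support[fields[0]] = int(fields[-1])
        except ValueError:
            continue  # probably a header row; skip it
    return support
```

It takes any iterable of lines, so an open file handle can be passed directly, for example `with open(path) as handle: support = read_fusion_support(handle)`, where `path` points at the support file.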

Finally, there are BAM files (not included in the example) of the read alignments to the reference transcripts plus the predicted fusion transcripts: one with all alignments and one filtered for alignments that support a fusion.

fusion_example.zip

stianlagstad commented 4 years ago

Thank you very much! Support for Aeron has been added to chimeraviz in this PR: https://github.com/stianlagstad/chimeraviz/pull/81