Xinglab / espresso


Transcripts support different numbers of reads #66

Open liuxiaoning-wq opened 3 months ago

liuxiaoning-wq commented 3 months ago

Hi there. Do I need full-length transcripts to use ESPRESSO? For example, if some reads are not full-length, do they need to be filtered out? Also, why do different positions of the same transcript show different numbers of supporting reads? For example, one of my images shows 20 at the beginning and 178 at the end, and another shows 87.93 at the beginning and 100 at the end.

[Attached images: two screenshots and visualization_sirv]

EricKutschera commented 3 months ago

ESPRESSO expects that some reads will cover all the splice junctions of the transcript they come from, and that other reads will cover only some of those junctions. If a read's sequence of splice junctions could have come from multiple different full-length transcripts, then ESPRESSO can assign a partial count for that read to each matching transcript.

Those different numbers are likely from different transcripts, not different positions of the same transcript. If you zoom in more you may see more details about the transcripts.
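The partial-count idea described in this reply can be illustrated with a small Python sketch. It is not ESPRESSO's code: the function names, the toy junction coordinates, and especially the equal split among compatible transcripts are assumptions made for illustration, and ESPRESSO's actual weighting may differ.

```python
from collections import defaultdict


def is_contiguous_subchain(read_junctions, transcript_junctions):
    """Return True if the read's junctions appear, in order and without
    gaps, somewhere inside the transcript's junction chain."""
    read_junctions = list(read_junctions)
    transcript_junctions = list(transcript_junctions)
    n, m = len(read_junctions), len(transcript_junctions)
    if n == 0 or n > m:
        return False
    return any(
        transcript_junctions[i:i + n] == read_junctions
        for i in range(m - n + 1)
    )


def assign_counts(reads, transcripts):
    """Give each read a total weight of 1, split equally (an assumption)
    among all transcripts whose junction chain contains the read's chain."""
    counts = defaultdict(float)
    for read_junctions in reads:
        compatible = [
            name for name, chain in transcripts.items()
            if is_contiguous_subchain(read_junctions, chain)
        ]
        if not compatible:
            continue  # read matches no known chain; ignored in this sketch
        share = 1.0 / len(compatible)
        for name in compatible:
            counts[name] += share
    return dict(counts)


# Hypothetical transcripts, each a list of (donor, acceptor) junctions.
transcripts = {
    "T1": [(100, 200), (300, 400), (500, 600)],
    "T2": [(100, 200), (300, 450)],
}
reads = [
    [(100, 200), (300, 400), (500, 600)],  # full length, only fits T1
    [(100, 200)],                          # partial, fits T1 and T2
    [(300, 450)],                          # partial, only fits T2
]
counts = assign_counts(reads, transcripts)
print(counts)
```

In this toy case the read covering only the first junction contributes half a count to each transcript, which is one way fractional values such as 87.93 can arise.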

liuxiaoning-wq commented 3 months ago

Thanks for your reply. Is there a correspondence between the N1 and N2 values for a gene_ID in the esp file and the values shown in IGV? If so, why are the values different? For example, for ENSG00000124713.6 the N1 value in the esp file is 1175.3, but N1 in IGV is 1461. Also, there are five transcripts in the esp file, but only two are shown in IGV.

liuxiaoning-wq commented 3 months ago

Also, can the transcript ID be displayed in the IGV visualization?

EricKutschera commented 3 months ago

It looks like N1 and N2 are your sample names and you loaded the N1.bed and N2.bed files output from visualize.py. The image shown in the README doesn't load those sample level bed files. Instead it uses the transcript level bed files output under target_genes/: https://github.com/Xinglab/espresso/tree/v1.4.0?tab=readme-ov-file#igv

liuxiaoning-wq commented 3 months ago

Thank you very much. In that case, target_genes in the visualization results contains only four bed files, one for each sample, for a single ENST transcript. However, the esp file has five transcripts, four of which are novel ESPRESSO transcripts. These four novel ESPRESSO transcripts are not shown in the target_genes files. Why is that?

EricKutschera commented 3 months ago

What was the command you ran? From https://github.com/Xinglab/espresso/tree/v1.4.0?tab=readme-ov-file#visualization-arguments

--target-gene TARGET_GENE the name of the gene to visualize. transcripts with name like {target-gene}-{number} or gene_id like {target-gene}.* will have output generated. Use the gene_id to match novel isoforms output by ESPRESSO

Based on that description it seems like --target-gene GNMT would only create the bed files for ENST00000372808.4 since it has transcript name GNMT-201. If you run with --target-gene ENSG00000124713 then I think it should create output for the novel transcripts
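To make the matching rule in the quoted help text concrete, here is a minimal Python sketch of one reading of it. This is not the code from visualize.py: the function name is hypothetical, and interpreting `{target-gene}.*` as "the target followed by a dot and any suffix, or an exact match" is an assumption. The transcript name "NA" used for a novel isoform is also only a placeholder.

```python
import re


def matches_target_gene(target_gene, transcript_name, gene_id):
    """Return True if a transcript would be selected under one reading of
    the --target-gene help text (an interpretation, not the real code)."""
    # Transcript name like {target-gene}-{number}, e.g. GNMT-201.
    name_pattern = re.compile(re.escape(target_gene) + r"-\d+")
    # gene_id like {target-gene}.*, read here as an optional ".suffix".
    gene_id_pattern = re.compile(re.escape(target_gene) + r"(\..*)?")
    name_ok = name_pattern.fullmatch(transcript_name or "") is not None
    gene_ok = gene_id_pattern.fullmatch(gene_id or "") is not None
    return name_ok or gene_ok


# Annotated transcript: selected by gene name through its transcript name.
print(matches_target_gene("GNMT", "GNMT-201", "ENSG00000124713.6"))
# Novel isoform with no GNMT-style name: not selected by gene name.
print(matches_target_gene("GNMT", "NA", "ENSG00000124713.6"))
# Same novel isoform: selected when the gene_id is used as the target.
print(matches_target_gene("ENSG00000124713", "NA", "ENSG00000124713.6"))
```

Under this reading, only the gene_id form of the target can pick up transcripts that lack a `{target-gene}-{number}` name.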

liuxiaoning-wq commented 3 months ago

Thanks again.

1. Can we just look at the numbers to determine whether a new transcript is present in a sample? If the number is zero, does that mean the transcript does not exist in that sample?
2. Do these numbers represent the expression levels of the transcripts? Can we use them for differential analysis?

EricKutschera commented 3 months ago

Those numbers are from the abundance.esp file and they show the number of reads from that sample which ESPRESSO counted toward each isoform. If a number is zero then ESPRESSO did not detect that transcript in that sample. Yes, you can use them for differential analysis (rMATS-long uses ESPRESSO output for differential analysis: https://github.com/Xinglab/rMATS-long).
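As a rough illustration of reading those per-sample counts, the Python sketch below lists the transcripts of one gene that have a non-zero count in a given sample. It assumes the abundance file is tab-separated with the columns transcript_ID, transcript_name and gene_ID followed by one numeric column per sample. The function name, the IDs and the counts in the example are all made up.

```python
import csv
import io


def detected_transcripts(esp_text, gene_id, sample):
    """Return {transcript_ID: count} for transcripts of gene_id whose
    count in the given sample column is greater than zero."""
    reader = csv.DictReader(io.StringIO(esp_text), delimiter="\t")
    detected = {}
    for row in reader:
        if row["gene_ID"] != gene_id:
            continue
        count = float(row[sample])  # assumes every count is numeric
        if count > 0:
            detected[row["transcript_ID"]] = count
    return detected


# Toy content with invented IDs and counts, not real ESPRESSO output.
EXAMPLE_ESP = (
    "transcript_ID\ttranscript_name\tgene_ID\tN1\tN2\n"
    "ENST_A.1\tGENE-201\tGENE_X.1\t10.5\t0\n"
    "ESPRESSO:chr1:1:0\tNA\tGENE_X.1\t2.5\t3\n"
    "ENST_B.1\tOTHER-201\tGENE_Y.1\t7\t7\n"
)

print(detected_transcripts(EXAMPLE_ESP, "GENE_X.1", "N1"))
print(detected_transcripts(EXAMPLE_ESP, "GENE_X.1", "N2"))
```

For a real file, the text could be read with `open(path).read()` and passed to the same function.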

liuxiaoning-wq commented 3 months ago

Thank you for your reply.

liuxiaoning-wq commented 3 months ago

Hello, the attached screenshot shows three new transcripts of this gene. How can I obtain the sequences of these three new transcripts?

EricKutschera commented 3 months ago

The coordinates for those transcripts should be in the updated.gtf file. See this post for a way to get the sequence from the gtf and fasta: https://github.com/Xinglab/espresso/issues/48
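For orientation, the general idea of turning GTF exon coordinates into a transcript sequence can be sketched in Python as below. This is not necessarily the approach from the linked issue. It assumes standard 9-column GTF exon lines with 1-based inclusive coordinates and a `transcript_id "..."` attribute, and a genome already loaded as a dict of chromosome name to sequence. The function name and the toy data are hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def transcript_sequence(gtf_lines, genome, transcript_id):
    """Concatenate the exon sequences of one transcript in genomic order
    and reverse-complement the result for minus-strand transcripts."""
    exons = []
    strand = "+"
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None or match.group(1) != transcript_id:
            continue
        exons.append((fields[0], int(fields[3]), int(fields[4])))
        strand = fields[6]
    exons.sort(key=lambda exon: exon[1])
    # GTF is 1-based inclusive; Python slices are 0-based half-open.
    seq = "".join(genome[chrom][start - 1:end] for chrom, start, end in exons)
    if strand == "-":
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq


# Toy genome and annotation, invented for this example.
genome = {"chr1": "AAACCCGGGTTTACGT"}
gtf_lines = [
    'chr1\ttoy\texon\t1\t3\t.\t+\t.\tgene_id "G1"; transcript_id "TX_PLUS";',
    'chr1\ttoy\texon\t7\t9\t.\t+\t.\tgene_id "G1"; transcript_id "TX_PLUS";',
    'chr1\ttoy\texon\t13\t16\t.\t-\t.\tgene_id "G2"; transcript_id "TX_MINUS";',
    'chr1\ttoy\texon\t4\t6\t.\t-\t.\tgene_id "G2"; transcript_id "TX_MINUS";',
]
print(transcript_sequence(gtf_lines, genome, "TX_PLUS"))
print(transcript_sequence(gtf_lines, genome, "TX_MINUS"))
```

For real data, a dedicated tool or FASTA parser would be more robust than loading whole chromosomes into a dict.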

liuxiaoning-wq commented 2 months ago

Okay, thank you.

liuxiaoning-wq commented 2 months ago

Hello, can I use ESPRESSO to analyze fusion genes? If so, how do I do it and where can I see the results?

EricKutschera commented 2 months ago

ESPRESSO doesn't specifically look for fusion genes, and it might filter out alignments for fusion genes. There is a filter for alignments with large insertions (defaults to 20bp): https://github.com/Xinglab/espresso/blob/v1.5.0/src/ESPRESSO_S.pl#L924 Also, ESPRESSO will only use 1 alignment per read even if there are supplementary alignments.
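To show what these two checks look at in a SAM record, here is a small Python sketch. It is not a port of the Perl code at the linked line. The function names are hypothetical, and treating an insertion strictly longer than the threshold as "large" is an assumption. The 0x800 supplementary flag bit and the CIGAR `I` operation come from the SAM format.

```python
import re

SUPPLEMENTARY_FLAG = 0x800  # SAM FLAG bit for a supplementary alignment
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def max_insertion_length(cigar):
    """Return the longest insertion (CIGAR 'I' operation) in base pairs,
    or 0 if the alignment has no insertion."""
    lengths = [int(n) for n, op in CIGAR_OP.findall(cigar) if op == "I"]
    return max(lengths, default=0)


def is_supplementary(flag):
    """Return True if the SAM FLAG marks a supplementary alignment."""
    return bool(flag & SUPPLEMENTARY_FLAG)


def keep_alignment(flag, cigar, max_insertion=20):
    """Keep only non-supplementary alignments without a large insertion.
    Whether the real cutoff is inclusive is not known here."""
    if is_supplementary(flag):
        return False
    return max_insertion_length(cigar) <= max_insertion


print(keep_alignment(0, "50M2I48M"))       # small insertion, primary
print(keep_alignment(0, "50M25I30M"))      # 25 bp insertion
print(keep_alignment(2048, "40S60M"))      # supplementary alignment
```

A read spanning a fusion breakpoint is often reported as a primary alignment plus one or more supplementary alignments, so a filter like this keeps at most one side of the fusion.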