Open liuxiaoning-wq opened 3 months ago
ESPRESSO expects that some reads will cover all the splice junctions of the transcript they come from and that other reads will cover only some of them. If a read's sequence of splice junctions could have come from multiple different full-length transcripts, then ESPRESSO can assign a partial count for that read to each matching transcript.
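To make the idea of partial counts concrete, here is a toy sketch. It assumes a read is compatible with a transcript when the read's junctions appear as a consecutive run in the transcript's junction chain, and it splits each read evenly among compatible transcripts. The even split, the function names, and the coordinates are all illustrative assumptions; ESPRESSO's actual weighting may differ.

```python
from collections import defaultdict


def is_contiguous_run(sub, full):
    """True if the junctions in `sub` appear consecutively in `full`."""
    n = len(sub)
    if n == 0:
        return False  # reads with no junctions are ignored in this toy model
    return any(full[i:i + n] == sub for i in range(len(full) - n + 1))


def assign_counts(read_chains, transcripts):
    """Toy equal-split assignment (an assumption, not ESPRESSO's algorithm)."""
    counts = defaultdict(float)
    for chain in read_chains:
        compatible = [
            name for name, tx_chain in transcripts.items()
            if is_contiguous_run(chain, tx_chain)
        ]
        if not compatible:
            continue
        share = 1.0 / len(compatible)  # each compatible transcript gets an equal part
        for name in compatible:
            counts[name] += share
    return dict(counts)


# Hypothetical junction chains, each junction given as (donor, acceptor).
transcripts = {
    "T1": ((100, 200), (300, 400), (500, 600)),
    "T2": ((100, 200), (300, 450)),
}
reads = [
    ((100, 200), (300, 400), (500, 600)),  # full length, matches T1 only
    ((100, 200),),                          # partial, matches T1 and T2
    ((300, 450),),                          # partial, matches T2 only
]
counts = assign_counts(reads, transcripts)
```

In this toy input the second read contributes 0.5 to each transcript, which is how non-integer read counts can arise.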
Those different numbers are likely from different transcripts, not different positions of the same transcript. If you zoom in more, you may see more details about the transcripts.
Thanks for your reply. Is there a correspondence between the N1 and N2 values for a gene_ID in the esp file and the values shown in IGV? If so, why are the values different? For example, for ENSG00000124713.6, N1 is 1175.3 in the esp file but 1461 in IGV. Also, there are five transcripts in the esp file, but only two are shown in IGV. Why?
And can the transcript ID be displayed in the IGV visualization?
It looks like N1 and N2 are your sample names and you loaded the N1.bed and N2.bed files output from visualize.py. The image shown in the README doesn't load those sample-level bed files. Instead, it uses the transcript-level bed files output under target_genes/: https://github.com/Xinglab/espresso/tree/v1.4.0?tab=readme-ov-file#igv
Thank you very much. In this case, the target_genes directory of the visualization results contains only four bed files for each sample, for one ENST transcript. However, there are five transcripts in the esp file, four of which are novel ESPRESSO transcripts. These four novel ESPRESSO transcripts are not shown in target_genes. Why is that?
What was the command you ran? From https://github.com/Xinglab/espresso/tree/v1.4.0?tab=readme-ov-file#visualization-arguments :
--target-gene TARGET_GENE the name of the gene to visualize. transcripts with name like {target-gene}-{number} or gene_id like {target-gene}.* will have output generated. Use the gene_id to match novel isoforms output by ESPRESSO
Based on that description, it seems that --target-gene GNMT would only create the bed files for ENST00000372808.4, since it has the transcript name GNMT-201. If you run with --target-gene ENSG00000124713, then I think it should create output for the novel transcripts.
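The matching rule quoted above can be sketched roughly as follows. This is only an illustration of the documented description, not the code in visualize.py: the function name is hypothetical, the gene_id pattern `{target-gene}.*` is read here as "gene_id starts with the target gene", and `NA` is used as an assumed placeholder name for a novel isoform.

```python
import re


def matches_target_gene(target_gene, transcript_name, gene_id):
    """Hypothetical helper mirroring the documented --target-gene rule."""
    # Transcript name like {target-gene}-{number}, e.g. GNMT-201.
    name_pattern = re.compile(re.escape(target_gene) + r"-\d+")
    name_match = name_pattern.fullmatch(transcript_name) is not None
    # gene_id like {target-gene}.*, read here as a prefix match (assumption).
    id_match = gene_id.startswith(target_gene)
    return name_match or id_match


# Annotated transcript: matched by gene name through its transcript name.
print(matches_target_gene("GNMT", "GNMT-201", "ENSG00000124713.6"))
# Novel isoform (assumed to have no GNMT-style name): missed by the gene name.
print(matches_target_gene("GNMT", "NA", "ENSG00000124713.6"))
# The same novel isoform is matched when the gene_id is used instead.
print(matches_target_gene("ENSG00000124713", "NA", "ENSG00000124713.6"))
```

Under these assumptions, only the gene_id form picks up isoforms that lack a {target-gene}-{number} name.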
Thanks again. 1. Can we just look at the numbers to determine whether there is a new transcript in this sample? If the number is zero, it means that the transcript does not exist, right? 2. Can these numbers represent the expression levels of these transcripts? Can we use these numbers to do differential analysis?
Those numbers are from the abundance.esp file and they show the number of reads from that sample which ESPRESSO counted toward each isoform. If it's zero then ESPRESSO did not detect that transcript in that sample. Yes, you can use them for differential analysis (rMATS-long uses ESPRESSO output for differential analysis: https://github.com/Xinglab/rMATS-long)
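As a rough sketch of checking which isoforms have a non-zero count in a sample, the snippet below parses a small tab-separated table. It assumes the abundance.esp layout is transcript_ID, transcript_name, gene_ID followed by one column per sample; the rows, the novel isoform ID, and the counts are made-up toy values, and `detected_transcripts` is a hypothetical helper.

```python
import csv
import io

# Toy table in the assumed abundance.esp layout (values are invented).
toy_abundance = (
    "transcript_ID\ttranscript_name\tgene_ID\tN1\tN2\n"
    "ENST00000372808.4\tGNMT-201\tENSG00000124713.6\t10.5\t0\n"
    "ESPRESSO:chr6:0:1\tNA\tENSG00000124713.6\t0\t3.0\n"
)


def detected_transcripts(handle, sample):
    """Return transcript IDs whose count in `sample` is greater than zero."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row["transcript_ID"] for row in reader if float(row[sample]) > 0]


n1_detected = detected_transcripts(io.StringIO(toy_abundance), "N1")
n2_detected = detected_transcripts(io.StringIO(toy_abundance), "N2")
print(n1_detected)
print(n2_detected)
```

For a real file, the `io.StringIO` object would be replaced with an open file handle for the abundance.esp output.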
thank you for your reply
The coordinates for those transcripts should be in the updated.gtf file. See this post for a way to get the sequence from the gtf and fasta: https://github.com/Xinglab/espresso/issues/48
okay, thank you
Hello, can I use espresso to analyze fusion genes? If so, how do I do it and where can I see the results?
ESPRESSO doesn't specifically look for fusion genes, and it might filter out alignments for fusion genes. There is a filter for alignments with large insertions (defaults to 20bp): https://github.com/Xinglab/espresso/blob/v1.5.0/src/ESPRESSO_S.pl#L924 Also, ESPRESSO will only use 1 alignment per read, even if there are supplementary alignments.
Hi there, do I need full-length transcripts to use ESPRESSO? For example, if the reads are not full-length, do they need to be filtered out? Why are different positions of the same transcript supported by different numbers of reads? For example, my image has 20 at the beginning and 178 at the end, and another image has 87.93 at the beginning and 100 at the end.