jungminchoilab closed this issue 5 years ago
After discussing privately, it appears that this user's bug originated from using HISAT to align the data. Switching to STAR fixed this.
I have the same issue, and I'm using STAR. ERROR: length(unique(exons$chr)) == 1 is not TRUE
Hello! Can you elaborate on how you prepared the leafviz RData file like the user above did?
A lot of leafviz errors arise from mismatches between the RNA-seq BAM files provided by the user and the GENCODE annotation files provided by us. Can you verify that your BAM files are aligned to the same genome build as the annotation files (e.g. hg38 or hg19) and that the chromosome naming scheme is [chr1, chr2, chr3, ...] rather than [1, 2, 3]?
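The naming-scheme check described here can be done by eye, but a few lines of Python make it repeatable. This is only a sketch, not part of leafcutter or leafviz: the function names are hypothetical, and it assumes intron IDs of the form chr:start:end:cluster and a whitespace-separated exon file whose first column is the chromosome.

```python
def chrom_style(names):
    """Classify chromosome names as 'chr', 'bare', 'mixed' or 'empty'."""
    names = set(names)
    if not names:
        return "empty"
    prefixed = {n for n in names if n.startswith("chr")}
    if prefixed == names:
        return "chr"      # chr1, chr2, ...
    if not prefixed:
        return "bare"     # 1, 2, ...
    return "mixed"


def chroms_from_intron_ids(intron_ids):
    """Assumes IDs like 'chr2:219001:221464:clu_1_NA'; chromosome is the first field."""
    return {i.split(":", 1)[0] for i in intron_ids}


def chroms_from_exon_rows(rows):
    """Assumes whitespace-separated rows with the chromosome in column 1."""
    return {r.split()[0] for r in rows if r.strip()}


# Illustrative rows only; in practice read them from the counts and exon files.
intron_ids = ["chr2:219001:221464:clu_1_NA", "chr2:231191:233101:clu_2_NA"]
exon_rows = ["chr1 11869 12227 + some_gene"]

counts_style = chrom_style(chroms_from_intron_ids(intron_ids))
exons_style = chrom_style(chroms_from_exon_rows(exon_rows))
if counts_style != exons_style:
    print(f"Naming mismatch: counts use '{counts_style}', exons use '{exons_style}'")
```

If the two styles differ, the BAM files and the annotation most likely use different chromosome naming conventions. The check says nothing about the genome build itself, which still has to be confirmed separately.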
Hi Jack, the BAM files were generated in a two-step STAR run, using hg38. I guess you will have to take my word for it, since it was done in a script with many variables.
Then converting BAMs to junc files:
Clustering:
Differential splicing:
Visualization. An example of the plot generated here:
Visualizing with LeafViz:
This is the result:
You can see that while a gene name appeared in the first screenshot, no gene names showed up in LeafViz.
thanks
Curious. Can you paste over the first 10 lines of the perind_numers.counts.gz and the gencode_hg38.exons.txt files inside the annotation_codes/gencode_hg38/ folder please?
Sure.
gencode_hg38_all_exons.txt:
chr start end strand gene_name
chr1 11869 12227 + processed_transcript
chr1 12613 12721 + processed_transcript
chr1 13221 14409 + processed_transcript
chr1 12010 12057 + transcribed_unprocessed_pseudogene
chr1 12179 12227 + transcribed_unprocessed_pseudogene
chr1 12613 12697 + transcribed_unprocessed_pseudogene
chr1 12975 13052 + transcribed_unprocessed_pseudogene
chr1 13221 13374 + transcribed_unprocessed_pseudogene
chr1 13453 13670 + transcribed_unprocessed_pseudogene
chr1 29534 29570 - unprocessed_pseudogene
chr1 24738 24891 - unprocessed_pseudogene
chr1 18268 18366 - unprocessed_pseudogene
chr1 17915 18061 - unprocessed_pseudogene
chr1 17606 17742 - unprocessed_pseudogene
chr1 17233 17368 - unprocessed_pseudogene
chr1 16858 17055 - unprocessed_pseudogene
chr1 16607 16765 - unprocessed_pseudogene
chr1 15796 15947 - unprocessed_pseudogene
chr1 15005 15038 - unprocessed_pseudogene
chr1 14404 14501 - unprocessed_pseudogene
perind_numers.counts.gz:
0700356301 0700356291 0700356331 0700356311 0700356321 0700356341
chr2:219001:221464:clu_1_NA 5 0 0 0 0 1
chr2:219001:224864:clu_1_NA 11 7 6 16 6 4
chr2:219001:229966:clu_1_NA 19 12 13 10 16 6
chr2:224920:229966:clu_1_NA 15 7 6 14 10 2
chr2:231191:233101:clu_2_NA 15 13 10 14 5 13
chr2:231191:234160:clu_2_NA 1 0 1 1 1 1
chr2:233229:234160:clu_2_NA 27 13 10 11 16 20
chr2:253115:256207:clu_3_NA 2 1 0 0 0 2
chr2:253115:260085:clu_3_NA 7 2 5 5 1 6
chr2:253115:263984:clu_3_NA 1 2 2 0 2 1
chr2:271939:272037:clu_4_NA 39 81 50 40 38 35
chr2:271939:272192:clu_4_NA 13 2 4 5 8 7
chr2:272065:272192:clu_4_NA 2 8 2 7 7 2
chr2:272150:275140:clu_5_NA 23 39 30 28 26 19
chr2:272150:276980:clu_5_NA 1 3 1 1 1 1
Hi Jack, did you figure out the issue?
thanks
Hi afadda91,
It looks like there's something wrong with your gencode_hg38_all_exons.txt file. Did you construct it yourself or did you download it? David hasn't been maintaining the downloaded example files very well 😛 . Try making it yourself from a GENCODE GTF file (here: ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_33/gencode.v33.annotation.gtf.gz) and using our included perl script (gtf2leafcutter.pl) to create it again.
Let me know how you get on. Sorry for the delay!
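One thing that stands out in the exon file pasted above is that the gene_name column holds values such as processed_transcript and unprocessed_pseudogene, which look like GENCODE biotypes rather than gene symbols. A quick way to spot this in any exon file is sketched below. The function name is hypothetical, the biotype set is a small hand-picked sample rather than a complete list, and the last sample row uses a made-up gene name for contrast.

```python
# A small, hand-picked sample of GENCODE biotype strings (not exhaustive; an assumption).
KNOWN_BIOTYPES = {
    "protein_coding",
    "processed_transcript",
    "processed_pseudogene",
    "unprocessed_pseudogene",
    "transcribed_unprocessed_pseudogene",
    "lncRNA",
}


def biotype_fraction(lines):
    """Fraction of data rows whose last field (gene_name) is a known biotype string."""
    rows = [ln.split() for ln in lines if ln.strip()]
    # Drop the header row if the file has one.
    if rows and rows[0][0] == "chr":
        rows = rows[1:]
    if not rows:
        return 0.0
    hits = sum(1 for r in rows if r[-1] in KNOWN_BIOTYPES)
    return hits / len(rows)


# Rows copied from the excerpt in this thread.
suspect = [
    "chr start end strand gene_name",
    "chr1 11869 12227 + processed_transcript",
    "chr1 12010 12057 + transcribed_unprocessed_pseudogene",
    "chr1 29534 29570 - unprocessed_pseudogene",
]
# A made-up row with a gene symbol in the last column, for comparison.
plausible = [
    "chr start end strand gene_name",
    "chr1 11869 12227 + SOME_GENE",
]

print(biotype_fraction(suspect), biotype_fraction(plausible))
```

A fraction close to 1 suggests the wrong GTF attribute ended up in the gene_name column, in which case regenerating the file from a GENCODE GTF as suggested above is the likely fix.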
Thanks a lot, Jack. It worked to an extent, but I still get some unlabeled genes.
One more question: what do you consider a differentially spliced site in terms of delta PSI?
You always get some unlabelled genes. Those in your screen shot look like artifacts with so many junctions.
For dPSI, the standard is a 10% change as that's about the limit for how small a change you can reliably validate with RT-PCR. But it depends on your dataset and what you'd expect to see. If you have a splicing factor knockdown in a cell line or you're comparing two very different tissues you may see dPSI of 50%-100%. But if you're looking in a big human cohort comparing some disease where splicing is indirectly involved you may only find changes of a few %, even if they have very low adjusted P values.
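As a concrete illustration of applying such a cutoff, here is a minimal sketch that keeps introns passing both an absolute dPSI threshold and an adjusted p-value threshold. The record keys (intron, dpsi, p_adjust), the function name, and the 0.05 p-value cutoff are assumptions for illustration, not leafcutter's output format or a recommended default. dPSI is expressed as a fraction, so 0.10 means 10%.

```python
def significant_introns(results, min_abs_dpsi=0.10, max_p_adjust=0.05):
    """Keep records whose |dPSI| and adjusted p-value pass both thresholds.

    `results` is an iterable of dicts with hypothetical keys
    'intron', 'dpsi' (a fraction, so 0.10 == 10%) and 'p_adjust'.
    """
    return [
        r for r in results
        if abs(r["dpsi"]) >= min_abs_dpsi and r["p_adjust"] <= max_p_adjust
    ]


# Made-up example records.
rows = [
    {"intron": "intron_a", "dpsi": 0.25, "p_adjust": 0.001},    # large, significant
    {"intron": "intron_b", "dpsi": -0.03, "p_adjust": 0.0001},  # small but significant
    {"intron": "intron_c", "dpsi": -0.12, "p_adjust": 0.20},    # large, not significant
]

strict = [r["intron"] for r in significant_introns(rows)]
relaxed = [r["intron"] for r in significant_introns(rows, min_abs_dpsi=0.02)]
print(strict, relaxed)
```

With the 10% cutoff only intron_a survives. Lowering the cutoff to 2% also keeps intron_b, which matches the large-cohort case described above where effects of a few percent can still have very low adjusted P values.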
Hi, I have got the *significance.txt file, but it is not annotated (the "gene" column is missing). Could you tell me how to annotate the intron clusters? Thanks! @jackhump @afadda91 @jungminchoilab
Hi! You can use prepare_results.R as above to annotate each cluster with a gene name. Make sure to create the annotation file yourself using our perl script.
Dear developers,
Thanks for sharing this software! I was exploring LeafViz and noticed that I cannot get gene names displayed properly (please see below).
I also checked the *significance.txt file and it is annotated fine.
I prepared the RData file as below.
and visualized as below
Any input would be appreciated. I look forward to hearing from you, and please let me know if you need any other information from me.
Best, Jungmin