flemingtonlab / SpliceTools

GNU General Public License v3.0

SETranslateNMD no events undergoing NMD? #6

Open EspressoKris opened 1 year ago

EspressoKris commented 1 year ago

Hi all,

Fantastic downstream tool! I generated some rMATS results following STAR alignment to the GENCODE v42 genome. Following your guide, I created a bed12 file from the comprehensive gene annotation gtf file, then used the primary genome fasta file along with the SE.MATS.JC/JCEC input to evaluate NMD events. However, although the software generates a folder with various subfolders, it reports 0 events undergoing NMD, which cannot be true. Also, all output files are 0 bytes in size.

For reference, the bed file generated has the following format:

chr1 11868 14409 ENST00000456328.2 100 + 11868 14409 255,0,0 3 359,109,1189 0,744,1352

I get the same result when appending 'gene_name' with an underscore in the 4th column, or when removing the version from the transcript ID.

Have I perhaps done anything wrong? Any help would be much appreciated!

Edit: using your annotation file I was able to get an output, so it is likely that the gtf2bed script does not work with the GENCODE gtf. I am going to study the output to see if it actually worked.

nungerleider commented 11 months ago

Sorry for just getting back to you now (I must have missed the notification). Have you been able to solve this problem? If you aren't getting any output, there must be no match between the IDs in your rMATS output and your annotation. Can you send your input rMATS file and the gencode bed12 file you created?

EspressoKris commented 11 months ago

Hi @nungerleider,

Sorry for the late response. I have been working on other analyses and side things and did not notice the response.

So I have managed to get an output, but only by using your gene_annotation bed file; I get no results when using the one converted from the GENCODE v42 gtf via the gtf2bed script.

Comparing the bed files headers I see the following:

1) hg38_1_22_X_Y_M.bed

chr1 11868 14409 ENST00000456328_DDX11L1 1000 + 14409 14409 0 3 359,109,1189, 0,744,1352,

2) GENCODE_v42.annotation.bed (converted from comprehensive gene annotation - gtf: https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_42/gencode.v42.annotation.gtf.gz)

chr1 11868 14409 ENST00000456328.2_DDX11L2 100 + 11868 14409 255,0,0 3 359,109,1189 0,744,1352
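For anyone wanting to see exactly where two such lines differ, here is a minimal Python sketch that compares them field by field using the standard BED12 column names. The function name `diff_bed12` is made up for illustration, and ignoring the trailing comma in the block lists is an assumption on my part; it does not say which of the differences SpliceTools actually cares about.

```python
# Standard BED12 column names, in order.
BED12_FIELDS = [
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "thickStart", "thickEnd", "itemRgb", "blockCount",
    "blockSizes", "blockStarts",
]

def diff_bed12(line_a, line_b):
    """Return {field: (value_a, value_b)} for the BED12 fields that differ.

    Hypothetical helper, not part of SpliceTools.
    """
    diffs = {}
    for name, va, vb in zip(BED12_FIELDS, line_a.split(), line_b.split()):
        # Assumption: a trailing comma in a comma-separated list is not
        # a meaningful difference, so strip it before comparing.
        if va.rstrip(",") != vb.rstrip(","):
            diffs[name] = (va, vb)
    return diffs

# The two lines quoted in this thread.
author = ("chr1 11868 14409 ENST00000456328_DDX11L1 1000 + 14409 14409 "
          "0 3 359,109,1189, 0,744,1352,")
gencode = ("chr1 11868 14409 ENST00000456328.2_DDX11L2 100 + 11868 14409 "
           "255,0,0 3 359,109,1189 0,744,1352")

for field, (va, vb) in diff_bed12(author, gencode).items():
    print(f"{field}: {va} vs {vb}")
```

On these two lines the fields that differ are name, score, thickStart and itemRgb.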

Here are the files as requested: Test.zip

Thanks!

zhou-ran commented 5 months ago

Hello everyone,

I discovered that columns 6 and 7 of the BED12 file provided by the author (counting from 0, i.e. the thickStart and thickEnd fields) indicate whether the transcript is protein-coding. I rewrote the gtf2bed.py script to generate a similar BED12 file, and the output turned out to be correct. I've attached it to this page for anyone else who might encounter the same issue.

gtf2bed.py.txt
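As a rough illustration of the idea (this is not the attached script), the sketch below derives thickStart and thickEnd for one transcript from its CDS records. The function name `thick_coords` is hypothetical. It assumes GTF coordinates are 1-based inclusive, BED coordinates are 0-based half-open, and that a transcript without any CDS gets thickStart == thickEnd == chromEnd, as in the `14409 14409` example from the author's file above. Whether the stop codon should be included in the thick region is not handled here.

```python
def thick_coords(tx_end, cds_intervals):
    """Return (thickStart, thickEnd) in BED coordinates for one transcript.

    tx_end:        transcript end, i.e. the BED chromEnd value.
    cds_intervals: list of (start, end) tuples taken from the transcript's
                   CDS lines in the GTF (1-based, inclusive).
    Hypothetical helper for illustration only.
    """
    if not cds_intervals:
        # Assumption: non-coding transcripts are marked by an empty
        # thick region placed at the transcript end.
        return tx_end, tx_end
    # Convert the 1-based inclusive start to a 0-based start;
    # the inclusive end equals the half-open BED end.
    thick_start = min(start for start, _ in cds_intervals) - 1
    thick_end = max(end for _, end in cds_intervals)
    return thick_start, thick_end

# Non-coding transcript: no CDS records.
print(thick_coords(14409, []))
# Made-up coding transcript with two CDS segments.
print(thick_coords(1000, [(101, 200), (301, 400)]))
```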

Ran