garber-lab / ESAT

Stripped-down version of original ESAT code

Extension overlaps null. - extension shortened to negative value #28

Closed ampodu closed 7 years ago

ampodu commented 7 years ago

I am trying to get gene counts from a Macaca RNA-seq data set that I mapped with the Mmul 8 annotation data.

My ESAT command: java -jar esat.jar -task score3p -geneMapping Annotations/Mmul.flux.ESAT -alignments macaca_input.txt -multimap ignore -out macaca_test/macaca_test

The geneMapping text file contains 8757 human-Macaca ortholog genes. The file was created according to your guidelines.

ESAT runs fine until it checks for extension overlaps, at which point it gives this warning:

WARN [2017-01-30 15:16:50,376] [SAMSequenceCountingDict.java:280] [main] Gene ENSMMUG00000007085 (+) extension overlaps null. 0-base extension shortened to -80106
33185 [main] WARN umms.esat.NewESAT - Gene ENSMMUG00000007085 (+) extension overlaps null. 0-base extension shortened to -80106

The warning is followed by this error, and ESAT crashes:

Exception in thread "main" java.lang.NegativeArraySizeException
    at umms.esat.SAMSequenceCountingDict.countWindowedReadStarts(SAMSequenceCountingDict.java:292)
    at umms.esat.SAMSequenceCountingDict.countWindowedTranscriptReadStarts(SAMSequenceCountingDict.java:649)
    at umms.esat.NewESAT.<init>(NewESAT.java:217)
    at umms.esat.NewESAT.main(NewESAT.java:237)

I get this problem with both a 0 extension and a 5k extension. With the 5k extension, ESAT reports every extension shortened because of an overlap until it reaches this gene on chromosome 14.

I removed the gene that causes the problem from my geneMapping text file, but ESAT then reports the same error for the next gene on the chromosome (ENSMMUG00000007086), where the 0-base extension is shortened to -143426. Both shortened extensions resolve to the same position in the genome: Chr 14: 107043721. I checked that position and there is nothing there.

With my human data, where I do basically the same thing with different annotation data, it works fine. I checked my ESAT annotation data and the mapping, both overall and at this gene in particular, and could not find any errors. The data has also already been counted with featureCounts, which worked fine as well.

It would be nice to know why the warning shows "null" instead of the name of the gene that the extension overlaps.

Any help or suggestions are appreciated.

Best regards and thanks, Lukas

ampodu commented 7 years ago

I found the error in my BAM files.

sztankatt commented 5 years ago

@ampodu what was the problem? I am facing the same issue now

ampodu commented 5 years ago

Hi @sztankatt, I think it was a mismatch between the annotation data and the reference data. Something did not fit there. But I am not sure, and I won't have access to the data any time soon to check. Sorry I couldn't be more specific.
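For anyone who wants to test the "annotation and reference do not fit" idea, here is a rough Python sketch. It is not part of ESAT and the cause was never confirmed in this thread. It assumes you have the SAM header text (for example from `samtools view -H`) and a list of gene records as `(gene_id, chrom, start, end)` tuples that you extracted from the geneMapping file yourself. The function names and the example values are made up for illustration.

```python
def parse_sq_lengths(header_text):
    """Collect sequence lengths from the @SQ lines of a SAM header."""
    lengths = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # @SQ fields look like SN:<name> and LN:<length>, tab-separated
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
        lengths[fields["SN"]] = int(fields["LN"])
    return lengths


def check_genes(genes, lengths):
    """Flag genes whose sequence name or end coordinate does not match the reference."""
    problems = []
    for gene_id, chrom, start, end in genes:
        if chrom not in lengths:
            problems.append((gene_id, "unknown sequence " + chrom))
        elif end > lengths[chrom]:
            problems.append((gene_id, "end beyond sequence length"))
    return problems


# Made-up header and genes, only to show the calling convention.
header = "@HD\tVN:1.4\n@SQ\tSN:chr14\tLN:1000\n"
genes = [
    ("geneA", "chr14", 100, 900),   # fits the reference
    ("geneB", "chr14", 950, 1200),  # runs past the end of chr14
    ("geneC", "14", 10, 50),        # sequence name differs from the BAM header
]
problems = check_genes(genes, parse_sq_lengths(header))
print(problems)
```

If this reports anything for your real data, the annotation and the alignments probably come from different assemblies or use different chromosome naming.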

sztankatt commented 5 years ago

Yeah, so my issue was with the annotation. One of the genes had the same value under 'name' and 'name2', whereas 'name2' should be the name of the transcript. This was in dm6 from Ensembl...
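A quick way to look for rows like this is sketched below in Python. It is an illustration, not something ESAT provides. It assumes the geneMapping file is tab-delimited with a header row that contains columns called 'name' and 'name2' (possibly with a leading '#'); the function name and the sample rows are hypothetical.

```python
import csv
import io


def find_identical_names(handle):
    """Return (line_number, value) for rows where 'name' equals 'name2'."""
    reader = csv.reader(handle, delimiter="\t")
    # Header may start with '#', as in table-browser style exports.
    header = [col.lstrip("#").strip() for col in next(reader)]
    name_idx = header.index("name")
    name2_idx = header.index("name2")
    suspicious = []
    for line_no, row in enumerate(reader, start=2):
        if len(row) <= max(name_idx, name2_idx):
            continue  # skip blank or truncated rows
        if row[name_idx] == row[name2_idx]:
            suspicious.append((line_no, row[name_idx]))
    return suspicious


# Made-up rows: the second data row repeats the same identifier in both columns.
example = (
    "#name\tchrom\tstrand\ttxStart\ttxEnd\tname2\n"
    "idA\tchr2L\t+\t100\t900\tidB\n"
    "idC\tchr2L\t-\t2000\t2600\tidC\n"
)
result = find_identical_names(io.StringIO(example))
print(result)
```

For a real file, open it with `open(path, newline="")` and pass the handle to the function instead of the `io.StringIO` example.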