Closed m-swirski closed 5 years ago
Dear Michal,
that is a strange error... I would say this due to some corrupted bam file, but if this works otherwise...
Could you do two things: First enter
gedi Nashorn -e "println(load('xyz.bam').getReferenceSequences())"
This should output all chromosomes in the index of the bam file. Then, please enter
gedi Nashorn -e "load('xyz.bam').ei('IV-').print()"
and test a few of the chromosomes instead of IV. My guess is that something will be wrong with either IV or V (based on the progress indicator). If there is no error, you could also try
gedi Nashorn -e "load('xyz.bam').ei().print()"
Florian
Thanks for help, It appears that reads aligning to Mitochondrial sequence cause the problem, but I dont see a reason why would it be so. For the time being I can just remove chrM, but what may cause the problem?
Michał
Dear Florian,
I also got a similar error when testing my data -
PRICE 1.0.2 Genome GRCm38
The error message:
2018-09-05 17:02:15.782 INFO Estimating maxpos...
2018-09-05 17:02:15.787 INFO Clustering reads
- Clustering reads 7364957 (66361.7/sec)
\ Identify maxpos 7469548 (67562.3/sec)
\ Collecting statistics 18-:88894181-88894212 3610954 (33032.6/sec)PRICE
PRICE is an analysis method for Ribo-seq data.
Error:
java.lang.RuntimeException: Could not run gedi.riboseq.javapipeline.PriceCollectSufficientStatistics@9d2be09
Could not run gedi.riboseq.javapipeline.PriceCollectSufficientStatistics@9d2be09
Invalid reference index -1
gedi -e PRICE <Options>
General:
-reads <reads> The mapped reads from the ribo-seq experiment.
-prefix <prefix> The prefix used for all output files
-genomic <genomic> The indexed GEDI genome.
Commandline:
-progress Show progress
-D Verbose output of errors
-h Show usage
-hh Show verbose usage
-hhh Show extra verbose usage
-dry Dry run
-keep Do not remove temp files
java.lang.RuntimeException: java.lang.RuntimeException: Could not run gedi.riboseq.javapipeline.PriceCollectSufficientStatistics@9d2be09
at gedi.util.job.schedule.DefaultPetriNetScheduler.run(DefaultPetriNetScheduler.java:222)
at gedi.util.program.GediProgram$1.execute(GediProgram.java:359)
at gedi.util.program.GediProgram.run(GediProgram.java:204)
at executables.Price.main(Price.java:68)
Caused by: java.lang.RuntimeException: Could not run gedi.riboseq.javapipeline.PriceCollectSufficientStatistics@9d2be09
at gedi.util.program.GediProgramJob.execute(GediProgramJob.java:63)
at gedi.util.program.GediProgramJob.execute(GediProgramJob.java:27)
at gedi.util.job.FireTransition.call(FireTransition.java:54)
at gedi.util.job.FireTransition.call(FireTransition.java:25)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: Invalid reference index -1
at htsjdk.samtools.QueryInterval.<init>(QueryInterval.java:24)
at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:504)
at gedi.region.bam.BamGenomicRegionStorage$BufferingGenomicRegionSpliterator.iterator(BamGenomicRegionStorage.java:992)
at gedi.region.bam.BamGenomicRegionStorage$BufferingGenomicRegionSpliterator.tryAdvance(BamGenomicRegionStorage.java:1037)
at gedi.util.functions.MappedSpliterator.tryAdvance(MappedSpliterator.java:54)
at gedi.util.functions.MappedSpliterator.tryAdvance(MappedSpliterator.java:54)
at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
at gedi.util.functions.EI$4.hasNext(EI.java:141)
at gedi.riboseq.cleavage.CleavageModelEstimator.collectEstimateData(CleavageModelEstimator.java:283)
at gedi.riboseq.javapipeline.PriceCollectSufficientStatistics.execute(PriceCollectSufficientStatistics.java:59)
at gedi.util.program.GediProgramJob.execute(GediProgramJob.java:61)
... 7 more
I've tested different data, it all seems happened on 18-.
I've run the three gedi Nashorn
commands you suggested, the output:
[X+,X-,Y+,Y-,10+,10-,11+,11-,12+,12-,13+,13-,14+,14-,15+,15-,16+,16-,17+,17-,18+,18-,MT+,MT-,19+,19-,1+,1-,2+,2-,3+,3-,4+,4-,5+,5-,6+,6-,7+,7-,8+,8-,9+,9-]
18-:88894181-88894212 [1] x1
18-:88894181-88894212 [1] x1
It is a bit cryptic, could you please help?
Best wishes, Fengyuan
Dear Fengyuan,
my guess is that this is also related to the mitochondrial genome (which you might want to remove anyways...). Could you check how this is named in the bam file (samtools view -H xyz.bam
)?
Best, Florian
Dear Florian,
Thanks for your quick reply. The mitochondrial is named chrM
not MT
. Does PRICE convert it to MT?
@HD VN:1.4 SO:coordinate
@SQ SN:chr1 LN:195471971
@SQ SN:chr2 LN:182113224
@SQ SN:chr3 LN:160039680
@SQ SN:chr4 LN:156508116
@SQ SN:chr5 LN:151834684
@SQ SN:chr6 LN:149736546
@SQ SN:chr7 LN:145441459
@SQ SN:chr8 LN:129401213
@SQ SN:chr9 LN:124595110
@SQ SN:chr10 LN:130694993
@SQ SN:chr11 LN:122082543
@SQ SN:chr12 LN:120129022
@SQ SN:chr13 LN:120421639
@SQ SN:chr14 LN:124902244
@SQ SN:chr15 LN:104043685
@SQ SN:chr16 LN:98207768
@SQ SN:chr17 LN:94987271
@SQ SN:chr18 LN:90702639
@SQ SN:chr19 LN:61431566
@SQ SN:chrX LN:171031299
@SQ SN:chrY LN:91744698
@SQ SN:chrM LN:16299
Best wishes, Fengyuan
Dear Florian,
After removing all chrM reads and reheadered the bam (only removing reads is not working neither), I can successfully produce results. Many thanks for your help.
Do you think it is a PRICE issue to handle Mitochondrion incorrectly?
Best wishes, Fengyuan
Dear Fengyuan,
yeah, we try to handle cases where the genome comes from ucsc (e.g. with chr), but the annotation comes from ensembl (e.g. without chr). Unfortunately, there was a bug...
I fixed this in the most recent version (1.0.3)!
Florian
Dear Florian,
I've just tested v1.0.3, it's working now. Thanks for the quick fix!
One minor point - when I run PRICE, it still shows
2018-09-06 13:32:59.233 INFO Gedi 1.0.2 (JAR) startup
Are PRICE and GEDI having the same version number?
I also have a question about converting orf.cit to BED. It seems the BED output is a mix of BED6 (for non-spliced ORFs) and BED12 (for spliced ORFs). The number of ORFs between this two formats are different (BED has much fewer ORFs?) Is there a way to make all BED records to 12 columns?
Best wishes, Fengyuan
Dear Fengyuan,
GEDI is the framework that PRICE is built on, so yes (should include the PRICE version there as well...)!
Regarding BED6/BED12: This should not be the case anymore with 1.0.3!
Best, Florian
Dear Florian,
You are right, 1.0.3 generates BED12 now, thanks for the improvement.
As I mentioned, the numbers of ORFs in orfs.tsv and orfs.bed are different, for example:
cmd:
gedi -e ViewCIT -m bed -name 'd.getType()' orfs.cit >orfs.bed
orfs.tsv (47714):
ENSMUSG00000025915.14 ENSMUST00000166384.7_uORF_5 1+:9798143-9798246|9865130-9865186 1+:9798143-9798246|9865130-9865186 ACG uORF 0.78 0.47 0.031466 42.4 42.4
ENSMUSG00000025915.14 ENSMUST00000188625.1_ncRNA_4 1+:9827021-9827051 1+:9798231-9798246|9827012-9827051 CTG ncRNA 0.26 0.46 0.51739 56.0 56.0
ENSMUSG00000025915.14 ENSMUST00000168907.7_uORF_2 1+:9798245-9798246|9827012-9827047 1+:9798245-9798246|9827012-9827047 GTG uORF 0.11 0.30 0.071814 75.3 75.3
ENSMUSG00000025915.14 ENSMUST00000188625.1_ncRNA_7 1+:9827081-9827123 1+:9827069-9827123 CTG ncRNA 0.07 0.41 0.087234 32.1 32.1
ENSMUSG00000025915.14 ENSMUST00000168907.7_uORF_6 1+:9827081-9827119|9865130-9865149 1+:9827069-9827119|9865130-9865149 CTG uORF 0.07 0.41 0.29373 33.2 33.2
ENSMUSG00000025915.14 ENSMUST00000189050.1_ncRNA_9 1+:9865138-9865186 1+:9865138-9865186 ATG ncRNA 0.08 0.46 0.33833 17.6 17.6
ENSMUSG00000025915.14 ENSMUST00000189050.1_ncRNA_1 1+:9865252-9865348|9868382-9868466|9871629-9871702|9872257-9872337 1+:9865252-9865348|9868382-9868466|9871629-9871702|9872257-9872337 ATG ncRNA 0.57 0.53 1.0000 113.2 113.2
ENSMUSG00000025915.14 ENSMUST00000168907.7_CDS_0 1+:9865252-9865348|9868382-9868466|9871629-9871702|9872257-9872333|9877165-9877253|9879025-9879075|9880331-9880389|9881577-9881661|9881759-9881891|9884406-9884519|9884624-9884661|9885666-9885753|9886046-9886142|9888869-9889025|9891360-9891450|9898719-9898890 1+:9865252-9865348|9868382-9868466|9871629-9871702|9872257-9872333|9877165-9877253|9879025-9879075|9880331-9880389|9881577-9881661|9881759-9881891|9884406-9884519|9884624-9884661|9885666-9885753|9886046-9886142|9888869-9889025|9891360-9891450|9898719-9898890 ATG CDS 0.57 0.53 6.6474e-09 438.6 438.6
ENSMUSG00000025915.14 ENSMUST00000191338.1_ncRNA_3 1+:9884430-9884519|9884624-9884661|9885666-9885777 1+:9884430-9884519|9884624-9884661|9885666-9885777 AGG ncRNA 0.20 0.48 1.0000 72.6 72.6
ENSMUSG00000025915.14 ENSMUST00000191338.1_ncRNA_8 1+:9886064-9886181 1+:9886064-9886181 ATC ncRNA 0.11 0.26 1.0000 22.8 22.8
ENSMUSG00000056763.16 ENSMUST00000071087.11_Variant_0 1+:10038306-10038344|10045075-10045184|10047423-10047526|10058719-10058823|10064332-10064413|10065322-10065421|10066425-10066820|10074859-10074946|10077171-10077242|10082252-10082346|10083500-10083558|10085746-10085952|10088050-10088184|10088881-10088960|10090187-10090317|10095865-10096013|10104211-10104364|10108368-10108481|10112392-10112542|10112875-10113022|10113602-10113707|10116017-10116084|10116613-10116731|10126339-10126479|10127422-10127563|10129985-10130032|10132453-10132517|10133163-10133273|10134091-10134230|10135933-10136145 1+:10038306-10038344|10045075-10045184|10047423-10047526|10058719-10058823|10064332-10064413|10065322-10065421|10066425-10066820|10074859-10074946|10077171-10077242|10082252-10082346|10083500-10083558|10085746-10085952|10088050-10088184|10088881-10088960|10090187-10090317|10095865-10096013|10104211-10104364|10108368-10108481|10112392-10112542|10112875-10113022|10113602-10113707|10116017-10116084|10116613-10116731|10126339-10126479|10127422-10127563|10129985-10130032|10132453-10132517|10133163-10133273|10134091-10134230|10135933-10136145 TTG Variant 0.25 0.22 1.6777e-14 176.1 176.1
ENSMUSG00000056763.16 ENSMUST00000122156.7_CDS_1 1+:10045085-10045184|10047423-10047526|10058719-10058823|10059823-10059847|10064332-10064413|10065322-10065421|10066425-10066875 1+:10045085-10045184|10047423-10047526|10058719-10058823|10059823-10059847|10064332-10064413|10065322-10065421|10066425-10066875 ATG CDS 0.23 0.47 1.0000 68.1 68.1
ENSMUSG00000025925.14 ENSMUST00000188371.6_CDS_0 1+:15805677-15805957|15807640-15807736|15813040-15813162|15815885-15815972|15818933-15819083|15819650-15819763|15831506-15831566|15833400-15833489|15838181-15838288|15842920-15843082 1+:15805677-15805957|15807640-15807736|15813040-15813162|15815885-15815972|15818933-15819083|15819650-15819763|15831506-15831566|15833400-15833489|15838181-15838288|15842920-15843082 ATG CDS 0.66 0.54 0.0000 400.3 400.3
ENSMUSG00000025925.14 ENSMUST00000188371.6_iORF_1 1+:15805924-15805957|15807640-15807682 1+:15805924-15805957|15807640-15807682 AGG iORF 0.79 0.42 0.29004 11.8 11.8
ENSMUSG00000025940.6 ENSMUST00000176451.1_ncRNA_2 1+:16665313-16665505|16665707-16665731 1+:16665313-16665505|16665707-16665731 TTG ncRNA 0.16 0.36 0.93616 584.7 584.7
ENSMUSG00000025940.6 ENSMUST00000177501.1_Trunc_0 1+:16665313-16665505|16667664-16667773|16676967-16677425 1+:16665313-16665505|16667664-16667773|16676967-16677425 TTG Trunc 0.16 0.36 3.2460e-15 1119.2 1119.2
ENSMUSG00000025940.6 ENSMUST00000065373.5_Trunc_1 1+:16665313-16665505|16667667-16667773|16676967-16677425 1+:16665313-16665505|16667667-16667773|16676967-16677425 TTG Trunc 0.16 0.36 1.0000 1119.2 1119.2
ENSMUSG00000025940.6 ENSMUST00000065373.5_iORF_3 1+:16677136-16677163 1+:16677136-16677163 AGG iORF 0.08 0.36 0.44073 7.3 7.3
ENSMUSG00000025779.10 ENSMUST00000191260.1_ncRNA_2 1+:16688465-16688504 1+:16688465-16688504 ATG ncRNA 0.65 0.53 3.1100e-06 71.7 71.7
orfs.bed (12054):
1 9865252 9898890 CDS 0 + 0 0 0 16 96,84,73,76,88,50,58,84,132,113,37,87,96,156,90,171, 0,3130,6377,7005,11913,13773,15079,16325,16507,19154,19372,20414,20794,23617,26108,33467,
1 10038306 10136145 Variant 0 + 0 0 0 30 38,109,103,104,81,99,395,87,71,94,58,206,134,79,130,148,153,113,150,147,105,67,118,140,141,47,64,110,139,212, 0,6769,9117,20413,26026,27016,28119,36553,38865,43946,45194,47440,49744,50575,51881,57559,65905,70062,74086,74569,75296,77711,78307,88033,89116,91679,94147,94857,95785,97627,
1 15805677 15843082 CDS 0 + 0 0 0 10 280,96,122,87,150,113,60,89,107,162, 0,1963,7363,10208,13256,13973,25829,27723,32504,37243,
1 5083163 5084454 ncRNA 0 + 0 0 0 2 115,38, 0,1253,
1 5083498 5083618 uORF 0 + 0 0 0 1 120, 0,
1 5084456 5162165 Trunc 0 + 0 0 0 13 107,103,90,114,105,54,98,193,179,126,102,114,61, 0,4552,8906,11158,13572,16613,32933,39820,48627,51355,59293,65491,77648,
1 5143835 5149982 iORF 0 + 0 0 0 2 16,35, 0,6112,
1 6214901 6227997 uORF 0 + 0 0 0 2 56,58, 0,13038,
1 6230020 6274275 Trunc 0 + 0 0 0 22 53,127,171,203,427,171,185,187,82,62,104,127,1892,166,109,241,49,48,59,137,71,78, 0,3940,4208,8241,9931,14084,14774,15137,15404,15577,17685,18045,18255,30909,32813,33003,34521,35553,38677,40670,42680,44177,
1 7120434 7169882 Trunc 0 + 0 0 0 5 181,103,172,124,368, 0,27202,40457,42813,49080,
In orfs.tsv, there are 10 ORFs predicted in ENSMUSG00000025915.14, but only ENSMUST00000168907.7_CDS_0 appears in orfs.bed. Can you please let me know why there is a discrepancy?
Best wishes, Fengyuan
Dear Fengyuan,
sorry, I missed that question in the first place. The tsv contains all results (without filtering), the cit the FDR filtered list of orfs (see here).
Best, Florian
Dear Florian,
That makes sense. Thank you so much for your help.
Best wishes, Fengyuan
Hi all, I get following error while executing Bam2CIT function (the same applies for gedi -e Price), while working on sorted, indexed BAM file that is otherwise just fine for any other task.
2018-09-04 18:14:17.096 INFO Command: gedi -e Bam2CIT -p riboseq.cit kmarx_riboseqAligned.sortedByCoord.out.bam 2018-09-04 18:14:17.232 INFO Discovering classes in classpath 2018-09-04 18:14:17.316 INFO Preparing simple class references 2018-09-04 18:14:17.422 INFO Gedi 1.0.2 (JAR) startup \ IV-:1188365-1188395 [1] x1 192813 (56246.8/sec)Exception in thread "main" java.lang.IllegalArgumentException: Invalid reference index -1 at htsjdk.samtools.QueryInterval.(QueryInterval.java:24)
at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:504)
at gedi.region.bam.BamGenomicRegionStorage$BufferingGenomicRegionSpliterator.iterator(BamGenomicRegionStorage.java:992)
at gedi.region.bam.BamGenomicRegionStorage$BufferingGenomicRegionSpliterator.tryAdvance(BamGenomicRegionStorage.java:1037)
at gedi.util.functions.MappedSpliterator.tryAdvance(MappedSpliterator.java:54)
at gedi.util.functions.SpliteratorArraySpliterator.tryAdvance(SpliteratorArraySpliterator.java:62)
at gedi.util.functions.MappedSpliterator.tryAdvance(MappedSpliterator.java:54)
at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
at gedi.util.functions.EI$4.hasNext(EI.java:141)
at gedi.util.FunctorUtils$InitActionIterator.hasNext(FunctorUtils.java:679)
at gedi.util.FunctorUtils$PeekIterator.hasNext(FunctorUtils.java:636)
at gedi.util.FunctorUtils$SideEffectIterator.hasNext(FunctorUtils.java:717)
at gedi.util.FunctorUtils$HasNextActionIterator.hasNext(FunctorUtils.java:807)
at gedi.centeredDiskIntervalTree.CenteredDiskIntervalTreeStorage.fill(CenteredDiskIntervalTreeStorage.java:309)
at gedi.core.region.GenomicRegionStorage.fill(GenomicRegionStorage.java:183)
at executables.Bam2CIT.main(Bam2CIT.java:137)
2018-09-04 18:14:20.973 INFO Finished: gedi -e Bam2CIT -p riboseq.cit kmarx_riboseqAligned.sortedByCoord.out.bam
What might be the problem and how should I approach this? I have successfully worked with price on different organism, so installation, etc. is proper.
Best regards, Michał