erhard-lab / price

Improved Ribo-seq enables identification of cryptic translation events

Combining multiple gtf files #8

Closed TamaraO closed 5 years ago

TamaraO commented 6 years ago

Hi Florian,

I have two separate GTF files - one from GENCODE and another from a de novo transcriptome assembly. I prepared indexes for both successfully. When I run the start.bash script, it seems to run fine at first, but it fails at the MergeSam command:


login02:~/Ribo-seq/run/src/PRICE$ more price_bash.e4516445.2

adapter AGATCGGAAGAGCACACG
---
mRpm   million reads per minute
mNpm   million nucleotides per minute
mCps   million alignment cells per second
lint   total removed reads (per 10K), sum of columns to the left
25K reads per dot, 1M reads per line  seconds  mr mRpm mNpm mCps {error qc  low  len  NNN tabu nobc cflr  cfl lint   OK} per 10K
........................................   12   1  5.0  257   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12   2  5.0  255   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12   3  5.0  256   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12   4  5.0  257   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12   5  5.1  258   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12   6  5.0  256   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12   7  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12   8  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12   9  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  10  5.0  253   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  11  5.0  253   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  12  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  13  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  14  5.0  253   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  15  5.0  253   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  16  5.0  253   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  17  5.0  252   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  18  5.0  252   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  19  5.0  253   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  20  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  21  5.0  252   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  22  4.9  250   81    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  23  4.9  250   81    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  24  5.0  253   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  25  4.9  252   81    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  26  4.9  252   81    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  27  4.9  251   81    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  28  4.9  251   81    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  29  4.9  251   81    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  30  4.9  250   81    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  31  4.9  251   81    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  32  4.9  251   81    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  33  4.9  251   81    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  34  4.9  251   81    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  35  4.9  251   81    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  36  4.9  251   81    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  37  4.9  251   81    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  38  4.9  251   81    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  39  4.9  251   81    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  40  4.9  251   81    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  41  4.9  251   81    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  42  4.9  252   81    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  43  4.9  251   81    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  44  4.9  251   81    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  45  4.9  251   81    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  46  4.9  251   81    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  47  4.9  250   81    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  48  5.0  253   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  49  5.0  255   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  50  5.0  255   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  51  5.0  255   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  52  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  53  5.0  255   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  54  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  55  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  56  5.0  255   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  57  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  58  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  59  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  60  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  61  5.0  256   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  62  5.0  255   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  63  5.0  256   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  64  5.0  256   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  65  5.0  256   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  66  5.0  256   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  67  5.0  256   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  68  5.0  257   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  69  5.0  256   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  70  5.0  257   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  71  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  72  5.0  255   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  73  5.0  256   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  74  5.0  256   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  75  5.0  257   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  76  5.1  258   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  77  5.0  257   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  78  5.0  257   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  79  5.0  257   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  80  5.0  256   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  81  5.0  257   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  82  5.0  257   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  83  5.0  256   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  84  5.0  256   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  85  5.0  256   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  86  5.0  256   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  87  5.0  257   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  88  5.0  257   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  89  5.1  258   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  90  5.0  257   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  91  5.0  257   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  92  5.1  258   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  93  5.0  257   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  94  5.0  256   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  95  5.0  257   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  96  5.1  258   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  97  5.0  256   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  98  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12  99  5.0  255   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 100  5.0  255   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 101  5.0  256   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 102  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 103  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 104  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 105  5.0  253   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 106  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 107  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 108  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 109  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 110  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 111  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 112  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 113  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 114  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 115  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 116  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 117  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 118  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 119  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 120  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 121  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 122  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 123  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 124  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 125  5.0  254   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 126  5.0  255   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 127  5.0  255   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 128  5.0  253   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 129  4.9  252   81    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 130  5.0  255   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 131  5.0  255   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 132  5.0  256   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 133  5.0  255   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 134  5.0  256   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 135  5.0  256   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 136  5.0  255   82    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 137  5.0  256   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 138  5.0  257   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 139  5.0  257   83    0    0    0    0    0    0    0    0    0    0 10000
........................................   12 140  5.0  256   83    0    0    0    0    0    0    0    0    0    0 10000
..............
[reaper] check 0 errors, 0 reads truncated, 140374449 clean, 0 lint, 140374449 total
2018-07-26 17:28:32.345 INFO Command: gedi -e FastqFilter -D -ld B1_2.readlengths.tsv -min 18 B1_2.lane.clean
2018-07-26 17:28:32.873 INFO Discovering classes in classpath
2018-07-26 17:28:33.082 INFO Preparing simple class references
2018-07-26 17:28:33.245 INFO Gedi 1.0.2 (JAR) startup
# reads processed: 139710809
# reads with at least one reported alignment: 120758322 (86.43%)
# reads that failed to align: 18952487 (13.57%)
Reported 246299943 alignments to 1 output stream(s)
# reads processed: 18952487
# reads with at least one reported alignment: 17487993 (92.27%)
# reads that failed to align: 1430030 (7.55%)
# reads with alignments suppressed due to -m: 34464 (0.18%)
Reported 89322897 alignments to 1 output stream(s)
[bam_sort_core] merging from 8 files and 8 in-memory blocks...
# reads processed: 18952487
# reads with at least one reported alignment: 16074662 (84.82%)
# reads that failed to align: 2866322 (15.12%)
# reads with alignments suppressed due to -m: 11503 (0.06%)
Reported 85785894 alignments to 1 output stream(s)
[bam_sort_core] merging from 8 files and 8 in-memory blocks...
# reads processed: 18952487
# reads with at least one reported alignment: 17487993 (92.27%)
# reads that failed to align: 1430030 (7.55%)
# reads with alignments suppressed due to -m: 34464 (0.18%)
Reported 89322897 alignments to 1 output stream(s)
[bam_sort_core] merging from 8 files and 8 in-memory blocks...
# reads processed: 18952487
# reads with at least one reported alignment: 1685287 (8.89%)
# reads that failed to align: 17243144 (90.98%)
# reads with alignments suppressed due to -m: 24056 (0.13%)
Reported 5602146 alignments to 1 output stream(s)
[bam_sort_core] merging from 0 files and 8 in-memory blocks...
2018-07-26 19:59:06.135 INFO Command: gedi -e MergeSam -D -genomic mi_unannot mi_unannot human_g1k_v37 human_g1k_v37 -t /ahg/regevdata/projects/Ribo-seq/run/raw/PRICE_20180726/./parameters_20180726/scripts/B1_2.prio.csv -prio /ahg/regevdata/projects/Ribo-seq/run/raw/PRICE_20180726/./parameters_20180726/scripts/B1_2.prio.oml -chrM -o B1_2.cit
2018-07-26 19:59:06.812 INFO Discovering classes in classpath
2018-07-26 19:59:06.962 INFO Preparing simple class references
2018-07-26 19:59:07.076 INFO Gedi 1.0.2 (JAR) startup
2018-07-26 19:59:08.224 INFO Reading oml /home/unix/tamarao/.gedi/genomic/human_g1k_v37.oml
2018-07-26 19:59:08.244 INFO Done reading oml /home/unix/tamarao/.gedi/genomic/human_g1k_v37.oml
2018-07-26 19:59:08.337 INFO Reading oml /home/unix/tamarao/.gedi/genomic/mi_unannot.oml
2018-07-26 19:59:08.365 INFO Done reading oml /home/unix/tamarao/.gedi/genomic/mi_unannot.oml
An error occurred: Genomes are not disjoint!
java.lang.RuntimeException: Genomes are not disjoint!
    at gedi.core.genomic.Genomic.merge(Genomic.java:92)
    at gedi.core.genomic.Genomic.merge(Genomic.java:424)
    at gedi.core.genomic.Genomic.merge(Genomic.java:385)
    at executables.MergeSam.start(MergeSam.java:220)
    at executables.MergeSam.main(MergeSam.java:84)
2018-07-26 19:59:10.158 INFO Finished: gedi -e MergeSam -D -genomic mi_unannot mi_unannot human_g1k_v37 human_g1k_v37 -t /ahg/regevdata/projects/Ribo-seq/run/raw/PRICE_20180726/./parameters_20180726/scripts/B1_2.prio.csv -prio /ahg/regevdata/projects/Ribo-seq/run/raw/PRICE_20180726/./parameters_20180726/scripts/B1_2.prio.oml -chrM -o B1_2.cit
2018-07-26 19:59:10.367 INFO Command: gedi Nashorn -e println(EI.wrap(DynamicObject.parseJson(FileUtils.readAllText(new File('B1_2.cit.metadata.json'))).getEntry('conditions').asArray()).mapToDouble(function(d) d.getEntry('total').asDouble()).sum())
2018-07-26 19:59:11.064 INFO Discovering classes in classpath
2018-07-26 19:59:11.260 INFO Preparing simple class references
2018-07-26 19:59:11.402 INFO Gedi 1.0.2 (JAR) startup
Exception in thread "main" java.lang.RuntimeException: Could not execute JS, dumb.js saved!
    at gedi.util.nashorn.JS.eval(JS.java:369)
    at Nashorn.main(Nashorn.java:47)
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: B1_2.cit.metadata.json (No such file or directory)
    at jdk.nashorn.internal.runtime.ScriptRuntime.apply(ScriptRuntime.java:397)
    at jdk.nashorn.api.scripting.NashornScriptEngine.evalImpl(NashornScriptEngine.java:449)
    at jdk.nashorn.api.scripting.NashornScriptEngine.evalImpl(NashornScriptEngine.java:406)
    at jdk.nashorn.api.scripting.NashornScriptEngine.evalImpl(NashornScriptEngine.java:402)
    at jdk.nashorn.api.scripting.NashornScriptEngine.eval(NashornScriptEngine.java:155)
    at javax.script.AbstractScriptEngine.eval(AbstractScriptEngine.java:264)
    at gedi.util.nashorn.JS.eval(JS.java:356)
    ... 1 more
Caused by: java.io.FileNotFoundException: B1_2.cit.metadata.json (No such file or directory)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at java.io.FileReader.<init>(FileReader.java:72)
    at gedi.util.FileUtils.readAllText(FileUtils.java:127)
    at jdk.nashorn.internal.scripts.Script$5$\^eval\_.:program(<eval>:1)
    at jdk.nashorn.internal.runtime.ScriptFunctionData.invoke(ScriptFunctionData.java:637)
    at jdk.nashorn.internal.runtime.ScriptFunction.invoke(ScriptFunction.java:494)
    at jdk.nashorn.internal.runtime.ScriptRuntime.apply(ScriptRuntime.java:393)
    ... 7 more
There were some errors, did not delete temp directory!
Warning message:
Removed 1 rows containing missing values (position_stack).

What does the error "An error occurred: Genomes are not disjoint!" mean? How can I use two different references?

Just in case, my parameters.json file looks like this:

{
    "datasets": [
        {"fastq": "/ahg/regevdata/projects/Ribo-seq/run/raw/B1_2/B1_2.fastq.gz", "name": "B1_2"}
    ],
    "references": {"human_g1k_v37": "both", "mi_unannot":"both", "rRNA_tRNA": "rRNA"},
    "adapter":  "AGATCGGAAGAGCACACGTCT"
}

Thanks a lot!

Tamara

florianerhard commented 6 years ago

Dear Tamara,

I overlooked this thread, my apologies for that! The answer is actually in the error message ("Genomes are not disjoint"). A genome is a set of reference sequences (e.g., chromosomes) with annotations (e.g., genes). If you supply two genomes, their reference sequences have to be disjoint, which makes sense, for example, for a host genome and a virus genome. In your case, both genomes presumably contain the same human chromosomes, so you must combine the GTFs (cat) and create a single genome index from the human FASTA and your combined GTF!

Best, Florian
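
For readers hitting the same error, here is a minimal Python sketch of the two ideas in the reply: checking whether two annotations share reference sequence names (the situation the error message describes) and concatenating their records into one GTF. The helper names `reference_names` and `combine_gtfs` and the tiny annotations are hypothetical and only for illustration; this is not gedi's own check, and the subsequent index-creation step is not shown. Unlike a plain `cat`, the sketch also drops comment and blank lines.

```python
def reference_names(gtf_lines):
    """Collect the reference sequence names (GTF column 1) of all records."""
    names = set()
    for line in gtf_lines:
        # Skip blank lines and '#' comment/header lines.
        if line.strip() and not line.startswith("#"):
            names.add(line.split("\t", 1)[0])
    return names


def combine_gtfs(*gtfs):
    """Concatenate GTF records in order, dropping comment and blank lines."""
    combined = []
    for gtf in gtfs:
        for line in gtf:
            if line.strip() and not line.startswith("#"):
                combined.append(line.rstrip("\n"))
    return combined


# Hypothetical miniature annotations standing in for the two real GTF files.
gencode = [
    "##description: hypothetical reference annotation",
    '1\tref\texon\t100\t200\t.\t+\t.\tgene_id "refGene1"; transcript_id "refTx1";',
    '2\tref\texon\t300\t400\t.\t-\t.\tgene_id "refGene2"; transcript_id "refTx2";',
]
denovo = [
    '1\tassembly\texon\t5000\t5100\t.\t+\t.\tgene_id "novelGene1"; transcript_id "novelTx1";',
]

# A non-empty intersection means the two annotations are not disjoint.
shared = reference_names(gencode) & reference_names(denovo)
combined = combine_gtfs(gencode, denovo)

print("shared reference sequences:", sorted(shared))
print("records in combined annotation:", len(combined))
```

With real files, the same functions can be fed open file handles instead of lists. If the intersection is non-empty, the annotations describe the same reference sequences and belong in one combined GTF, indexed once against the human FASTA, rather than in two separate genomes.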