dmaticzka / GraphProt

GraphProt: modelling binding preferences of RNA-binding proteins
http://www.bioinf.uni-freiburg.de/Software/GraphProt/
MIT License

Core dump when predicting on 3'UTR sequences #5

Open marvin-jens opened 5 years ago

marvin-jens commented 5 years ago

Thank you for providing GraphProt! I am trying to predict TIA1 binding for a larger set of 3'UTRs and am running into core dumps with no output generated. Any help or advice on identifying the offending sequence would be much appreciated!

GraphProt.pl \
  -action=predict \
  -fasta=../split0.3utr.fa \
  -onlyseq \
  -model=../../GraphProt_models_CLIP/sequence/ICLIP_TIA1.slop15.train.model \
  -keep-tmp \
  -prefix=split0

produces an empty split0.predictions file (and somehow seems to clear the tmp directory along with the core dump? At least I can't find it). I attach the sequences used:

split0.3utr.fa.zip

The model file is linked from the main page, so I assume you have it? I am using graphprot-1.1.7-h3445559_4 from bioconda on Ubuntu 18.04 LTS. Thank you for your time. Best regards, -Marvin

Here's the output:

Using parameters:
R: 1
D: 4
bitsize: 14
epochs: 10
lambda: 0.001

using empty set of unknowns!
touch /scratch2/GraphProt/k562/TIA1/GraphProt_tmp-d7RIf4.test.unknowns.fa
( perl /home/mjens/miniconda2/libexec/graphprot//fastapl -p -1 -e '$head .= " 1";' < /scratch2/GraphProt/k562/TIA1/GraphProt_tmp-d7RIf4.test.positives.fa; \
  perl /home/mjens/miniconda2/libexec/graphprot//fastapl -p -1 -e '$head .= " -1";' < /scratch2/GraphProt/k562/TIA1/GraphProt_tmp-d7RIf4.test.negatives.fa; \
  perl /home/mjens/miniconda2/libexec/graphprot//fastapl -p -1 -e '$head .= " 0";' < /scratch2/GraphProt/k562/TIA1/GraphProt_tmp-d7RIf4.test.unknowns.fa ) > /scratch2/GraphProt/k562/TIA1/GraphProt_tmp-d7RIf4.test.fa
cp /scratch2/GraphProt/k562/TIA1/GraphProt_tmp-d7RIf4.param /scratch2/GraphProt/k562/TIA1/GraphProt_tmp-d7RIf4.test.param
perl /home/mjens/miniconda2/libexec/graphprot//fasta2shrep_gspan.pl --vp --seq-graph-t -nostr -stdout -fasta /scratch2/GraphProt/k562/TIA1/GraphProt_tmp-d7RIf4.test.fa | \
gzip > /scratch2/GraphProt/k562/TIA1/GraphProt_tmp-d7RIf4.test.gspan.gz; exit ${PIPESTATUS[0]}
perl /home/mjens/miniconda2/libexec/graphprot//fastapl -e '@ry = split(/\s/,$head); print $ry[-1], "\n"' < /scratch2/GraphProt/k562/TIA1/GraphProt_tmp-d7RIf4.test.fa > /scratch2/GraphProt/k562/TIA1/GraphProt_tmp-d7RIf4.test.affy
bash /home/mjens/miniconda2/libexec/graphprot//check_sync_gspan_class.sh /scratch2/GraphProt/k562/TIA1/GraphProt_tmp-d7RIf4.test.gspan.gz /scratch2/GraphProt/k562/TIA1/GraphProt_tmp-d7RIf4.test.affy && \
OMP_NUM_THREADS=1 /home/mjens/miniconda2/libexec/graphprot//EDeN -a FEATURE -i /scratch2/GraphProt/k562/TIA1/GraphProt_tmp-d7RIf4.test.gspan.gz -r 1 -d 4 -b 14 -g DIRECTED
OK: gspan and class in sync
----------------------------------------------------------------
EDeN (Explicit Decomposition with Neighborhoods) Vers. 0.4.2 (15 June 2013)
Author: Fabrizio Costa costa@informatik.uni-freiburg.de
----------------------------------------------------------------
Processing I/O file: /scratch2/GraphProt/k562/TIA1/GraphProt_tmp-d7RIf4.test.gspan.gz
.......terminate called without an active exception
make: *** [/home/mjens/miniconda2/libexec/graphprot//Makefile:266: /scratch2/GraphProt/k562/TIA1/GraphProt_tmp-d7RIf4.test.feature] Aborted (core dumped)
paste: /scratch2/GraphProt/k562/TIA1/GraphProt_tmp-d7RIf4.test.predictions_sgd: No such file or directory
GraphProt predictions written to file split0.predictions

marvin-jens commented 5 years ago

I bisected the FASTA file and after 11 steps narrowed it down to a single sequence that triggers the error. I had hoped there would be something obvious, like N's or a very long or very short sequence, but that's not the case. It looks okay to me, which has me scratching my head.

> ENST00000568252.1_1.UTR3
tgagtgatgtttcaggctggaagcggtgagccactgacagggatcagagaattccccacagaatttggcagtcacatgcatggatttagaaagacaaagttggaaatgaattgttgcagctaatttctccctcaagacattgtctacattctgaagctagatctggtggcagaggagactaaggagctcagtatttccacatagtaacaacaaacattctaaaaagaaacaagagaacaatcttacttacaacagcttcaaaaaataaaatatttagaaataagtttaaccaagaaggtgaaagacctgtacactgaaaaatgttaataacaaaaattatagaagacacaaataaattggaagatattctgtgttcatagattggaagaataatgttgctaaaatgtccatactaccccaaatgacttatagactcaaagcaatttctaacaaaattgtaatgtaattattcatagtaacagaaaaaaatataaaattaatgtggaaccacaacaaactctgaatagccaaaggaatcatgagtaagaagatgaaagctggaggcaccaaccacatgcatgatttaaaactacactacaaagcaatagtaattaaaacagtgtgatactggcatgaaaatagactcgttggctgggtgcagtggctcacgcctgtaatcccagcacgttgggagtctgaggcaggaagatcatgaggttaggagtttgagaccagcctgaccaacatggtgaaagcccgtctctacaaaaaatacaaaaattagccaggcatggtggcacatgccggtaatcccagctactcaggcaactgaggcaggagaattgcttgaacccgggaggcagaggctgcagtgagcctagattacaccactgcactcccgcctgggtgacagagcaagactctatctcaa
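
For anyone who needs to repeat this kind of bisection, here is a minimal Python sketch of the idea. It is not part of GraphProt. It assumes that exactly one record triggers the crash on its own, and `crashes` is a hypothetical, caller-supplied function that reports whether a given subset of records reproduces the failure.

```python
from typing import Callable, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs, joining wrapped lines."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def bisect_offender(
    records: List[Record],
    crashes: Callable[[List[Record]], bool],  # hypothetical predicate
) -> Record:
    """Halve the record list until a single crashing record remains.

    Assumption: exactly one record in `records` triggers the crash.
    """
    if not records:
        raise ValueError("no records to bisect")
    while len(records) > 1:
        mid = len(records) // 2
        first_half = records[:mid]
        # Keep whichever half still reproduces the crash.
        records = first_half if crashes(first_half) else records[mid:]
    return records[0]
```

In a real run, `crashes` might write the records to a temporary FASTA file, call the predict command shown earlier, and return True when the process exits abnormally; that wrapper is left out here because it depends on the local setup. Halving needs roughly log2(N) runs for N sequences.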

marvin-jens commented 5 years ago

LOL! If I change all letters to upper-case it no longer crashes. Alright, I'm glad because that's easy to fix. That being said, it would be great if this requirement were prominently placed in the help text or the README.md. (or just switch up all seqs to upper-case, internally?)
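
A minimal sketch of that workaround in Python, for preprocessing a FASTA file before prediction. It is a stopgap on the input side, not a change to GraphProt itself, and it assumes that lower-case letters are the only trigger. The function names and file paths are hypothetical.

```python
from typing import Iterable, Iterator


def uppercase_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with sequence lines upper-cased.

    Header lines (starting with '>') are passed through unchanged so that
    sequence identifiers keep their original case.
    """
    for line in lines:
        if line.startswith(">"):
            yield line
        else:
            yield line.upper()


def write_uppercase_fasta(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path, upper-casing every sequence line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(uppercase_fasta(src))
```

For example, `write_uppercase_fasta("split0.3utr.fa", "split0.3utr.upper.fa")` would write an upper-cased copy that can then be passed to `-fasta=`; the output file name is only an illustration.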

dmaticzka commented 5 years ago

Glad you found a solution! There used to be a check for all-lowercase sequences; I'll have to look into that.

aishsk87 commented 4 years ago

@marvin-jens @dmaticzka The weird thing is that I ran the FASTA file you attached earlier (as is) and it worked for me. However, when I run my own FASTA file, with or without converting all the sequences to upper case, I get the same error no matter what. Would anybody be able to help me with this? Thanks