SystemsGenetics / KINC

Knowledge Independent Network Construction
MIT License

Unknown segmentation fault at end of KINC extract #147

Open JohnHadish opened 4 years ago

JohnHadish commented 4 years ago

A segmentation fault occurred when KINC extract was nearly complete (at 99%, after about 1 day 10 hours). The error output and run conditions are included below:

ERROR:

...
...
98% 1d9h55m21s  41m32s
99% 1d10h12m19s 20m43s
/var/spool/slurmd/job18799739/slurm_script: line 32: 289432 Segmentation fault      kinc run extract --emx "GEM-DAP-1-14_vs_Bartlett-v2_expression_matrix.-log-no.emx" --ccm "GEM-DAP-1-14_vs_Bartlett-v2_expression_matrix.-log-no-PAF.ccm" --cmx "GEM-DAP-1-14_vs_Bartlett-v2_expression_matrix.-log-no-PAF.cmx" --csm "GEM-DAP-1-14_vs_Bartlett-v2_expression_matrix.-log-no-PAF.csm" --format "tidy" --output "GEM-DAP-1-14_vs_Bartlett-v2_expression_matrix.-log-no-PAF.th${th}-p${p}-rsqr${r2}-gcn.tidy.txt" --mincorr $th --maxcorr 1 --filter-pvalue $p --filter-rsquare $r2

SLURM sbatch script (run on Kamiak). NOTE: the KINC 3.4.1 module on Kamiak does NOT currently match the KINC 3.4.1 release.

#!/bin/sh
#SBATCH --partition=ficklin
#SBATCH --account=ficklin
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:tesla:1
#SBATCH --time=7-00:00:00
#SBATCH --job-name=07-extract
#SBATCH --output=logs/%x.log
#SBATCH --mail-user=john
#SBATCH --mail-type=ALL

module load gcc/7.3.0 openmpi/4.0.0 cuda/9.1.85 qt/5.10.1  blas/3.8.0 \
            gsl/2.4  statsLib/20190625 gcem/20190625 lapack/3.8.0 \
            ACE/3.2.0 openblas/0.3.0 KINC/3.4.1

p="1e-06"
r2="0.30"
th="0.0"

kinc run extract \
   --emx "GEM-DAP-1-14_vs_Bartlett-v2_expression_matrix.-log-no.emx" \
   --ccm "GEM-DAP-1-14_vs_Bartlett-v2_expression_matrix.-log-no-PAF.ccm" \
   --cmx "GEM-DAP-1-14_vs_Bartlett-v2_expression_matrix.-log-no-PAF.cmx" \
   --csm "GEM-DAP-1-14_vs_Bartlett-v2_expression_matrix.-log-no-PAF.csm" \
   --format "tidy" \
   --output "GEM-DAP-1-14_vs_Bartlett-v2_expression_matrix.-log-no-PAF.th${th}-p${p}-rsqr${r2}-gcn.tidy.txt" \
   --mincorr $th \
   --maxcorr 1 \
   --filter-pvalue $p \
   --filter-rsquare $r2

It is unknown what to do in this situation. If there is an error in the script, KINC should fail at startup, not after a day and a half.
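
Until the cause is known, one thing the sbatch script can do on its own is rule out script-level problems before the long run starts. The sketch below is not part of KINC and does not diagnose the segmentation fault; `require_files` is a hypothetical helper that only checks that each input file exists, is readable, and is non-empty, so a wrong path fails in seconds rather than after a day and a half.

```shell
#!/bin/sh
# Hypothetical pre-flight helper (not a KINC feature).
# Returns non-zero if any argument is missing, unreadable, or an empty file.
require_files() {
    rc=0
    for f in "$@"; do
        if [ ! -r "$f" ] || [ ! -s "$f" ]; then
            echo "missing, unreadable, or empty input: $f" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Example use before the kinc command (file names here are placeholders):
#   require_files "input.emx" "input.ccm" "input.cmx" "input.csm" || exit 1
```

Every file is checked before returning, so a single run reports all bad inputs at once.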

JohnHadish commented 4 years ago

Note: This error can result in incomplete edge lines being written:

pycom976g00210  pycom13g11710   0.71396494  co  1   29  9999911111999991919999999999999999119119999911191999999111199999111119999911111 Ordinal 4.2523092e-07   0.72048342
pycom976g00210  pycom13g13730   0.68633795  co  1   29  9999911111999991919999999999999999119119999911191999999111199999111119999911111 Tissue__Peel    6.949067e-13    nan
pycom976g00210  pycom13g14080   -0.77906531 co  1   19  9999910100999990909999999999999999019089999911891999999111199999011119999911111 Tissue__Peel    1.4853823e-07   nan
pycom976g00210
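
A truncated line like the last one above can be detected after the fact by comparing field counts. This is a minimal sketch, assuming the tidy output is whitespace-separated, that no field contains embedded spaces, and that the first line of the file is complete; `check_tidy` is a hypothetical name, not a KINC command.

```shell
#!/bin/sh
# Hypothetical check (not a KINC feature): report every line whose field
# count differs from that of the first line in the file.
# Exit status is 1 if any such line is found, 0 otherwise.
check_tidy() {
    awk '
        NR == 1 { expected = NF; next }   # first line sets the expected field count
        NF != expected {
            printf "line %d: %d fields, expected %d\n", NR, NF, expected
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}

# Example use (the file name is a placeholder):
#   check_tidy "network.tidy.txt" || echo "output is incomplete"
```

On the excerpt above, the complete lines have 10 fields and the final line has 1, so the final line would be reported.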

JohnHadish commented 4 years ago

This should be labeled top priority, as it effectively breaks KINC. Contact me for files that reproduce the error.

spficklin commented 4 years ago

Given the large amount of time the extract took, I think it will be hard to reproduce this until we deal with issue #146.