Graph-Regularized Annotation via Semi-Supervised learning
Negative edge weights #2

Open katsikora opened 7 years ago

katsikora commented 7 years ago

Dear developers,

I encounter the following error when running GRASS:

Exception in thread "main" java.lang.RuntimeException: Non-positive weighted edge:>>136_Blood_comp125844_c2_seq5-->136_Blood_comp125844_c2_seq5<< -1.1

As far as I can tell this happens after running blast, when calling junto. That problematic edge is self-directed (same source and target sequence). The contents of my run directory after program exit are:

total 102M
drwxr-xr-x 1 sikora bioinfo 36 Jan 11 12:56 AS
-rw-r----- 1 sikora bioinfo 73M Jan 11 13:01 AS.TSdb
drwxr-xr-x 1 sikora bioinfo 36 Jan 11 12:56 TS
-rw-r----- 1 sikora bioinfo 0 Jan 10 15:23 TS.ASdb
-rw-r----- 1 sikora bioinfo 7.1M Jan 11 13:01 grassGraph.txt
-rw-r----- 1 sikora bioinfo 288 Jan 11 12:54 junto.config
-rw-r----- 1 sikora bioinfo 3.3M Jan 11 12:56 mag.clust
-rw-r----- 1 sikora bioinfo 7.1M Jan 11 12:56
-rw-r----- 1 sikora bioinfo 4.8M Jan 11 12:56 mag.flat.clust
-rw-r----- 1 sikora bioinfo 7.1M Jan 11 12:55
-rw-r----- 1 sikora bioinfo 0 Jan 10 15:28 seed.txt
-rw-r----- 1 sikora bioinfo 0 Jan 10 15:28 seedLabels.txt
-rw-r----- 1 sikora bioinfo 88 Jan 11 12:56 stats.json

The TS.ASdb file is empty. Curiously, all values in grassGraph.txt are equal to 1.1:

awk '{ print $3 }' grassGraph.txt | sort | uniq

Would you have any idea what might be the problem?

Best regards,


laraib85 commented 7 years ago


Can you print the values in ""? It seems the issue is most likely at the level of clustering contigs (done using RapClust).

Regards, Laraib

katsikora commented 7 years ago

Hello Laraib,

awk '{ print $3 }' | sort | uniq
1.1

It's the same as in grassGraph.txt, then.

Thanks for looking into it,



laraib85 commented 7 years ago

Since RapClust directly uses the results obtained from Salmon/Sailfish, I'm guessing the issue is caused by a problem at the read-mapping level. Going over the equivalence classes dumped by Salmon/Sailfish should help figure that out.

Let us know if you need help with that.


katsikora commented 7 years ago


I looked into the equivalence classes of one of the datasets I want to analyze.

If I understand this correctly, all the classes in that dataset have only one member transcript:

awk '( NR>120263 )' eq_classes.txt | awk '{ print $1 }' | sort | uniq
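For readers who prefer a script to the awk one-liner, here is a minimal Python sketch of the same check. It assumes the plain-text layout implied by the command above: one line with the number of transcripts, one line with the number of classes, one transcript name per line, then one line per class whose first field is the number of member transcripts. The function name `class_size_histogram` and the tiny sample are hypothetical, made up for illustration only.

```python
from collections import Counter


def class_size_histogram(lines):
    """Count equivalence classes by number of member transcripts.

    Assumed layout (not confirmed against the Salmon documentation here):
      line 1: number of transcripts N
      line 2: number of equivalence classes
      next N lines: transcript names
      remaining lines: "<k> <id_1> ... <id_k> <read_count>"
    """
    rows = [ln.strip() for ln in lines if ln.strip()]
    num_transcripts = int(rows[0])
    class_rows = rows[2 + num_transcripts:]
    sizes = Counter()
    for row in class_rows:
        # The first field is the class size, the same column awk prints as $1.
        sizes[int(row.split()[0])] += 1
    return sizes


# Hypothetical toy input: 3 transcripts, 2 classes (sizes 1 and 2).
toy = ["3", "2", "tA", "tB", "tC", "1 0 10", "2 1 2 5"]
histogram = class_size_histogram(toy)
for size, count in sorted(histogram.items()):
    print(f"classes with {size} transcript(s): {count}")
```

On a real run, passing an open file handle for eq_classes.txt instead of `toy` should work the same way. A histogram with only the key 1 would correspond to the situation described above, where every class has a single member.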

I ran Salmon (v. 0.7.2) as:

salmon quant -i $REFDIR/161221salmon.quasi -l ISR -1 <(zcat ${R1fq[$i]} ) -2 <(zcat ${R2fq[$i]} ) -o $runDir -p 8 --maxOcc 20 --maxReadOcc 1 --coverage 0.90 --dumpEq .

Do you think any of the parameters might cause this behaviour?



rob-p commented 7 years ago

Hi Katarzyna,

Can you tell us what the mapping rate of Salmon is? In the quantification directory of one of your samples, there is a directory called aux_info, and within this directory, a file called meta_info.json. Can you post the contents of that file? That will help us determine further what might be going on.

Best, Rob

katsikora commented 7 years ago

Hi Rob,

here goes the file content:

{
    "salmon_version": "0.7.2",
    "samp_type": "none",
    "num_libraries": 1,
    "library_types": [
        "ISR"
    ],
    "frag_dist_length": 1001,
    "seq_bias_correct": false,
    "gc_bias_correct": false,
    "num_bias_bins": 4096,
    "mapping_type": "mapping",
    "num_targets": 120261,
    "num_bootstraps": 0,
    "num_processed": 103236790,
    "num_mapped": 42745806,
    "percent_mapped": 41.405593877918903,
    "call": "quant",
    "start_time": "Tue Jan 10 10:51:56 2017"
}
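As a quick sanity check on the mapping rate Rob asked about, the reported percent_mapped can be recomputed from num_mapped and num_processed. The sketch below uses only the three relevant fields copied from the file above. That percent_mapped is defined as 100 * num_mapped / num_processed is an assumption, although the numbers agree with it.

```python
import json

# Fields copied from the meta_info.json posted above.
meta = json.loads(
    '{"num_processed": 103236790,'
    ' "num_mapped": 42745806,'
    ' "percent_mapped": 41.405593877918903}'
)

# Assumed definition: mapped fragments as a percentage of processed fragments.
rate = 100.0 * meta["num_mapped"] / meta["num_processed"]
unmapped = meta["num_processed"] - meta["num_mapped"]

print(f"recomputed mapping rate: {rate:.2f}%")
print(f"fragments not mapped: {unmapped}")
```

So roughly 41% of the processed fragments were mapped in this run, which was made with the non-default --maxOcc, --maxReadOcc, and --coverage settings.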

Thanks for your help,



katsikora commented 7 years ago

Dear Laraib,

I have rerun GRASS on Salmon equivalence classes obtained with near-default parameters. I now get several transcripts per class, and the grassGraph.txt file shows a distribution of edge values.

head grassGraph.txt
131_Gills_comp118255_c0_seq1 131_Gills_comp118255_c0_seq1 1.1
145_Blood_comp87177_c6_seq6 145_Blood_comp87177_c6_seq6 1.1
138_Blood_comp131799_c1_seq1 138_Blood_comp131799_c1_seq1 1.1
131_Blood_comp119291_c0_seq1 136_Gills_comp98921_c0_seq1 0.0510510510511
145_Typhlosole_comp85224_c1_seq2 145_Typhlosole_comp85224_c1_seq2 1.1
133_Typhlosole_comp103582_c0_seq1 133_Typhlosole_comp103582_c0_seq1 1.1
131_Blood_comp135226_c0_seq1 131_Blood_comp135226_c0_seq1 1.1
131_Gills_comp165313_c0_seq1 131_Gills_comp165313_c0_seq1 1.1
130_Kidney_comp52583_c0_seq1 136_Blood_comp120067_c1_seq2 0.838095238095
140_Kidney_comp68393_c4_seq2 142_Kidney_comp67810_c0_seq1 0.97558685446

Still, I get the same error from junto as before:

Exception in thread "main" java.lang.RuntimeException: Non-positive weighted edge:>>136_Blood_comp125844_c2_seq5-->136_Blood_comp125844_c2_seq5<< -1.1
at upenn.junto.algorithm.Adsorption$$anonfun$run$1$$anonfun$apply$mcVI$sp$1$$anonfun$apply$1.apply(Adsorption.scala:179)
at upenn.junto.algorithm.Adsorption$$anonfun$run$1$$anonfun$apply$mcVI$sp$1$$anonfun$apply$1.apply(Adsorption.scala:169)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at upenn.junto.algorithm.Adsorption$$anonfun$run$1$$anonfun$apply$mcVI$sp$1.apply(Adsorption.scala:169)
at upenn.junto.algorithm.Adsorption$$anonfun$run$1$$anonfun$apply$mcVI$sp$1.apply(Adsorption.scala:162)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at upenn.junto.algorithm.Adsorption$$anonfun$run$1.apply$mcVI$sp(Adsorption.scala:162)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at
at$.apply(Junto.scala:72)
at$.apply(Junto.scala:121)
at$.main(Junto.scala:132)
at

Is there something in the treatment of self-directed edges that may be causing this?
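One way to narrow this down is to scan the graph file for self-directed edges and non-positive weights before handing it to junto. The following is a minimal sketch, assuming each line of grassGraph.txt holds three whitespace-separated fields (source, target, weight), as in the head output above. The function name `summarize_edges` is hypothetical, and the last sample line is not taken from the actual file: it is constructed to mirror the edge named in the error message.

```python
from collections import Counter


def summarize_edges(lines):
    """Summarize 'source target weight' lines from an edge list.

    Returns (weight_counts, self_loop_count, non_positive_edges).
    """
    weight_counts = Counter()
    self_loop_count = 0
    non_positive_edges = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        source, target, weight = fields[0], fields[1], float(fields[2])
        weight_counts[weight] += 1
        if source == target:
            self_loop_count += 1
        if weight <= 0:
            non_positive_edges.append((source, target, weight))
    return weight_counts, self_loop_count, non_positive_edges


sample = [
    "131_Gills_comp118255_c0_seq1 131_Gills_comp118255_c0_seq1 1.1",
    "131_Blood_comp119291_c0_seq1 136_Gills_comp98921_c0_seq1 0.0510510510511",
    "130_Kidney_comp52583_c0_seq1 136_Blood_comp120067_c1_seq2 0.838095238095",
    "145_Blood_comp87177_c6_seq6 145_Blood_comp87177_c6_seq6 1.1",
    # Hypothetical line mirroring the error message, not copied from the file.
    "136_Blood_comp125844_c2_seq5 136_Blood_comp125844_c2_seq5 -1.1",
]

weights, self_loops, non_positive = summarize_edges(sample)
print(f"distinct weights: {sorted(weights)}")
print(f"self-directed edges: {self_loops}")
print(f"non-positive edges: {non_positive}")
```

Passing an open file handle for grassGraph.txt in place of `sample` would apply the same checks to the real graph. If the file itself contains no non-positive weights, the -1.1 in the exception would have to arise somewhere after the file is read, which this sketch cannot determine.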

Best regards,


laraib85 commented 7 years ago


I am not sure why you are getting this error. We don't treat self-directed edges any differently yet. Can you share your GRASS graph file (grassGraph.txt), the seed file (seed.txt) and the junto config file (junto.config)? I will try to reproduce the error and see what's going on.

Also, how did you install junto? Did you git clone and compile it?

Thanks, Laraib

katsikora commented 7 years ago


Hi Laraib,

grassGraph.txt is attached and junto.config is pasted below; seed.txt is empty. Let me know if you can make any sense of this.



cat junto.config
seed_file = /data/processing3/sikora/sikora/Trinity.grass/seed.txt
output_file = /data/processing3/sikora/sikora/Trinity.grass/tempOutput
graph_file = /data/processing3/sikora/sikora/Trinity.grass/grassGraph.txt
data_format = edge_factored
iters = 1
prune_threshold = 0
algo = adsorption
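Since seed.txt is reported as empty, it may be worth checking the files the config points at before launching junto. Below is a minimal sketch that parses the `key = value` lines shown above and flags a missing or zero-byte file. The helper names `parse_junto_config` and `missing_or_empty` are hypothetical and not part of GRASS or junto, and whether the empty seed file is related to the error is not established in this thread.

```python
import os


def parse_junto_config(text):
    """Parse 'key = value' lines into a dict of strings."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank lines and anything that is not key = value
        key, value = line.split("=", 1)
        config[key.strip()] = value.strip()
    return config


def missing_or_empty(path):
    """Return True if the path is not a regular file or has zero bytes."""
    return not os.path.isfile(path) or os.path.getsize(path) == 0


# Config text as pasted in the comment above.
config_text = """\
seed_file = /data/processing3/sikora/sikora/Trinity.grass/seed.txt
output_file = /data/processing3/sikora/sikora/Trinity.grass/tempOutput
graph_file = /data/processing3/sikora/sikora/Trinity.grass/grassGraph.txt
data_format = edge_factored
iters = 1
prune_threshold = 0
algo = adsorption
"""

config = parse_junto_config(config_text)
for key in ("seed_file", "graph_file"):
    if missing_or_empty(config[key]):
        print(f"{key} is missing or empty: {config[key]}")
```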