COMBINE-lab / GRASS

Graph-Regularized Annotation via Semi-Supervised learning

Negative edge weights #2

Open katsikora opened 7 years ago

katsikora commented 7 years ago

Dear developers,

I encounter the following error when running GRASS:

Exception in thread "main" java.lang.RuntimeException: Non-positive weighted edge:>>136_Blood_comp125844_c2_seq5-->136_Blood_comp125844_c2_seq5<< -1.1

As far as I can tell this happens after running blast, when calling junto. That problematic edge is self-directed (same source and target sequence). The contents of my run directory after program exit are:

total 102M
drwxr-xr-x 1 sikora bioinfo 36 Jan 11 12:56 AS
-rw-r----- 1 sikora bioinfo 73M Jan 11 13:01 AS.TSdb
drwxr-xr-x 1 sikora bioinfo 36 Jan 11 12:56 TS
-rw-r----- 1 sikora bioinfo 0 Jan 10 15:23 TS.ASdb
-rw-r----- 1 sikora bioinfo 7.1M Jan 11 13:01 grassGraph.txt
-rw-r----- 1 sikora bioinfo 288 Jan 11 12:54 junto.config
-rw-r----- 1 sikora bioinfo 3.3M Jan 11 12:56 mag.clust
-rw-r----- 1 sikora bioinfo 7.1M Jan 11 12:56 mag.filt.net
-rw-r----- 1 sikora bioinfo 4.8M Jan 11 12:56 mag.flat.clust
-rw-r----- 1 sikora bioinfo 7.1M Jan 11 12:55 mag.net
-rw-r----- 1 sikora bioinfo 0 Jan 10 15:28 seed.txt
-rw-r----- 1 sikora bioinfo 0 Jan 10 15:28 seedLabels.txt
-rw-r----- 1 sikora bioinfo 88 Jan 11 12:56 stats.json

The TS.ASdb file is empty. Curiously, all values in grassGraph.txt are equal to 1.1:

awk '{ print $3 }' grassGraph.txt | sort | uniq
1.1
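A slightly more detailed version of the check above, as a minimal sketch: it tallies the weights separately for self-directed edges and for edges between two different sequences. It assumes each line of grassGraph.txt holds three whitespace-separated columns (source, target, weight), which is consistent with the `$3` in the awk command; the function name `tally_weights` is made up for illustration.

```python
from collections import Counter


def tally_weights(lines):
    """Count edge weights separately for self-loops and ordinary edges.

    Assumption: every edge line is 'source target weight', separated by
    whitespace. Lines that do not have exactly three fields are skipped.
    """
    self_loops, others = Counter(), Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # blank or malformed line
        source, target, weight = fields
        # Weights are kept as strings so the tally matches `sort | uniq`.
        if source == target:
            self_loops[weight] += 1
        else:
            others[weight] += 1
    return self_loops, others
```

To run it on the real file, open grassGraph.txt and pass the file handle, e.g. `with open("grassGraph.txt") as fh: print(tally_weights(fh))`. If the second counter comes back empty, the graph contains only self-directed edges.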

Would you have any idea what might be the problem?

Best regards,

Katarzyna

laraib85 commented 7 years ago

Hi,

Can you print the values in "mag.filt.net"? It seems the issue is most likely at the level of clustering contigs (done using RapClust).

Regards, Laraib

katsikora commented 7 years ago

Hello Laraib,

awk '{ print $3 }' mag.filt.net | sort | uniq
1.1

It's the same as in grassGraph.txt, then.

Thanks for looking into it,

Best,

Katarzyna

laraib85 commented 7 years ago

Since RapClust directly uses the results obtained from Salmon/Sailfish, I'm guessing the issue is caused by some problem at the level of read mapping. Going over the equivalence classes dumped by Salmon/Sailfish should help figure that out.

Let us know if you need help with that.

-Laraib

katsikora commented 7 years ago

Hi,

So I looked into the equivalence classes of one of the datasets I want to analyze.

If I get this right, then all the classes in that dataset have only 1 member transcript:

awk '( NR>120263 )' eq_classes.txt | awk '{ print $1 }' | sort | uniq
1
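The same check can be sketched in Python so that it also reports how many classes have each size. This is a sketch under an assumed file layout, not a description of Salmon's documented format: line 1 holds the number of transcripts N, line 2 the number of classes, the next N lines the transcript names, and each remaining line describes one class as "k id_1 ... id_k count". That layout matches the `NR>120263` offset used above (120261 targets plus two header lines). The function name `class_size_histogram` is hypothetical.

```python
from collections import Counter


def class_size_histogram(lines):
    """Map 'number of transcripts in a class' to 'number of such classes'.

    Assumed layout of eq_classes.txt (an assumption, see the text above):
      line 1: number of transcripts N
      line 2: number of equivalence classes
      next N lines: transcript names
      remaining lines: 'k id_1 ... id_k count', one class per line
    """
    it = iter(lines)
    num_transcripts = int(next(it))
    next(it)  # number of classes; not needed for the histogram
    for _ in range(num_transcripts):
        next(it)  # skip the transcript names
    sizes = Counter()
    for line in it:
        fields = line.split()
        if fields:
            sizes[int(fields[0])] += 1  # first field is the class size k
    return sizes
```

Called as `with open("eq_classes.txt") as fh: print(class_size_histogram(fh))`, a result containing only the key 1 would correspond to the awk output above.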

I ran Salmon (v. 0.7.2) as:

salmon quant -i $REFDIR/161221salmon.quasi -l ISR -1 <(zcat ${R1fq[$i]} ) -2 <(zcat ${R2fq[$i]} ) -o $runDir -p 8 --maxOcc 20 --maxReadOcc 1 --coverage 0.90 --dumpEq .

Do you think any of the parameters might cause this behaviour?

Best,

Katarzyna

rob-p commented 7 years ago

Hi Katarzyna,

Can you tell us what the mapping rate of Salmon is? In the quantification directory of one of your samples, there is a directory called aux_info, and within this directory, a file called meta_info.json. Can you post the contents of that file? That will help us determine further what might be going on.

Best, Rob

katsikora commented 7 years ago

Hi Rob,

here goes the file content:

{
    "salmon_version": "0.7.2",
    "samp_type": "none",
    "num_libraries": 1,
    "library_types": [ "ISR" ],
    "frag_dist_length": 1001,
    "seq_bias_correct": false,
    "gc_bias_correct": false,
    "num_bias_bins": 4096,
    "mapping_type": "mapping",
    "num_targets": 120261,
    "num_bootstraps": 0,
    "num_processed": 103236790,
    "num_mapped": 42745806,
    "percent_mapped": 41.405593877918903,
    "call": "quant",
    "start_time": "Tue Jan 10 10:51:56 2017"
}
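For reference, the mapping rate can be recomputed from the two read counts in this file. The sketch below assumes that percent_mapped is simply 100 × num_mapped / num_processed; the numbers above agree with that reading (42745806 / 103236790 is about 41.4%). The helper names are illustrative only.

```python
import json


def mapping_rate_percent(meta):
    """Percentage of processed fragments that were mapped.

    `meta` is the parsed content of aux_info/meta_info.json. The formula is
    an assumption that happens to match the reported percent_mapped.
    """
    return 100.0 * meta["num_mapped"] / meta["num_processed"]


def load_meta(path):
    """Read a meta_info.json file into a dict."""
    with open(path) as fh:
        return json.load(fh)
```

For a single sample this would be called as `mapping_rate_percent(load_meta("aux_info/meta_info.json"))`, with the path taken relative to that sample's quantification directory.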

Thanks for your help,

Best,

Katarzyna

katsikora commented 7 years ago

Dear Laraib,

I have rerun GRASS on Salmon equivalence classes obtained with near-default parameters. I now get multiple transcripts per class, and the grassGraph.txt file shows a distribution of edge values.

head grassGraph.txt
131_Gills_comp118255_c0_seq1 131_Gills_comp118255_c0_seq1 1.1
145_Blood_comp87177_c6_seq6 145_Blood_comp87177_c6_seq6 1.1
138_Blood_comp131799_c1_seq1 138_Blood_comp131799_c1_seq1 1.1
131_Blood_comp119291_c0_seq1 136_Gills_comp98921_c0_seq1 0.0510510510511
145_Typhlosole_comp85224_c1_seq2 145_Typhlosole_comp85224_c1_seq2 1.1
133_Typhlosole_comp103582_c0_seq1 133_Typhlosole_comp103582_c0_seq1 1.1
131_Blood_comp135226_c0_seq1 131_Blood_comp135226_c0_seq1 1.1
131_Gills_comp165313_c0_seq1 131_Gills_comp165313_c0_seq1 1.1
130_Kidney_comp52583_c0_seq1 136_Blood_comp120067_c1_seq2 0.838095238095
140_Kidney_comp68393_c4_seq2 142_Kidney_comp67810_c0_seq1 0.97558685446

Still, I get the same error from junto as before:

Exception in thread "main" java.lang.RuntimeException: Non-positive weighted edge:>>136_Blood_comp125844_c2_seq5-->136_Blood_comp125844_c2_seq5<< -1.1
    at upenn.junto.algorithm.Adsorption$$anonfun$run$1$$anonfun$apply$mcVI$sp$1$$anonfun$apply$1.apply(Adsorption.scala:179)
    at upenn.junto.algorithm.Adsorption$$anonfun$run$1$$anonfun$apply$mcVI$sp$1$$anonfun$apply$1.apply(Adsorption.scala:169)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at upenn.junto.algorithm.Adsorption$$anonfun$run$1$$anonfun$apply$mcVI$sp$1.apply(Adsorption.scala:169)
    at upenn.junto.algorithm.Adsorption$$anonfun$run$1$$anonfun$apply$mcVI$sp$1.apply(Adsorption.scala:162)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at upenn.junto.algorithm.Adsorption$$anonfun$run$1.apply$mcVI$sp(Adsorption.scala:162)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
    at upenn.junto.algorithm.Adsorption.run(Adsorption.scala:155)
    at upenn.junto.app.JuntoRunner$.apply(Junto.scala:72)
    at upenn.junto.app.JuntoConfigRunner$.apply(Junto.scala:121)
    at upenn.junto.app.JuntoConfigRunner$.main(Junto.scala:132)
    at upenn.junto.app.JuntoConfigRunner.main(Junto.scala)

Is there something in the treatment of self-directed edges that may be causing this?
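One way to narrow this down is to look at what the graph file itself says about the sequence named in the exception. The sketch below lists every edge that touches a given node and every edge whose weight is zero or negative. It assumes the three-column "source target weight" layout seen in the grassGraph.txt excerpt; all function names are hypothetical. If no non-positive edge turns up in the file, the -1.1 in the message would have to arise somewhere after the file is read, which this sketch cannot determine.

```python
def parse_edges(lines):
    """Yield (source, target, weight) from 'source target weight' lines.

    Lines without exactly three whitespace-separated fields are skipped.
    """
    for line in lines:
        fields = line.split()
        if len(fields) == 3:
            yield fields[0], fields[1], float(fields[2])


def edges_touching(lines, node):
    """All edges in which `node` is the source or the target."""
    return [e for e in parse_edges(lines) if node in (e[0], e[1])]


def non_positive_edges(lines):
    """All edges whose weight is zero or negative."""
    return [e for e in parse_edges(lines) if e[2] <= 0]
```

For example, `with open("grassGraph.txt") as fh: print(edges_touching(fh, "136_Blood_comp125844_c2_seq5"))` shows every line involving the sequence from the error message.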

Best regards,

Katarzyna

laraib85 commented 7 years ago

Hi,

I am not sure why you are getting this error. We don't treat self-directed edges any differently yet. Can you share your GRASS graph file (grassGraph.txt), the seed file (seed.txt) and the junto config file (junto.config)? I will try to reproduce the error and see what's going on.

Also, how did you install junto? Did you git clone and compile it?

Thanks, Laraib

katsikora commented 7 years ago

grassGraph.txt

Hi Laraib,

Attached is grassGraph.txt; junto.config is pasted below, but seed.txt is empty. Let me know if you can make any sense of this.

Best,

Katarzyna

cat junto.config
seed_file = /data/processing3/sikora/sikora/Trinity.grass/seed.txt
output_file = /data/processing3/sikora/sikora/Trinity.grass/tempOutput
graph_file = /data/processing3/sikora/sikora/Trinity.grass/grassGraph.txt
data_format = edge_factored
iters = 1
prune_threshold = 0
algo = adsorption
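Since seed.txt is empty in this run, a quick check of the files named in the config may be useful before calling junto. The sketch below parses the "key = value" lines shown above and reports whether a given path is missing, empty, or present with content. Whether an empty seed file is related to the exception is not established in this thread; the function names are made up for illustration.

```python
import os


def parse_config(lines):
    """Parse 'key = value' lines into a dict; other lines are ignored."""
    config = {}
    for line in lines:
        key, sep, value = line.partition("=")
        if sep:  # only lines that actually contain '='
            config[key.strip()] = value.strip()
    return config


def file_status(path):
    """Return 'missing', 'empty' or 'ok' for a file path."""
    if not os.path.exists(path):
        return "missing"
    return "empty" if os.path.getsize(path) == 0 else "ok"
```

A typical use would be to read junto.config, then call `file_status` on the values stored under `seed_file` and `graph_file`.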