BIDS-collaborative / destress

Helping @peparedes with text analysis of livejournal data
ISC License

Problem loading SBMat #34

Closed: lambdaloop closed this issue 9 years ago

lambdaloop commented 9 years ago

So I am saving the sentences we have as SBMat files. However, it seems I can't load most of those (although some load).

Here is a typical error:

scala> loadSBMat("/var/local/destress/featurized_sent/data9_sent.sbmat")
java.lang.StringIndexOutOfBoundsException: String index out of range: 167
  at java.lang.String.substring(String.java:1907)
  at BIDMat.SBMat$$anonfun$toString$1.apply(SBMat.scala:93)
  at BIDMat.SBMat$$anonfun$toString$1.apply(SBMat.scala:92)
  at scala.collection.LinearSeqOptimized$class.forall(LinearSeqOptimized.scala:69)
  at scala.collection.immutable.List.forall(List.scala:83)
  at scala.collection.generic.TraversableForwarder$class.forall(TraversableForwarder.scala:40)
  at scala.collection.mutable.ListBuffer.forall(ListBuffer.scala:45)
  at BIDMat.SBMat.toString(SBMat.scala:92)
  at scala.runtime.ScalaRunTime$.scala$runtime$ScalaRunTime$$inner$1(ScalaRunTime.scala:332)
  at scala.runtime.ScalaRunTime$.stringOf(ScalaRunTime.scala:337)
  at scala.runtime.ScalaRunTime$.replStringOf(ScalaRunTime.scala:345)
  at .$print$lzycompute(<console>:10)
  at .$print(<console>:6)
  at $print(<console>)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:739)
  at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:986)
  at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:593)
  at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:592)
  at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
  at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
  at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:592)
  at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:524)
  at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:520)
  at scala.tools.nsc.interpreter.ILoop.reallyInterpret$1(ILoop.scala:754)
  at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:799)
  at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:666)
  at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:433)
  at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:450)
  at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:868)
  at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:854)
  at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:854)
  at scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:95)
  at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:854)
  at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:74)
  at scala.tools.nsc.MainGenericRunner.run$1(MainGenericRunner.scala:87)
  at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:98)
  at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:103)
  at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)

Whereas data10_sent.sbmat loads fine:

scala> loadSBMat("/var/local/destress/featurized_sent/data10_sent.sbmat")
res41: BIDMat.SBMat =
it is way too freakin early to be awake .
but my sister needed to go christmas shopping this morning .
so she called me at : to come babysit since her husband went deer hunting at : this morning .
right now i am awake , but i don t for how much longer .
my brother in law , i hope , will be back around : am .
then i will go home and go back to bed in my bed .
nothing much to talk about today .
i subbed again today , it was a last minute thing .
and then i sub again tomorrow .
i think this is starting things off right for this month .
cleaned house today for my brother in law s parents .
i think i am getting faster at that .
...

Does anyone know how to fix this?

Tagging anyone who might know: @coryschillaci @DanielTakeshi @anasrferreira @davclark
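A note on the trace above: it contains no loadSBMat frame. The exception is raised in SBMat.toString, which is reached through ScalaRunTime.replStringOf, the step where the REPL renders the result of the expression. So the load call itself appears to have returned, and the failure happens while the matrix is converted to text for display. The sketch below reproduces that pattern with a hypothetical stand-in class (FragileStrings is an illustration only, not the BIDMat SBMat implementation): construction succeeds, and only rendering throws.

```scala
import scala.util.Try

// Hypothetical stand-in for a matrix of strings; NOT the BIDMat SBMat class.
// Its toString assumes every row has at least `width` characters, so a
// shorter row makes substring throw StringIndexOutOfBoundsException.
final class FragileStrings(val rows: Vector[String], width: Int) {
  override def toString: String =
    rows.map(r => r.substring(0, width)).mkString("\n")
}

object ReplPrintDemo {
  def main(args: Array[String]): Unit = {
    // Building the object (the analogue of loading) succeeds.
    val m = new FragileStrings(Vector("it is way too early", "short"), 10)

    // The data is reachable as long as nothing asks for the string form.
    println(m.rows.length)

    // Rendering is the step that fails; the REPL does this implicitly
    // for every result it echoes.
    val rendered = Try(m.toString)
    println(rendered.isFailure)
  }
}
```

If the same thing is happening here, binding the result in a way that the REPL does not render (for example with a lazy val, which the Scala 2 REPL echoes as a lazy placeholder until it is first used) may allow the loaded contents to be inspected piece by piece. That is an untested suggestion, and a string index past the end of the data could still point to a problem in how the file was written.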

coryschillaci commented 9 years ago

Where is the code you use to generate these?

On Thu, Apr 16, 2015 at 8:02 PM, Pierre Karashchuk <notifications@github.com> wrote:

> So I am saving the sentences we have as SBMat files. However, it seems I can't load most of those (although some load).
> [...]

DanielTakeshi commented 9 years ago

I checked this directory and it seems like all files except the .imats have the .lz4 extension. Did you do something before that to arrive at dataXY_sent.sbmat without the lz4?

For what it's worth, I tried using loadSBMat("data9_sbmat.lz4") and that failed with an error in the type field. I will look at this again tomorrow morning if it hasn't been resolved already.

lambdaloop commented 9 years ago

I ended up taking a different approach, although I think the bug I encountered is still there. I used to have the sentences saved in a CSMat, which I converted to an SBMat and saved. I tried both with and without the .lz4 extension and got the issue above in both cases.

I ended up changing the code so that I only save integers corresponding to the words, and that seems to work alright.

I also deleted the old files and didn't save the old code, since I forgot I posted this issue...

I'll close it for now, but let me know if you want me to reproduce it so that it can be fixed in BIDMat.
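For readers who want to try the workaround described above (saving integer codes instead of the words themselves), here is a minimal sketch of the encoding step. The thread does not show the code that was actually used, so the Vocabulary class and its method names are hypothetical, and the tokenization (splitting on whitespace) is an assumption based on the sample sentences in the report.

```scala
import scala.collection.mutable

// Hypothetical helper: assigns each distinct token an integer id in order
// of first appearance, and can map ids back to tokens.
final class Vocabulary {
  private val ids = mutable.HashMap.empty[String, Int]
  private val words = mutable.ArrayBuffer.empty[String]

  // Return the id of a word, assigning the next free id on first sight.
  def idOf(word: String): Int =
    ids.getOrElseUpdate(word, { words += word; words.length - 1 })

  def wordOf(id: Int): String = words(id)

  def size: Int = words.length

  // Split on whitespace (an assumption) and replace each token by its id.
  def encode(sentence: String): Array[Int] =
    sentence.split("\\s+").filter(_.nonEmpty).map(idOf)

  // Inverse of encode, up to whitespace normalization.
  def decode(codes: Array[Int]): String =
    codes.map(wordOf).mkString(" ")
}

object EncodeDemo {
  def main(args: Array[String]): Unit = {
    val vocab = new Vocabulary
    val sentences = Seq(
      "nothing much to talk about today .",
      "and then i sub again tomorrow ."
    )
    val encoded = sentences.map(vocab.encode)
    encoded.foreach(codes => println(codes.mkString(" ")))
    println(vocab.decode(encoded.head))
  }
}
```

The integer arrays can then be written with whatever integer matrix format the pipeline already uses, and the vocabulary has to be stored alongside them so the ids can be turned back into words.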

davclark commented 9 years ago

It would be great if you could make a self-contained example that demonstrates the problem. But it should be submitted on the BIDMat issue tracker!