Closed johann-petrak closed 7 years ago
Has there been any progress on this? The latest head of the repository, which I just cloned, still has this problem, and it is a serious one: either s-space creates thousands of files in the /tmp directory or it fails with this exception.
I'll see if I can get this fixed in the next few days and push a new version to GitHub. Thanks for the reminder!
Just had another try with df71a3cf323380e30550e2482ecc53bfba3801d7 and this is still a problem. The reason is that, after the semantic space has been built, creating a document vector with DocumentVectorBuilder.buildVector looks up the index of each word in the internal BasisMapping. For an OOV (out-of-vocabulary) word, the BasisMapping instance returns the next free index instead of -1, because the mapping is not in readOnly mode. However, there seems to be no API method that ensures the mapping gets switched to readOnly mode. The BasisMapping interface exposes the setReadOnly method, but the semantic space interface does not, nor is there a way to access the internal BasisMapping object.

The easiest way to fix this is probably the following: in DocumentVectorBuilder.buildVector, before looping over the entries of the termCounts map, get the set returned by sspace.getWords; then, inside the loop, only try to get the vector if the word is in that set.
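To illustrate the failure mode and the suggested guard, here is a minimal, self-contained Java sketch. It does not use the S-Space classes: `ToySpace`, its `getVectorLength` method and the `buildVector` helper are hypothetical stand-ins written for this example, and only the idea (check membership in the word set before asking for a vector) comes from the description above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class OovGuardSketch {

    /** Hypothetical stand-in for a semantic space; NOT the S-Space API. */
    public static final class ToySpace {
        private final Map<String, Integer> basis = new HashMap<>(); // word -> row index
        private final Set<String> words = new HashSet<>();          // vocabulary snapshot
        private final double[][] vectors;

        public ToySpace(String[] vocabulary, double[][] vectors) {
            for (int i = 0; i < vocabulary.length; i++) {
                basis.put(vocabulary[i], i);
                words.add(vocabulary[i]);
            }
            this.vectors = vectors;
        }

        public Set<String> getWords() {
            return words;
        }

        public int getVectorLength() {
            return vectors[0].length;
        }

        // Mimics a writable mapping: an unknown word is assigned the next free
        // index, which lies past the last row, so the array access below throws.
        public double[] getVector(String word) {
            Integer index = basis.get(word);
            if (index == null) {
                index = basis.size();
                basis.put(word, index);
            }
            return vectors[index];
        }
    }

    /** Sums count-weighted word vectors, skipping words the space does not know. */
    public static double[] buildVector(ToySpace space, Map<String, Integer> termCounts) {
        Set<String> known = space.getWords();
        double[] document = new double[space.getVectorLength()];
        for (Map.Entry<String, Integer> entry : termCounts.entrySet()) {
            if (!known.contains(entry.getKey())) {
                continue; // OOV word: skip it instead of querying the space
            }
            double[] wordVector = space.getVector(entry.getKey());
            for (int i = 0; i < document.length; i++) {
                document[i] += entry.getValue() * wordVector[i];
            }
        }
        return document;
    }

    public static void main(String[] args) {
        ToySpace space = new ToySpace(
                new String[] {"cat", "dog"},
                new double[][] {{1.0, 0.0}, {0.0, 1.0}});
        Map<String, Integer> termCounts = new HashMap<>();
        termCounts.put("cat", 2);
        termCounts.put("dog", 1);
        termCounts.put("zebra", 5); // out of vocabulary
        System.out.println(Arrays.toString(buildVector(space, termCounts)));
    }
}
```

Calling `space.getVector("zebra")` directly on the toy space throws an ArrayIndexOutOfBoundsException, while the guarded `buildVector` simply ignores the unknown word. Whether the real fix belongs in the builder or in switching the mapping to readOnly mode is a design decision for the maintainers; the sketch only shows the first option.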
After commit f6038a3501e69c4ce6011de53a0aa2bf877710c2 (or potentially a later commit), code that worked just fine now throws an ArrayIndexOutOfBoundsException.
Here is the exception:
Here is a minimal Groovy script that illustrates the problem:
Run this using
groovy -cp <neededJars> file.groovy
and this will either throw the exception, if run with the latest sspace jar on the classpath, or create a file in /tmp if run with sspace before commit f6038a35...