dwslab / jRDF2Vec

A high-performance Java Implementation of RDF2Vec
MIT License

how to use wang2vec output #95

Closed: WarisBunglawala closed this issue 2 years ago

WarisBunglawala commented 2 years ago

When I use the output text file from wang2vec in jRDF2vec to convert it to the kv format, the .out file shows this message:

Using server port: 1808
01 Mar 2022 12:49:35 INFO [main] (KvConverter.java:20) - Recognized txt format. Will convert to w2v and then to kv.
01 Mar 2022 12:49:47 ERROR [main] (Util.java:172) - Inconsistency in Dimensionality!
01 Mar 2022 12:49:47 ERROR [main] (Util.java:172) - Inconsistency in Dimensionality!

janothan commented 2 years ago

Hi, this looks like a bug. Could you post the java command you used so that I can further analyze the issue?

WarisBunglawala commented 2 years ago

First, I used jRDF2vec to generate the walks:

nohup java -Xmx50g -jar jrdf2vec-1.2-SNAPSHOT.jar -graph <ttl file location> -onlyWalks &

Then, because wang2vec requires a single uncompressed input file, I merged all the gz walk files into one:

nohup java -jar jrdf2vec-1.1-SNAPSHOT.jar -mergeWalks -walkDirectory <walk dir> -o MergedAll
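For readers who want to see what this merge step amounts to, here is a minimal Python sketch. It assumes the walk files are gzip-compressed text with one walk per line and a `.gz` file ending; the function name `merge_walks` is hypothetical, and this is not how jRDF2vec's `-mergeWalks` is implemented.

```python
import gzip
from pathlib import Path


def merge_walks(walk_dir: str, merged_path: str) -> int:
    """Concatenate every *.gz file in walk_dir into one uncompressed file.

    Returns the number of gz files that were merged.
    """
    gz_files = sorted(Path(walk_dir).glob("*.gz"))
    with open(merged_path, "wb") as merged:
        for gz_file in gz_files:
            with gzip.open(gz_file, "rb") as walks:
                for line in walks:
                    # Make sure the last walk of one file does not run
                    # into the first walk of the next file.
                    if not line.endswith(b"\n"):
                        line += b"\n"
                    merged.write(line)
    return len(gz_files)
```

Calling `merge_walks("<walk dir>", "MergedAll")` would produce a plain-text file that can be passed to `-train`.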

After that, I had a file named MergedAll. I needed order-aware RDF2vec, so I trained it using wang2vec:

nohup ./word2vec -train <MergedAll file location> -output <output file> -type 3 -size 100 -threads 1 -min-count 0 -cap 1 &

Running this gave me the wangTrained.txt file, which I then tried to convert to the kv format for further use:

nohup java -jar jrdf2vec-1.2-SNAPSHOT.jar -convertToKv <wangTrained.txt file> <newfile.kv> &

When I check the .out file generated by this command, it shows the error message above. The command did generate a w2v file, but it did not generate the kv file, even though the log says "Recognized txt format. Will convert to w2v and then to kv."

I don't have deep knowledge in this area, but I think this is caused by the parameters I used when training with wang2vec. I am currently training again with the following command and will let you know how that works:

nohup ./word2vec -train <MergedAll file location> -output <output file> -type 3 -size 200 -threads 4 -min-count 1 -cap 1 &

Thank you

WarisBunglawala commented 2 years ago

I ran wang2vec:

nohup ./word2vec -train <MergedAll file location> -output <output file> -type 3 -size 200 -min-count 1 -cap 1 &

I got the txt file and again tried to generate the kv model using jRDF2vec, but unfortunately it only generates the w2v format and doesn't generate the kv file:

nohup java -jar jrdf2vec-1.2-SNAPSHOT.jar -convertToKv <wangTrained.txt file> <newfile.kv> &

Then I also tried to build the kv model from the w2v file generated during that process, but it throws this error:

01 Mar 2022 23:38:25 ERROR [main] (Gensim.java:832) - An error occurred. Server returned: False
01 Mar 2022 23:38:25 INFO [Thread-0] (Gensim.java:713) - JVM shutdown detected - close python server if still open.
01 Mar 2022 23:38:25 DEBUG [Thread-0] (PoolingHttpClientConnectionManager.java:411) - Connection manager is shutting down
01 Mar 2022 23:38:25 DEBUG [Thread-0] (PoolingHttpClientConnectionManager.java:434) - Connection manager shut down
01 Mar 2022 23:38:25 INFO [Thread-0] (Gensim.java:715) - Shutdown completed.

WarisBunglawala commented 2 years ago

I think I have found the issue. After training with wang2vec, the first line of the output file contains two numbers. I think they are the number of vectors/entities and the size of the vectors. This is followed by some embeddings on the second line, and then my entities and their embeddings.


I suspected that this first line was the reason the program throws the "Inconsistency in Dimensionality!" error, so I deleted it. I then used the file in jRDF2vec again to convert it to the kv format, and it worked perfectly.
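The manual workaround described here (deleting the first line) could be scripted roughly as follows. This is only a sketch: it assumes the header consists of exactly two whitespace-separated integers (vector count and vector size), and the function names `parse_header` and `strip_w2v_header` are made up for illustration.

```python
from typing import Optional, Tuple


def parse_header(line: str) -> Optional[Tuple[int, int]]:
    """Return (vector_count, dimension) if the line looks like a header."""
    parts = line.split()
    if len(parts) == 2 and all(p.isdecimal() for p in parts):
        return int(parts[0]), int(parts[1])
    return None


def strip_w2v_header(src_path: str, dst_path: str) -> Optional[Tuple[int, int]]:
    """Copy src_path to dst_path, dropping the first line if it is a header.

    Returns the parsed header, or None if the file had no header line.
    """
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        first = src.readline()
        header = parse_header(first)
        if header is None:
            # Not a header: keep the line, it is a regular vector line.
            dst.write(first)
        for line in src:
            dst.write(line)
    return header
```

The resulting file contains only lines of the form `<entity> <v1> <v2> ...`, which is the layout the converter treated as the txt format in the log above.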

Thanks and please check it out.

janothan commented 2 years ago

Hi @WarisBunglawala, thank you for providing details on this issue.

jRDF2vec uses the file endings .txt and .w2v to distinguish the two formats (the conversion would have worked with the correct file ending). However, given that wang2vec is not producing a vector file with ending .w2v out of the box, I agree that this is bad usability.

With commit cf155ea I fixed this issue. jRDF2vec now recognizes the .w2v format even if files end with .txt. I further fixed the error message so that Inconsistency in Dimensionality! will not be printed if the dimension of a w2v file is determined.
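To illustrate the idea of recognizing the format from the file content instead of the file ending, here is a small Python sketch. It is not the code from the commit; the function names `detect_vector_format` and `inconsistent_lines` and the header heuristic (a first line with exactly two integers) are assumptions made for this example.

```python
from typing import List


def detect_vector_format(path: str) -> str:
    """Return "w2v" if the first line is a '<count> <dimension>' header, else "txt"."""
    with open(path, encoding="utf-8") as f:
        first = f.readline().split()
    if len(first) == 2 and first[0].isdecimal() and first[1].isdecimal():
        return "w2v"
    return "txt"


def inconsistent_lines(path: str) -> List[int]:
    """Return the 1-based numbers of lines whose vector length is unexpected.

    For w2v files the expected dimension comes from the header; for txt
    files it is taken from the first vector line.
    """
    fmt = detect_vector_format(path)
    expected = None
    bad = []
    with open(path, encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            if number == 1 and fmt == "w2v":
                expected = int(parts[1])  # dimension declared in the header
                continue
            dimension = len(parts) - 1  # first token is the entity
            if expected is None:
                expected = dimension
            elif dimension != expected:
                bad.append(number)
    return bad
```

With a check like this, the header line is never compared against the vector dimension, so it cannot trigger a dimensionality warning on its own.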

Feel free to test the updated version (can be downloaded here) and to reopen this issue if it still does not work, or to open another issue for further problems.

Thank you again.