Access Wikipedia article names based on the concept vector for a given term

nicolaierbs commented 9 years ago

Original issue 23 created by dkpro on 2014-01-08T16:21:19.000Z:

I would like to get the highest scoring Wikipedia article names based on the concept vector for a given term.

I use VectorIndexReader to access the index of Wikipedia and with VectorIndexReader.getVector(aTerm) I get the corresponding concept vector. Now I would like to see which Wikipedia articles have a high association with the given term, but I don't know whether this is possible.

With LuceneVectorReader it would be possible to access the content of the files and the filenames. Unfortunately, you only provide the inverted index version of the Wikipedia index.

I use version 2.1.0 of DKPro Similarity.

Could you please help me? Many thanks in advance!

nicolaierbs commented 9 years ago

Comment #1 originally posted by dkpro on 2014-01-08T16:43:50.000Z:

It is not possible to reconstruct that information from the Vector-Indexes.

I could provide a very old LuceneIndex for English, or you could create your own using JWPL and the dkpro.similarity.uima.vsm-asl module.
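
For illustration only, here is a minimal sketch of how the highest-scoring concepts could be ranked once a mapping from vector dimension to article title is available, for example from a self-built Lucene index. It assumes the concept vector has already been copied into a plain double[]. The class TopConcepts, the method topK, and the titles array are hypothetical placeholders and not part of DKPro Similarity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration; not part of DKPro Similarity.
class TopConcepts {

    // Returns the indices of the k largest non-zero weights, highest weight first.
    static List<Integer> topK(double[] conceptVector, int k) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < conceptVector.length; i++) {
            if (conceptVector[i] != 0.0) {
                indices.add(i);
            }
        }
        // Sort by descending weight; the sort is stable, so ties keep ascending index order.
        indices.sort(Comparator.comparingDouble((Integer i) -> conceptVector[i]).reversed());
        int limit = Math.max(0, Math.min(k, indices.size()));
        return new ArrayList<>(indices.subList(0, limit));
    }

    public static void main(String[] args) {
        // Assumed input: a concept vector already converted to a dense array.
        double[] conceptVector = {0.1, 0.0, 0.7, 0.3};
        // Assumed mapping from dimension to article title; it has to come
        // from a separate source, since the vector index does not store it.
        String[] titles = {"Article A", "Article B", "Article C", "Article D"};

        for (int index : topK(conceptVector, 2)) {
            System.out.println(titles[index] + "\t" + conceptVector[index]);
        }
    }
}
```

The ranking itself only needs the weights. The titles lookup is the part that cannot be recovered from the vector index alone.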

-Torsten

nicolaierbs commented 9 years ago

Comment #2 originally posted by dkpro on 2014-01-08T20:59:16.000Z:

If providing the old LuceneIndex is not much work for you, I would like to have it for some experiments. Apart from that, I will create my own index.

Thank you!

-Annika

nicolaierbs commented 9 years ago

Comment #3 originally posted by dkpro on 2014-01-09T09:09:34.000Z:

<empty>

