bayomim / BuildVSM

A helper system for OntoSeg

Struggling with buildVSM #1

Open pltrdy opened 6 years ago

pltrdy commented 6 years ago

Update: My node wasn't up to date, as Ubuntu's default "apt-get upgrade" does not handle nodejs/npm well.

I'm now facing another issue:

$nodejs ./convertToVSM.js 
/home/pltrdy/ontoseg/BuildVSM/node_modules/dbpedia-spotlight/lib/MDBSpotlight.js:99
                'response': res.body
                                ^

TypeError: Cannot read property 'body' of undefined
    at /home/pltrdy/ontoseg/BuildVSM/node_modules/dbpedia-spotlight/lib/MDBSpotlight.js:99:33
    at Request.callback (/home/pltrdy/ontoseg/BuildVSM/node_modules/superagent/lib/node/index.js:687:12)
    at ClientRequest.<anonymous> (/home/pltrdy/ontoseg/BuildVSM/node_modules/superagent/lib/node/index.js:639:10)
    at emitOne (events.js:116:13)
    at ClientRequest.emit (events.js:211:7)
    at Socket.socketErrorListener (_http_client.js:387:9)
    at emitOne (events.js:116:13)
    at Socket.emit (events.js:211:7)
    at emitErrorNT (internal/streams/destroy.js:64:8)
    at _combinedTickCallback (internal/process/next_tick.js:138:11)

I checked my data directory: it contains two files, of 243 and 627 lines respectively. Not sure what this is about.


I've been struggling with node (which I'm not very used to).

To begin with, I'm not familiar with package-lock.json. I tried running npm install, and even copying that .json into package.json, but it's not working properly:

$npm install
npm WARN enoent ENOENT: no such file or directory, open '/home/pltrdy/ontoseg/BuildVSM/node_modules/buildVSM/package.json'
npm WARN enoent ENOENT: no such file or directory, open '/home/pltrdy/ontoseg/BuildVSM/node_modules/txtasvsm/package.json'
npm WARN enoent ENOENT: no such file or directory, open '/home/pltrdy/ontoseg/BuildVSM/node_modules/dbp_annotate/package.json'
npm WARN enoent ENOENT: no such file or directory, open '/home/pltrdy/ontoseg/BuildVSM/node_modules/utils/package.json'
npm WARN BuildVSM No repository field.
npm WARN BuildVSM No license field.

I tried to install dbpedia myself, but that didn't help. I get:

$nodejs convertToVSM.js 
module.js:328
    throw err;
    ^

Error: Cannot find module 'dbpedia-spotlight'
    at Function.Module._resolveFilename (module.js:326:15)
    at Function.Module._load (module.js:277:25)
    at Module.require (module.js:354:17)
    at require (internal/module.js:12:17)
    at Object.<anonymous> (/home/pltrdy/ontoseg/BuildVSM/node_modules/dbp_annotate/index.js:7:19)
    at Module._compile (module.js:410:26)
    at Object.Module._extensions..js (module.js:417:10)
    at Module.load (module.js:344:32)
    at Function.Module._load (module.js:301:12)
    at Module.require (module.js:354:17)

Any clues? Sry, quite noobish on nodejs.
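A quick way to confirm whether a "Cannot find module" error is an installation problem is to ask Node whether it can resolve the package from the project directory. The sketch below uses the built-in `require.resolve`; the helper name `isResolvable` is hypothetical and not part of BuildVSM, and the module names in the loop are simply those that appear in the stack traces above.

```javascript
// Report whether a module can be resolved from the current directory.
// `isResolvable` is a hypothetical helper, not part of BuildVSM.
function isResolvable(moduleName) {
  try {
    require.resolve(moduleName);
    return true;
  } catch (err) {
    // Node signals a missing module with err.code === 'MODULE_NOT_FOUND'.
    if (err.code === 'MODULE_NOT_FOUND') {
      return false;
    }
    throw err;
  }
}

// Module names taken from the stack traces in this thread.
['dbpedia-spotlight', 'superagent'].forEach(function (name) {
  console.log(name + ': ' + (isResolvable(name) ? 'found' : 'missing'));
});
```

If a name is reported as missing, running `npm install <name>` from the same directory as convertToVSM.js is the usual next step.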

bayomim commented 6 years ago

This error looks like a problem with the spotlight library, which may have changed. I uploaded the full version that works fine for me: BuildVSM-Full.zip

You won't need to do anything. Just run it as: "node convertToVSM.js". The dataset is even attached. If you would like to work on other data, just change the path to your data folder in line 60: walk("data/haps/text", function..." --> walk("path/to/your/data", function...
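For readers unsure what a call like `walk(path, function...)` does, here is a minimal sketch of a recursive directory walk in Node. It is not the `walk` used in convertToVSM.js, whose implementation is not shown in this thread; `walkTextFiles`, its callback signature, and the `.txt` filter are assumptions made for illustration.

```javascript
var fs = require('fs');
var path = require('path');

// Hypothetical helper: visit every file under `dir` (recursively) and
// call `onFile(filePath)` for each one whose extension is `.txt`.
function walkTextFiles(dir, onFile) {
  fs.readdirSync(dir).forEach(function (entry) {
    var fullPath = path.join(dir, entry);
    if (fs.statSync(fullPath).isDirectory()) {
      walkTextFiles(fullPath, onFile);
    } else if (path.extname(entry) === '.txt') {
      onFile(fullPath);
    }
  });
}

// Placeholder path from the comment above; replace it with a real folder.
var dataDir = 'path/to/your/data';
if (fs.existsSync(dataDir)) {
  walkTextFiles(dataDir, function (file) {
    console.log(file);
  });
} else {
  console.log('Data directory not found: ' + dataDir);
}
```

Checking that the directory exists before walking it gives a clearer message than the exception `readdirSync` would otherwise throw.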

pltrdy commented 6 years ago

I changed the path to my own data, and I get:

$nodejs convertToVSM.js 
/home/pltrdy/ontoseg/BuildVSM-Full/BuildVSM-Full/node_modules/dbpedia-spotlight/lib/MDBSpotlight.js:99
                'response': res.body
                                ^

TypeError: Cannot read property 'body' of undefined
    at /home/pltrdy/ontoseg/BuildVSM-Full/BuildVSM-Full/node_modules/dbpedia-spotlight/lib/MDBSpotlight.js:99:33
    at Request.callback (/home/pltrdy/ontoseg/BuildVSM-Full/BuildVSM-Full/node_modules/superagent/lib/node/index.js:687:12)
    at ClientRequest.<anonymous> (/home/pltrdy/ontoseg/BuildVSM-Full/BuildVSM-Full/node_modules/superagent/lib/node/index.js:639:10)
    at emitOne (events.js:116:13)
    at ClientRequest.emit (events.js:211:7)
    at Socket.socketErrorListener (_http_client.js:387:9)
    at emitOne (events.js:116:13)
    at Socket.emit (events.js:211:7)
    at emitErrorNT (internal/streams/destroy.js:64:8)
    at _combinedTickCallback (internal/process/next_tick.js:138:11)

I believe it is due to text formatting. I've seen you're using some kind of three-line separation. The thing is, my input is "just" text without any kind of paragraphs etc., so I put one sentence per line. Not sure how to make this work.

bayomim commented 6 years ago

Are you sure you downloaded the new system that I uploaded (BuildVSM-Full.zip)?

I tried it now with a sample file that has one sentence per line and it worked perfectly. sample1.txt sample1_vsm.txt

pltrdy commented 6 years ago

Yes, I'm using BuildVSM-Full here (as seen in the path), and I'm having the same issue with the sample you just uploaded.

bayomim commented 6 years ago

Well, this is so strange.

If your data can be shared and is not sensitive, you can send it to me and I can process it for you.

pltrdy commented 6 years ago

Should I run anything else besides convertToVSM? I mean, something like a server? (The error about 'response' makes me think of some kind of timeout.)

I'm not able to share this data.

bayomim commented 6 years ago

No you don't need to run anything else.

This error comes mainly from the dbpedia-spotlight module, and it seems to be a connection problem. At the beginning I thought it was a problem in the module itself, but I tried it in an online editor and it works fine. Since it is a connection problem, make sure there is no proxy or similar in the network you are connecting from.
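The stack traces above pass through `Socket.socketErrorListener`, which suggests the request failed before any response arrived, so `res` was undefined when `res.body` was read. The sketch below shows one way such a callback could be guarded. It assumes a Node-style `(err, res)` callback; `readResponse` is a hypothetical helper, not code from dbpedia-spotlight, and the inputs are simulated so no network access is needed.

```javascript
// Hypothetical helper: inspect the arguments of an (err, res) callback
// and never touch res.body unless a response object actually exists.
function readResponse(err, res) {
  if (err || !res) {
    var reason = err ? err.message : 'no response object';
    return { ok: false, error: 'Request failed: ' + reason };
  }
  return { ok: true, body: res.body };
}

// Simulated outcomes of a request.
var success = readResponse(null, { body: { Resources: [] } });
var failure = readResponse(new Error('connect ECONNREFUSED'), undefined);

console.log(success.ok, success.body);
console.log(failure.ok, failure.error);
```

With a guard like this, a refused or timed-out connection surfaces as a readable message instead of a TypeError.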

To try the module: click here and write the following:

var mlspotlight = require("dbpedia-spotlight");
var input = "Barack Obama was the president of the United States.";
mlspotlight.annotate(input, function (output) {
    console.log(output);
});

pltrdy commented 6 years ago

I tried, with success. In fact, the problem comes from the language: after switching back to English, everything's OK. But I'd like to work with French documents. Not sure if I need to set up something; I'm investigating.

Thanks for the support tho.

bayomim commented 6 years ago

Aha, that's why.

So I found a solution for this problem. To use the French language, open the file /BuildVSM-Full/node_modules/dbpedia-spotlight/lib/MDBSpotlight.js and, at line 42 (the French entry), change:

{ host: 'spotlight.sztaki.hu', path: '/rest/annotate', port: '2225', confidence: 0.5, support: 0 },

to:

{ host: 'model.dbpedia-spotlight.org', path: '/fr/annotate', port: '80', confidence: 0, support: 0 },

I just tried it and it works fine.
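To make the effect of that change concrete, the sketch below shows how an endpoint entry of this shape could translate into the URL that gets requested. The `french` values are copied from the comment above; the surrounding `endpoints` object and the `buildAnnotateUrl` helper are assumptions for illustration and may not match how MDBSpotlight.js is actually organized.

```javascript
// Endpoint settings keyed by language. Only the `french` values come from
// the thread; the object layout is an assumption made for this sketch.
var endpoints = {
  french: {
    host: 'model.dbpedia-spotlight.org',
    path: '/fr/annotate',
    port: '80',
    confidence: 0,
    support: 0
  }
};

// Hypothetical helper: build the request URL for a configured language.
function buildAnnotateUrl(language) {
  var e = endpoints[language];
  if (!e) {
    throw new Error('No endpoint configured for language: ' + language);
  }
  return 'http://' + e.host + ':' + e.port + e.path;
}

console.log(buildAnnotateUrl('french'));
```

If the host or port in the entry is unreachable, the request fails at the socket level, which matches the TypeError reported earlier in the thread.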

pltrdy commented 6 years ago

Thanks. It seems ok.

But then, in ontoseg,

  1. My files are <name>.txt and <name>_vsm.txt, which does not match the expected format as described in line 117: var vsmFileName = textFileName.replace("text","vsm")+".vsm", right?
  2. I don't really get the process. Having fixed 1, it now iterates over the VSM files too, and thus tries to open "_vsm_vsm.txt". It seems to me that the walk function should exclude VSM files.
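The two points above can be sketched as a pair of small helpers: one that derives `<name>_vsm.txt` from `<name>.txt`, and one that recognizes VSM files so a directory walk can skip them. Both helper names are hypothetical, and the `_vsm.txt` suffix is the convention described in this comment, not necessarily what ontoseg expects.

```javascript
var path = require('path');

// Assumed convention from this thread: `<name>.txt` holds the text and
// `<name>_vsm.txt` holds its VSM. Both helpers are hypothetical.
function isVsmFile(fileName) {
  return /_vsm\.txt$/.test(fileName);
}

function vsmFileNameFor(textFileName) {
  var ext = path.extname(textFileName);
  var base = textFileName.slice(0, textFileName.length - ext.length);
  return base + '_vsm' + ext;
}

// Skip files that are already VSM output before deriving new names,
// so that nothing like "<name>_vsm_vsm.txt" is ever requested.
var files = ['sample1.txt', 'sample1_vsm.txt', 'notes.txt'];
var textFiles = files.filter(function (f) {
  return !isVsmFile(f);
});
textFiles.forEach(function (f) {
  console.log(f + ' -> ' + vsmFileNameFor(f));
});
```

For comparison, the expression quoted from line 117 replaces only the first occurrence of "text" in the path and appends ".vsm", so it fits a layout such as data/haps/text/... rather than files named <name>.txt.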