dbpedia-spotlight / pignlproc

Apache Pig utilities to build training corpora for machine learning / NLP out of public Wikipedia and DBpedia dumps.

Fix Inconsistent Path of "sfs" Folder #7

Closed caizhiwei closed 11 years ago

caizhiwei commented 11 years ago

The path of the sfs folder is defined by the user in names_and_entities.pig.params, but "./sfs" is hard-coded as the path of the sfs folder here, which may cause the following error:

java.lang.NullPointerException
    at pignlproc.helpers.RestrictedNGramGenerator.exec(RestrictedNGramGenerator.java:98)
    at pignlproc.helpers.RestrictedNGramGenerator.exec(RestrictedNGramGenerator.java:44)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:337)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:381)

maxjakob commented 11 years ago

L85 specifies after the # how you can refer to a file in the distributed cache inside exec (see also PIG-1752). Having sfs for both L85 and L95 should work. A good refactoring would be a variable for this.
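The refactoring suggested above could look roughly like the following sketch. It assumes the cache entry has the form `path#name`, as described for L85, and keeps the name in a single constant so both places agree. The class name, method names, and the HDFS path in the usage note are hypothetical and are not taken from pignlproc.

```java
import java.net.URI;

// Hypothetical helper: one constant for the symlink name used in both places.
final class CacheSymlink {
    // Name after the '#', i.e. how the cached file is referred to inside exec.
    static final String SYMLINK_NAME = "sfs";

    private CacheSymlink() {}

    // Builds the "path#name" entry for the distributed cache.
    static String cacheEntry(String remotePath) {
        return remotePath + "#" + SYMLINK_NAME;
    }

    // Extracts the part after '#' from a cache entry (the URI fragment).
    static String symlinkOf(String cacheEntry) {
        return URI.create(cacheEntry).getFragment();
    }

    // Relative path under which the cached file is expected in the task's working directory.
    static String localPath() {
        return "./" + SYMLINK_NAME;
    }
}
```

For example, `CacheSymlink.cacheEntry("hdfs://namenode/tmp/sf_lookup")` yields an entry ending in `#sfs`, and `CacheSymlink.localPath()` gives the matching `./sfs`.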

Do you maybe get this error when the first part of the script (before the EXEC) fails and the temporary folder is not created?

caizhiwei commented 11 years ago

System.out.println("folder:"+folder.getAbsolutePath()+" "+folder.listFiles());

output:
folder:/home/wei/develop/dbpedia-gsoc/data/nl/pig/pignlproc/./sfs null

My setting in names_and_entities.pig.params: TEMPORARY_SF_LOCATION=/home/wei/develop/dbpedia-gsoc/data/nl/sf_lookup (I can find sfs files in sf_lookup)

I'm not familiar with Pig and Hadoop. Will the file be cached when running in local mode? I think I should change the code like this:

    File folder = new File("./sfs");
    if (!folder.exists()) {
        folder = new File(this.surfaceFormListFile); // local mode
    }

What do you think?
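As a self-contained illustration of the fallback above (a sketch, not the code that was merged): `File.listFiles()` returns `null` when the path is not a directory, which matches the `null` in the debug output and would explain the NullPointerException. The class and method names here are hypothetical; `configuredPath` stands in for `this.surfaceFormListFile`.

```java
import java.io.File;

// Hypothetical helper illustrating the local-mode fallback.
final class SurfaceFormFolder {
    private SurfaceFormFolder() {}

    // Prefer the distributed-cache symlink; fall back to the configured path (local mode).
    static File resolve(File cacheSymlink, String configuredPath) {
        if (cacheSymlink.exists()) {
            return cacheSymlink;
        }
        return new File(configuredPath);
    }

    // listFiles() returns null for a missing or non-directory path,
    // so fail with a clear message instead of a later NullPointerException.
    static File[] listOrFail(File folder) {
        File[] files = folder.listFiles();
        if (files == null) {
            throw new IllegalStateException(
                "Not a readable directory: " + folder.getAbsolutePath());
        }
        return files;
    }
}
```

A call such as `SurfaceFormFolder.resolve(new File("./sfs"), configuredPath)` would then work in both modes, assuming the configured path points at the folder holding the sfs files.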

jodaiber commented 11 years ago

I am not sure if the file will be moved to the distributed cache in local mode; probably not. I never ran it in local mode. But your fix makes sense. If you revert your first commit [1] so that only the last commit remains, we can merge this (just to ensure the commits make sense).

[1] - http://stackoverflow.com/questions/927358/how-to-undo-the-last-git-commit

caizhiwei commented 11 years ago

OK, I have reverted it. Thanks, Jo.

jodaiber commented 11 years ago

Okay, that's better. Can you make the commit message a bit more descriptive? It's not that the paths were inconsistent, but that the code was only meant for distributed mode. So, something like "Enable extraction in pig local mode.".

caizhiwei commented 11 years ago

How about this?

jodaiber commented 11 years ago

Okay, congratulations on your first contribution!