dbpedia / fact-extractor

Fact Extraction from Wikipedia Text

Translate sample CrowdFlower results into training data format #7

Closed (marfox closed this issue 9 years ago)

marfox commented 9 years ago

Implement step 4.i of the workflow, as per the README. You should review the related script and make it more robust. In other words:

  1. parametrize hard-coded items
  2. add function docstrings
  3. add descriptive comments
  4. implement a solid command-line interface with the argparse module

Use the following samples as input:

  1. CrowdFlower results
  2. TreeTagger output directory
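
For orientation, the command-line part of the task could start from something like the sketch below. It is only a minimal illustration, not the actual script in the repository: the function name, the argument names, and the default output path are all hypothetical and would need to be adapted to the real code.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line of a hypothetical conversion script.

    All argument names and the default output path are assumptions made
    for illustration; they are not taken from the fact-extractor code.
    """
    parser = argparse.ArgumentParser(
        description="Translate CrowdFlower results into the training data format")
    # The two positional arguments mirror the two sample inputs of the task.
    parser.add_argument("crowdflower_results",
                        help="path to the CrowdFlower results file")
    parser.add_argument("treetagger_dir",
                        help="directory containing the TreeTagger output")
    # Formerly hard-coded values become options with sensible defaults.
    parser.add_argument("-o", "--output", default="training-data.tsv",
                        help="output file (default: %(default)s)")
    parser.add_argument("--debug", action="store_true",
                        help="print intermediate data while converting")
    return parser.parse_args(argv)


# Passing an explicit list makes the parser easy to try out without a shell.
args = parse_args(["results.csv", "treetagger-output", "--debug"])
print(args.crowdflower_results, args.treetagger_dir, args.output, args.debug)
```

Keeping the parsing in its own function that accepts an optional argument list makes it easier to document and to test separately from the conversion logic.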

jerryking58 commented 9 years ago

Hi, I am working on this task, but the training data format is not clear to me because the sample is in Italian. Can you explain which features you selected for training?

marfox commented 9 years ago

Hi, don't worry about the features: what we are performing here is a purely syntactic transformation. You just need to comply with the sample output syntax, which is language independent. Imagine that instead of Italian tokens and lemmas, there could be tokens and lemmas in any language supported by TreeTagger.


fsonntag commented 9 years ago

Just a short question: is this issue still up to date? It looks like everything except the parameterization is already done.

marfox commented 9 years ago

It's still open so that you can get acquainted with the code and the concepts behind it.