Translate sample CrowdFlower results into training data format

marfox commented 9 years ago

Implement step 4.i of the workflow, as per the README. You should review the related script and make it more robust. In other words:

- parametrize hard-coded items
- add function docstrings
- add descriptive comments
- implement a solid command-line interface with the argparse module
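
As a rough illustration of the points above, a parametrized entry point with docstrings and an argparse command line might look something like the sketch below. This is not the actual script: the argument names (`results`, `--output`, `--min-confidence`, `--debug`) and their defaults are hypothetical placeholders for whatever is currently hard-coded.

```python
import argparse


def build_parser():
    """Build the command-line parser.

    All argument names and defaults here are hypothetical; they stand in
    for values that are hard-coded in the existing script.
    """
    parser = argparse.ArgumentParser(
        description="Translate CrowdFlower results into training data format."
    )
    # Positional argument: the input file, instead of a hard-coded path.
    parser.add_argument("results", help="path to the CrowdFlower results file")
    # Optional arguments with explicit defaults.
    parser.add_argument(
        "-o", "--output", default="training_data.tsv",
        help="where to write the training data (default: %(default)s)",
    )
    parser.add_argument(
        "--min-confidence", type=float, default=0.5,
        help="hypothetical threshold for keeping an annotation "
             "(default: %(default)s)",
    )
    parser.add_argument(
        "--debug", action="store_true", help="print extra diagnostics",
    )
    return parser


# A real script would call parse_args() with no arguments to read sys.argv;
# an explicit list is passed here so the sketch runs as-is.
args = build_parser().parse_args(["results.csv", "--min-confidence", "0.8"])
print(args.results, args.output, args.min_confidence, args.debug)
```

With a parser like this, running the script with `-h` prints generated usage text, so the parameters document themselves.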

Use the following samples as input:

jerryking58 commented 9 years ago

Hi, I am trying this task and am not clear about the training data format because of the Italian sample. Can you explain the features you selected for training?

marfox commented 9 years ago

Hi, don't worry about the features: we are only performing a syntactic transformation here. You just need to comply with the sample output syntax, which is language-independent. Instead of Italian tokens and lemmas, there could be tokens and lemmas in any language supported by TreeTagger.

fsonntag commented 9 years ago

Just a short question: is this issue still up to date? It looks like everything except the parameterization is already done.

marfox commented 9 years ago

It's still open, so that you can get acquainted with the code and the concepts behind it.

dbpedia / fact-extractor

Translate sample CrowdFlower results into training data format #7