dainlp / acl2020-transition-discontinuous-ner


Some Questions about Preprocessing #8

Closed pydxflwb closed 3 years ago

pydxflwb commented 3 years ago

Hello!

  1. https://github.com/daixiangau/acl2020-transition-discontinuous-ner/blob/77b4c4c9e9e3c2d3f601a0e56c5d75fb73682aad/data/cadec/extract_annotations.py#L17

What is '/ann'? Is it a directory? Judging from your code at https://github.com/daixiangau/acl2020-transition-discontinuous-ner/blob/77b4c4c9e9e3c2d3f601a0e56c5d75fb73682aad/data/cadec/extract_annotations.py#L36, I think it is an ann file. I get similar errors from the other scripts in 'data/cadec'; most of them seem to be caused by wrong file names.

  2. Could you give some instructions about the directories? Even after reading your code, I don't know which directory some of the data should go in.

The lack of documentation is quite confusing.

Still, thanks for sharing the code and the excellent idea behind this work.

dainlp commented 3 years ago
  1. /data/dai031/Experiments/CADEC/all/ann is the full path of a file, which is used to save the output.

  2. A rule of thumb: _dir is a directory, and _filepath is a file. Taking the CADEC dataset as an example: once you download the data, you will find a folder called 'cadec' containing child folders such as 'text' ..., so most of the input directory arguments refer to one of those folders. The output* arguments are folders or file paths that you create yourself to store intermediate data.
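To make the rule of thumb concrete, here is a minimal sketch, not code from this repository. It assumes you only need to check that an input folder exists and to create the parent folder of an output file before running a preprocessing script. The helper name `prepare_paths` and the example locations are hypothetical; adjust them to your own machine.

```python
import os
import tempfile


def prepare_paths(input_dir, output_filepath):
    """Validate an input folder and prepare the folder for an output file.

    Hypothetical helper (not part of the repository):
    - input_dir must already exist, e.g. the 'text' folder inside 'cadec'.
    - output_filepath is a FILE path such as '.../CADEC/all/ann'; only its
      parent folder is created, the file itself is written later.
    """
    if not os.path.isdir(input_dir):
        raise NotADirectoryError(f"{input_dir} is not an existing directory")

    output_filepath = os.path.abspath(output_filepath)
    # Create '.../CADEC/all' so that the script can later write the 'ann' file.
    os.makedirs(os.path.dirname(output_filepath), exist_ok=True)
    return os.path.abspath(input_dir), output_filepath


if __name__ == "__main__":
    # Demo in a temporary folder so nothing on the real disk is touched.
    with tempfile.TemporaryDirectory() as root:
        text_dir = os.path.join(root, "cadec", "text")  # downloaded data
        os.makedirs(text_dir)
        ann_file = os.path.join(root, "Experiments", "CADEC", "all", "ann")
        in_dir, out_file = prepare_paths(text_dir, ann_file)
        print("input folder :", in_dir)
        print("output file  :", out_file)
```

The point of the sketch is that 'ann' here is a file to be written, so only the folder above it has to exist beforehand.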

Hope it helps

pydxflwb commented 3 years ago

Thanks for replying so soon!

pydxflwb commented 3 years ago

Things seem to work after creating an ann file and adjusting some directories. Thanks for your help!