TREMA-UNH / trec-car-tools

Tools for working with the TREC CAR dataset.
http://trec-car.cs.unh.edu/
BSD 3-Clause "New" or "Revised" License

issue with reading TREC data by using trec-car-tools in python3 #16

Closed NLP-1217 closed 6 years ago

NLP-1217 commented 6 years ago

I followed the instructions to read the data with test.py, but something goes wrong that I cannot figure out.

Here are the problems I ran into:

[xuesiyuan@241server python3]$ python ./trec_car/read_data.py ./benchmarkY1/benchmarkY1-train/train.pages.cbor ./benchmarkY1/benchmarkY1-train/train.pages.cbor-outlines.cbor ./benchmarkY1/benchmarkY1-train/train.pages.cbor-paragraphs.cbor >out

After this, the output is empty.

[xuesiyuan@241server python3]$ python read_data_test.py ./benchmarkY1/benchmarkY1-train/train.pages.cbor ./benchmarkY1/benchmarkY1-train/train.pages.cbor-outlines.cbor ./benchmarkY1/benchmarkY1-train/train.pages.cbor-paragraphs.cbor >out
Traceback (most recent call last):
  File "read_data_test.py", line 14, in <module>
    for p in iter_annotations(f):
  File "/home/xuesiyuan/pythonworkspace/pythonWorkspace/py3/trec-car-tools-1.5/python3/trec_car/read_data.py", line 509, in _iter_with_header
    yield parse(cbor.load(file))
  File "/home/xuesiyuan/pythonworkspace/pythonWorkspace/py3/trec-car-tools-1.5/python3/trec_car/read_data.py", line 71, in from_cbor
    return Page(pagename, pageId, map(PageSkeleton.from_cbor, cbor[3]), PageMetadata.from_cbor(cbor[4]))
  File "/home/xuesiyuan/pythonworkspace/pythonWorkspace/py3/trec-car-tools-1.5/python3/trec_car/read_data.py", line 215, in from_cbor
    pageType=PageType.from_cbor(cbor[1])
IndexError: list index out of range

Is there anything wrong?

laura-dietz commented 6 years ago
Hi XueSiyuan,

  I will try to reproduce your issue.

  Did you use the latest version from the repository? Or did you use
  the PyPI/Anaconda version?

  Best,
  Laura
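
To answer the question above, it can help to check which packaged version (if any) is installed in the active environment. The following is a minimal sketch using only the standard library (Python 3.8+); the distribution name "trec-car-tools" is an assumption about how the package is registered, and the helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(dist_name="trec-car-tools"):
    """Return the installed version string of dist_name, or None if absent.

    The default distribution name is an assumption; adjust it if the
    package is registered under a different name in your environment.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # No installed distribution with that name. The code may still be
        # importable from a repository checkout on sys.path.
        return None
```

Calling print(installed_version()) prints a version string for a packaged install, or None when no such distribution is registered in the environment.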

  On 03/08/2018 12:10 AM, XueSiyuan1217 wrote:

  I followed the instructions to read the data with test.py, but
    something goes wrong that I cannot figure out.
  Here are the problems I ran into:
  [xuesiyuan@241server python3]$ python ./trec_car/read_data.py
    ./benchmarkY1/benchmarkY1-train/train.pages.cbor
    ./benchmarkY1/benchmarkY1-train/train.pages.cbor-outlines.cbor
    ./benchmarkY1/benchmarkY1-train/train.pages.cbor-paragraphs.cbor
    >out
  After this, the output is empty.
  [xuesiyuan@241server python3]$ python read_data_test.py
    ./benchmarkY1/benchmarkY1-train/train.pages.cbor
    ./benchmarkY1/benchmarkY1-train/train.pages.cbor-outlines.cbor
    ./benchmarkY1/benchmarkY1-train/train.pages.cbor-paragraphs.cbor
    >out
    Traceback (most recent call last):
      File "read_data_test.py", line 14, in <module>
        for p in iter_annotations(f):
      File "/home/xuesiyuan/pythonworkspace/pythonWorkspace/py3/trec-car-tools-1.5/python3/trec_car/read_data.py", line 509, in _iter_with_header
        yield parse(cbor.load(file))
      File "/home/xuesiyuan/pythonworkspace/pythonWorkspace/py3/trec-car-tools-1.5/python3/trec_car/read_data.py", line 71, in from_cbor
        return Page(pagename, pageId, map(PageSkeleton.from_cbor, cbor[3]), PageMetadata.from_cbor(cbor[4]))
      File "/home/xuesiyuan/pythonworkspace/pythonWorkspace/py3/trec-car-tools-1.5/python3/trec_car/read_data.py", line 215, in from_cbor
        pageType=PageType.from_cbor(cbor[1])
    IndexError: list index out of range
  Is there anything wrong?

laura-dietz commented 6 years ago

The code is fixed, but it is not yet on Anaconda Cloud.

Here is how to install:

1) activate your Python environment
2) clone this repository
3) cd python3
4) run python setup.py install
5) use it! (You can test it by running python read_data_test.py)
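
Once the install has finished, a quick way to confirm that the fixed reader works is to parse only the first few pages of a pages file. This is a rough sketch, not the project's own test script: iter_annotations is the function named in the traceback above, while the helper first_page_names, its iter_pages parameter, and the page_name attribute on the yielded pages are assumptions made for illustration.

```python
import itertools


def first_page_names(pages_path, limit=5, iter_pages=None):
    """Return the names of the first `limit` pages in a pages CBOR file.

    iter_pages is a hypothetical hook: any callable that takes an open
    binary file and yields page objects. It defaults to the library reader.
    """
    if iter_pages is None:
        # Imported lazily so the helper itself loads without the package.
        from trec_car.read_data import iter_annotations
        iter_pages = iter_annotations
    with open(pages_path, "rb") as f:
        # islice stops after `limit` pages, so a large file is not read fully.
        # page_name is assumed to be the attribute holding the page title.
        return [page.page_name for page in itertools.islice(iter_pages(f), limit)]
```

For example, first_page_names("./benchmarkY1/benchmarkY1-train/train.pages.cbor") should return a short list of page names if the installed reader matches the data format. An IndexError like the one reported above would suggest the older version is still the one being imported.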