Closed by NLP-1217.
Hi XueSiyuan,
I will try to reproduce your issue.
Did you use the latest version from the repository, or the PyPI/Anaconda version?
Best,
Laura
On 03/08/2018 12:10 AM, XueSiyuan1217 wrote:
I have already followed the instructions for accessing the data with test.py, but something is wrong and I cannot figure out what.
Here is the problem I ran into:
[xuesiyuan@241server python3]$ python ./trec_car/read_data.py ./benchmarkY1/benchmarkY1-train/train.pages.cbor ./benchmarkY1/benchmarkY1-train/train.pages.cbor-outlines.cbor ./benchmarkY1/benchmarkY1-train/train.pages.cbor-paragraphs.cbor >out
After this, there is no output.
[xuesiyuan@241server python3]$ python read_data_test.py ./benchmarkY1/benchmarkY1-train/train.pages.cbor ./benchmarkY1/benchmarkY1-train/train.pages.cbor-outlines.cbor ./benchmarkY1/benchmarkY1-train/train.pages.cbor-paragraphs.cbor >out
Traceback (most recent call last):
File "read_data_test.py", line 14, in
for p in iter_annotations(f):
File "/home/xuesiyuan/pythonworkspace/pythonWorkspace/py3/trec-car-tools-1.5/python3/trec_car/read_data.py", line 509, in _iter_with_header
yield parse(cbor.load(file))
File "/home/xuesiyuan/pythonworkspace/pythonWorkspace/py3/trec-car-tools-1.5/python3/trec_car/read_data.py", line 71, in from_cbor
return Page(pagename, pageId, map(PageSkeleton.from_cbor, cbor[3]), PageMetadata.from_cbor(cbor[4]))
File "/home/xuesiyuan/pythonworkspace/pythonWorkspace/py3/trec-car-tools-1.5/python3/trec_car/read_data.py", line 215, in from_cbor
pageType=PageType.from_cbor(cbor[1])
IndexError: list index out of range
Is there anything wrong?
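The innermost `from_cbor` frame (reached via `PageMetadata.from_cbor(cbor[4])`) fails on `cbor[1]`, so the decoded metadata record had fewer than two elements. That suggests the data file uses a shorter record layout than this version of the parser expects. The sketch below is not the trec-car-tools fix; it only illustrates, assuming the decoded record arrives as a plain Python list, how a parser can treat a trailing field as optional instead of indexing it unconditionally. The names `get_optional` and `parse_metadata`, and the `page_type` key, are hypothetical.

```python
from typing import Any, Optional, Sequence


def get_optional(record: Sequence[Any], index: int, default: Optional[Any] = None) -> Any:
    """Return record[index] when the record is long enough, otherwise default."""
    return record[index] if 0 <= index < len(record) else default


def parse_metadata(record: Sequence[Any]) -> dict:
    """Hypothetical metadata parser; the field position is an assumption.

    Index 1 mirrors the failing `cbor[1]` access in the traceback above.
    A record that is too short yields page_type=None instead of IndexError.
    """
    return {"page_type": get_optional(record, 1)}


print(parse_metadata([0]))             # {'page_type': None}
print(parse_metadata([0, "article"]))  # {'page_type': 'article'}
```

Silently defaulting a missing field keeps old files readable, but it can also hide genuinely corrupt records; raising an error that names the expected format version is a reasonable alternative.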
The code is fixed, but it is not yet on Anaconda Cloud.
Here is how to install it:
1) Activate your Python environment.
2) Clone this repository.
3) cd python3
4) Run python setup.py install
5) Use it! (You can test it by running python read_data_test.py.)
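To answer the earlier question of whether the repository version or a PyPI/Anaconda copy is in use, you can ask Python which file it would import. This is a minimal sketch using only the standard library's `importlib.util.find_spec`; the helper name `module_location` is hypothetical, and the only assumption is that the package is importable as `trec_car`, as the traceback paths suggest.

```python
import importlib.util
from typing import Optional


def module_location(name: str) -> Optional[str]:
    """Return the file Python would load for `name`, or None if it is not importable."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Prints the path of the trec_car package Python resolves right now
    # (its __init__.py), or None if it cannot be found.
    print(module_location("trec_car"))
```

Note that when a script is run from inside the cloned `python3` directory, Python puts that directory first on `sys.path`, so the local source tree can take precedence over the installed package. Running the check from another directory shows which installed copy is picked up.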