baidu / DuReader

Baseline Systems of DuReader Dataset
http://ai.baidu.com/broad/subordinate?dataset=dureader

About converting the MARCO dataset to DuReader style #33

Closed AmosHua closed 5 years ago

AmosHua commented 6 years ago

When I use the script marcov1_to_dureader.py to convert MARCOv1 to DuReader format, it fails with ValueError: Trailing data

LegendaryDan commented 5 years ago

@AmosHua Could you please provide more information? For example, which version of the MSMARCO dataset are you using? More details would help us reproduce the issue. Thanks.

AmosHua commented 5 years ago

I am using MSMARCO v2.

pengwei-iie commented 4 years ago

I have the same problem. The traceback is as follows:

Traceback (most recent call last):
  File "marcov1_to_dureader.py", line 33, in <module>
    df = pd.read_json(sys.argv[1])
  File "/home/user/anaconda3/lib/python3.6/site-packages/pandas/io/json/json.py", line 366, in read_json
    return json_reader.read()
  File "/home/user/anaconda3/lib/python3.6/site-packages/pandas/io/json/json.py", line 467, in read
    obj = self._get_object_parser(self.data)
  File "/home/user/anaconda3/lib/python3.6/site-packages/pandas/io/json/json.py", line 484, in _get_object_parser
    obj = FrameParser(json, **kwargs).parse()
  File "/home/user/anaconda3/lib/python3.6/site-packages/pandas/io/json/json.py", line 576, in parse
    self._parse_no_numpy()
  File "/home/user/anaconda3/lib/python3.6/site-packages/pandas/io/json/json.py", line 793, in _parse_no_numpy
    loads(json, precise_float=self.precise_float), dtype=None)
ValueError: Trailing data
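
For anyone trying to narrow this down: pandas raises "Trailing data" when the parser finds more content after the first complete JSON value, which commonly happens when the input is line-delimited JSON (one object per line) rather than a single JSON document. Whether that is the cause here is only an assumption, since the thread does not confirm the file layout. A minimal standard-library sketch for checking which layout a file has might look like this; the function name detect_json_layout is hypothetical and not part of the DuReader scripts:

```python
import json


def detect_json_layout(text):
    """Classify text as 'single' (one JSON document), 'lines' (JSON Lines), or 'unknown'.

    Hypothetical helper for diagnosis only; not part of marcov1_to_dureader.py.
    """
    # First try to parse the whole text as one JSON document.
    try:
        json.loads(text)
        return "single"
    except json.JSONDecodeError:
        pass

    # Otherwise check whether every non-empty line is a JSON value on its own.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    try:
        for ln in lines:
            json.loads(ln)
    except json.JSONDecodeError:
        return "unknown"
    return "lines" if lines else "unknown"
```

If the file turns out to be line-delimited, pd.read_json(path, lines=True) is the pandas option for that layout. Whether changing the call in marcov1_to_dureader.py this way is sufficient for a given MSMARCO release has not been verified in this thread.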