Open ghtaro opened 1 week ago
Hi,
Yes, the huggingface dataset is not in json format. We briefly describe the huggingface format here: https://huggingface.co/datasets/zitongyang/entigraph-quality-corpus/blob/main/README.md?code=true#L55.
To convert the huggingface format:
- Write one `<article_uid>.json` file per article.
- Take the `entity` field of the huggingface dataset, separated by the `<|entityseptoekn|>` token.
- Take the `entigraph` field of the huggingface dataset, separated by the `<|entigraphseptoekn|>` token.

I hope this helps!
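For anyone following along, here is a rough sketch of that conversion in Python. It is not taken from the repo, so treat it as a starting point. The two separator strings are copied verbatim from the reply above. The `uid` column name, the helper names, and the JSON layout (entity list first, followed by the entigraph passages) are assumptions that should be checked against the dataset README and whatever the repo's loader expects.

```python
import json
from pathlib import Path

# Separator strings copied verbatim from the reply above (spelling included).
ENTITY_SEP = "<|entityseptoekn|>"
ENTIGRAPH_SEP = "<|entigraphseptoekn|>"


def split_row(row):
    """Split one dataset row into its entity list and entigraph passages."""
    entities = row["entity"].split(ENTITY_SEP)
    passages = row["entigraph"].split(ENTIGRAPH_SEP)
    return entities, passages


def write_article(row, out_dir, uid_field="uid"):
    """Write one row to <article_uid>.json and return the path.

    Assumption: the `uid` column name and the JSON layout (entity list
    first, then the passages) are guesses, not confirmed by the thread.
    """
    entities, passages = split_row(row)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{row[uid_field]}.json"
    with path.open("w", encoding="utf-8") as f:
        json.dump([entities, *passages], f, ensure_ascii=False, indent=2)
    return path


def convert_parquet(parquet_path, out_dir):
    """Convert every row of one parquet file (needs pandas and pyarrow)."""
    import pandas as pd  # imported lazily so the helpers above work without it

    df = pd.read_parquet(parquet_path)
    return [write_article(row, out_dir) for row in df.to_dict(orient="records")]
```

Calling `convert_parquet(...)` once per downloaded parquet file, with `data/dataset/raw/quality_entigraph_gpt-4-turbo/` as `out_dir`, should leave one json file per article in that folder.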
Thanks! I will try the procedure you described and get back to you later.
Thank you very much for sharing all the code for your brilliant work.
I would like to replicate the results using the dataset below on huggingface.
I copied the above (parquet files) into data/dataset/raw/quality_entigraph_gpt-4-turbo/ and ran the following commands.
I got the following error messages:
It looks like we need the input files in json format. Is that correct? If so, could you tell me how to convert the parquet files to json format?