DS4SD / deepsearch-toolkit

Interact with the Deep Search platform for new knowledge explorations and discoveries
https://ds4sd.github.io/deepsearch-toolkit
MIT License

Unable to sequence the objects after parsing #166

Closed wilsoncharles closed 7 months ago

wilsoncharles commented 7 months ago

I'm parsing a huge document and trying to put the parsed JSON in the same sequence in which the content appears in the document, but I'm unable to do that. For example, the tables are placed at the end of the JSON without any reference to which headers or index they belong to. How do I link the tables to their correct position in the document?

dolfim-ibm commented 7 months ago

The JSON of documents will look like the following:

{
  // ...
  "main-text": [
    {
      "type": "text",
      "text": "some example content",
      // ...
    },
    {
      "type": "table",
      "$ref": "#/tables/0"  // reference to the tables section
    }
  ],
  // ...
  "tables": [
    {
      // ...
    }
  ]
}

The correct sequence you are looking for is the content of the main-text list. It includes tables and other floating elements in their correct position. Each such entry is a jsonref (the $ref part) that points into the tables array. In the example above, #/tables/0 points to the first element of tables.
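As a rough illustration of the above, here is a minimal Python sketch that walks main-text in order and swaps each $ref entry for the object it points to. It relies only on the structure shown in the example; the helper names (resolve_ref, iter_in_reading_order), the miniature document, and the "data" key inside the table are hypothetical and not part of the toolkit. It also assumes every reference is a local one of the form #/section/index.

```python
from typing import Any, Iterator


def resolve_ref(document: dict, ref: str) -> Any:
    """Follow a local reference such as '#/tables/0' inside the document."""
    if not ref.startswith("#/"):
        raise ValueError(f"unsupported reference: {ref!r}")
    node: Any = document
    for part in ref[2:].split("/"):
        # List segments are integer indices; dict segments are keys.
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


def iter_in_reading_order(document: dict) -> Iterator[dict]:
    """Yield main-text items in document order, resolving $ref entries."""
    for item in document.get("main-text", []):
        if "$ref" in item:
            yield {
                "type": item.get("type"),
                "ref": item["$ref"],
                "content": resolve_ref(document, item["$ref"]),
            }
        else:
            yield item


# Hypothetical miniature document mirroring the structure shown above.
example = {
    "main-text": [
        {"type": "text", "text": "some example content"},
        {"type": "table", "$ref": "#/tables/0"},
    ],
    "tables": [{"data": [["a", "b"]]}],  # "data" is an assumed field name
}

ordered = list(iter_in_reading_order(example))
for position, item in enumerate(ordered):
    print(position, item["type"])
```

Because the position in main-text is preserved, the text items immediately before a table (for instance the nearest preceding heading) can be used to decide which section the table belongs to.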