Closed by ChakshuGautam 8 months ago
@ksgr5566, the PDF parser work will be tracked here.
Output should look like this: link @ksgr5566
@GautamR-Samagra to write hardcoded rules that determine the relative hierarchy of chunks.
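One possible shape for such a rule, as a rough sketch only: it assumes section headings carry a numeric prefix such as "1.2.3", which the thread does not confirm. The names `heading_depth` and `assign_parents` are hypothetical and not part of any existing parser code.

```python
import re
from typing import List, Optional, Tuple

# Assumed rule: a heading like "2.1 Scope" has depth 2 (the count of numeric parts).
_NUMBERED = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+\S")


def heading_depth(title: str) -> Optional[int]:
    """Return the nesting depth of a numbered heading, or None if the rule does not match."""
    match = _NUMBERED.match(title.strip())
    if match is None:
        return None
    return len(match.group(1).split("."))


def assign_parents(titles: List[str]) -> List[Optional[int]]:
    """For each heading, return the index of its nearest shallower predecessor, or None."""
    parents: List[Optional[int]] = []
    stack: List[Tuple[int, int]] = []  # (depth, index) of the currently open ancestors
    for index, title in enumerate(titles):
        depth = heading_depth(title) or 1  # unnumbered headings are treated as top level
        # Close every open section that is at the same depth or deeper.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        parents.append(stack[-1][1] if stack else None)
        stack.append((depth, index))
    return parents
```

The parent index could then be used to fill `parentString` for each chunk. Documents without numbered headings would need a different signal (font size, indentation), which this sketch does not attempt.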
Created CSV: here
Required columns:
ID | Doc ID | contentString | summaryString | summaryEmbedding | text | textEmbedding | page | Tags | semanticVersion(section index) | sectionTitle | sectionString | sectionImages | isTable | isImage | isQuoted | parentString | imageBase64 | titleOfCurrentSection | siblingChunkUp | siblingChunkDown | meta | linkedChunks
The CSV above for Samagra docs covers all columns except: isQuoted, siblingChunkUp, siblingChunkDown, linkedChunks.
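The two sibling columns could probably be derived from the rows already in the CSV. A minimal sketch, assuming rows are in reading order and that a "sibling" simply means the adjacent chunk within the same `Doc ID` (the thread does not define it, so this is a guess). `link_siblings` is a hypothetical helper.

```python
from typing import Dict, List, Optional


def link_siblings(rows: List[Dict[str, Optional[str]]]) -> List[Dict[str, Optional[str]]]:
    """Return copies of rows with siblingChunkUp/siblingChunkDown set to neighbouring IDs."""
    linked = [dict(row) for row in rows]  # copy so the caller's rows are not mutated
    for position, row in enumerate(linked):
        previous = linked[position - 1] if position > 0 else None
        following = linked[position + 1] if position + 1 < len(linked) else None
        # Only link neighbours that belong to the same document.
        same_doc_up = previous is not None and previous["Doc ID"] == row["Doc ID"]
        same_doc_down = following is not None and following["Doc ID"] == row["Doc ID"]
        row["siblingChunkUp"] = previous["ID"] if same_doc_up else None
        row["siblingChunkDown"] = following["ID"] if same_doc_down else None
    return linked
```

If siblings are instead meant to be chunks under the same parent section, the same loop would need to compare `parentString` rather than `Doc ID`.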
@prtkjakhar @aashutosh-samagra use the CSV, as it also contains the image base64, which doesn't fit in the Google Sheet.
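A note for anyone loading the CSV from Python: the standard `csv` module rejects fields longer than 131072 characters by default, which base64 image data can easily exceed. A small sketch of a workaround, with `raise_field_limit` and `read_chunks` as hypothetical helper names:

```python
import csv
import sys
from typing import Dict, Iterator, TextIO


def raise_field_limit() -> None:
    """Raise csv's per-field size limit as far as the platform allows."""
    limit = sys.maxsize
    while True:
        try:
            csv.field_size_limit(limit)
            return
        except OverflowError:
            # Some platforms use a smaller C long; shrink the limit and retry.
            limit //= 10


def read_chunks(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per CSV row, keyed by the header names."""
    raise_field_limit()
    yield from csv.DictReader(handle)
```

Usage would be along the lines of `with open(path, newline="", encoding="utf-8") as f: rows = list(read_chunks(f))`, where `path` points at the downloaded CSV.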
[ ] Store
[ ] Search for the following fields
[ ] Retrieval