camelot2002 closed this 3 years ago.
The document IDs are either unique IDs provided by the data vendor or incremental IDs. If you have a CSV file with no other unique identifiers, you can use the row numbers as the document IDs.
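A minimal sketch of the "row numbers as IDs" option, assuming the raw documents are already loaded in memory as a Python list of strings. The function name `assign_incremental_ids` and the `start` parameter are hypothetical, not part of the project:

```python
def assign_incremental_ids(texts, start=0):
    """Pair each document with an incremental ID equal to its row number.

    Hypothetical helper: `texts` is assumed to be a list of raw document
    strings; `start` lets the numbering begin at 1 instead of 0 if preferred.
    """
    return [(start + i, text) for i, text in enumerate(texts)]


if __name__ == "__main__":
    texts = ["Q1 revenue rose.", "Guidance was cut."]
    for doc_id, text in assign_incremental_ids(texts):
        print(doc_id, text)
```

A ticker alone is usually not enough as an ID when a company has several documents, so row numbers (or a ticker plus a row number) are the simpler choice.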
I don't have a CSV file; all I have is the data.
I have a ticker to distinguish the companies. But in your CSV files, one document has multiple document IDs, and I don't understand how a document has been broken down.
One input document corresponds to one unique ID. The document file has the same number of rows as the document-ID file.
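Since the two files have the same number of rows, the mapping is positional: line *i* of the document file belongs to line *i* of the ID file. A minimal sketch of reading them back under that assumption. `document.txt` comes from this thread, while `document_ids.txt` and the helper names are hypothetical stand-ins for the project's actual ID file and code:

```python
def read_lines(path):
    """Return the lines of a UTF-8 text file without their trailing newlines."""
    # Text mode (newline=None) translates \r\n and \r to \n, so each entry
    # is one physical line of the file.
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def pair_lines(doc_lines, id_lines):
    """Pair line i of the documents with line i of the IDs."""
    if len(doc_lines) != len(id_lines):
        raise ValueError(
            f"{len(doc_lines)} documents but {len(id_lines)} IDs: the files are misaligned"
        )
    return list(zip(id_lines, doc_lines))


def load_corpus(doc_path="document.txt", id_path="document_ids.txt"):
    """Load (doc_id, document) pairs; the ID file name is an assumption."""
    return pair_lines(read_lines(doc_path), read_lines(id_path))
```

Raising on a length mismatch is deliberate: if the row counts differ, every pairing after the first missing or extra line would be wrong.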
The document.txt in the input folder contains several documents, right? Each line has a unique ID, and each document also has a unique ID. How does it tell the documents apart in all that text?
Each line in document.txt is a unique document with line breaks removed.
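To build a compatible input from your own data, each document has to be flattened to a single line before it is written, with its ID written on the same line number of the ID file. A minimal sketch, assuming the data is a list of `(doc_id, text)` pairs; `document_ids.txt` and both function names are hypothetical:

```python
def to_single_line(text):
    """Collapse a multi-line document into one line, joining its lines with spaces."""
    # str.splitlines() also handles \r\n and \r, so no stray line break survives.
    return " ".join(part.strip() for part in text.splitlines() if part.strip())


def write_corpus(docs, doc_path="document.txt", id_path="document_ids.txt"):
    """Write one document per line and its ID on the same line of id_path.

    `docs` is assumed to be an iterable of (doc_id, text) pairs, e.g.
    ("AAPL_0", "..."); the ID file name is an assumption.
    """
    with open(doc_path, "w", encoding="utf-8") as doc_f, \
            open(id_path, "w", encoding="utf-8") as id_f:
        for doc_id, text in docs:
            doc_f.write(to_single_line(text) + "\n")
            id_f.write(str(doc_id) + "\n")
```

Writing both files in the same loop keeps them in lockstep, which is what guarantees the row counts match.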
Okay, thank you.
I wanted to change the dataset but couldn't understand how you mapped document_ids to the documents. A short clarification of this in readme.md would be really helpful. Thank you.