DarrinGlad opened this issue 3 years ago (status: Open)
Sorry, we don't have specific tools to convert data into the test.jsonl format, since that depends very much on the initial source of the data. Note that if you only want to make predictions on a new document, the data needs just 4 fields:
{
  "doc_id" : str = Document Id as used by Semantic Scholar,
  "words" : List[str] = List of words in the document,
  "sentences" : List[Span] = Spans indexing into the words array that indicate sentences,
  "sections" : List[Span] = Spans indexing into the words array that indicate sections,
}
The remaining fields are needed when you want to use your own data to train the model.
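For illustration, here is a minimal sketch of building one such record from plain text. It makes several assumptions that are not confirmed in this thread: that a Span is a [start, end) pair of word indices with an exclusive end, that whitespace splitting is an acceptable tokenizer, and that the input is already split into sections and sentences. The helper name `build_record` and the doc id are hypothetical.

```python
import json


def build_record(doc_id, sections):
    """Build a 4-field record.

    `sections` is a list of sections, each a list of sentence strings.
    Spans are assumed to be [start, end) word indices (end exclusive).
    """
    words, sentence_spans, section_spans = [], [], []
    for section in sections:
        section_start = len(words)
        for sentence in section:
            tokens = sentence.split()  # assumption: whitespace tokenization
            if not tokens:
                continue  # skip empty sentences
            start = len(words)
            words.extend(tokens)
            sentence_spans.append([start, len(words)])
        if len(words) > section_start:
            section_spans.append([section_start, len(words)])
    return {
        "doc_id": doc_id,
        "words": words,
        "sentences": sentence_spans,
        "sections": section_spans,
    }


# Hypothetical two-section document.
record = build_record(
    "my-paper-001",
    [["We study X .", "It works ."], ["Results are good ."]],
)
jsonl_line = json.dumps(record)  # one JSON object per line of the .jsonl file
```

Each document becomes one line of the .jsonl file; whether the reader expects exclusive span ends should be checked against an existing record in the released data.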
But when I format my data the way you described, I run into the error below:
Traceback (most recent call last):
File "scirex/predictors/predict_ner.py", line 123, in
It seems that scirex_full_reader.py reads all fields of the JSON file, so how can I fix this?
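One possible workaround, offered only as a sketch and not as a confirmed fix: fill in empty placeholders for the fields the reader looks up, so a prediction-only record has every key present. The key names and empty values in `ASSUMED_DEFAULTS` are assumptions; the actual keys should be taken from whichever one the traceback reports as missing and from the reader source. The helper `fill_missing_fields` is hypothetical.

```python
import copy

# Assumption: these annotation keys and their empty shapes are guesses;
# replace them with the keys the reader actually accesses.
ASSUMED_DEFAULTS = {
    "ner": [],
    "coref": {},
    "n_ary_relations": [],
    "method_subrelations": {},
}


def fill_missing_fields(record, defaults=ASSUMED_DEFAULTS):
    """Return a copy of `record` with any absent keys set to empty defaults."""
    filled = dict(record)
    for key, value in defaults.items():
        if key not in filled:
            # deepcopy so records never share the same mutable default
            filled[key] = copy.deepcopy(value)
    return filled


minimal = {"doc_id": "d", "words": ["a"], "sentences": [[0, 1]], "sections": [[0, 1]]}
padded = fill_missing_fields(minimal)
```

Existing keys are left untouched, so the same helper can be applied to records that already carry annotations.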
Hello, I was wondering whether there are any existing tools for converting a paper into the test.jsonl format.