Open eng-khaled1 opened 4 years ago
I'd suggest two things:
1. replacements accepts List[Tuple[str, str]], so your case should be replacements=[('\n', ' ')].
2. SpacyParser is used for sentence splitting. Please use spaCy directly to check how your text is split into sentences.

Thanks a lot for your help.
I tried both suggestions but I'm still getting the same result. Can I configure the SpacyParser further? Could the problem come from the PDFs themselves? I would be very grateful if you could help me with this.
Many thanks
I thought that if I could get the whole paragraph as a mention, there would be no need to split the sentences. So I tried to use the Paragraphmention class, but I'm getting the error: AttributeError: 'str' object has no attribute 'get_stable_id'. What does Paragraphmention take as input? Thanks a lot
Description of the bug
I'm trying to train a model that can build a knowledge base from the OPC UA Companion Specifications as part of my thesis. I have the dataset as PDFs and used a third-party program to convert them into HTML, trying my best to preserve the data structure information (I get the same result even if I parse the PDFs alone).
Then I followed the hardware_fonduer_model tutorial to extract the candidates accordingly.
The problem is that the parser splits the sentences wrongly: it treats the end of a line as the end of a sentence. I tried to debug using a SimpleParser.split_sentences(text) command, and it turned out that Python needs a backslash to split a statement into multiple lines.
So I thought I could use the replacements=['[\n]', ' '] parameter so that the split would work better, but I'm getting ValueError: too many values to unpack (expected 2). What is the default configuration for the sentence segmentation?
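A minimal sketch of why the flat list fails, assuming the parser unpacks each entry of replacements into a (pattern, replacement) pair and applies it as a regular expression. The helper names compile_replacements and apply_replacements are hypothetical and are not part of Fonduer.

```python
import re


def compile_replacements(replacements):
    """Unpack each entry into a (pattern, replacement) pair and compile the pattern."""
    compiled = []
    for pattern, replacement in replacements:
        compiled.append((re.compile(pattern), replacement))
    return compiled


def apply_replacements(text, compiled):
    """Apply every compiled replacement to the text, in order."""
    for pattern, replacement in compiled:
        text = pattern.sub(replacement, text)
    return text


good = [("\n", " ")]  # a list containing one 2-tuple
bad = ["[\n]", " "]   # a flat list of two strings

text = "generated by this move command\nrequest."
print(apply_replacements(text, compile_replacements(good)))
# generated by this move command request.

try:
    compile_replacements(bad)
except ValueError as err:
    # The first entry "[\n]" is a 3-character string, so unpacking it
    # into two names fails.
    print(err)
    # too many values to unpack (expected 2)
```

With the flat list, the loop tries to unpack the string '[\n]' itself (three characters) into two names, which would explain the error message reported above.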
How can I get multiple sentences as a mention? (I tried MentionNgrams up to n_max=100 and still get just one.)
I would really appreciate hearing back from you.
Many thanks in advance
Example: Text to be parsed
Boolean indicating if a profile /signature should be generated by this move command request.If the optional VariableSignatureRequestStatus is not provided on the Object, this parameter is ignored by the Server.
Expected behavior
sentence 1 : Boolean indicating if a profile /signature should be generated by this move command request.
sentence 2 : If the optional VariableSignatureRequestStatus is not provided on the Object, this parameter is ignored by the Server.
Actual behavior
sentence 1 : Boolean indicating if a profile /signature should be generated by this move command
sentence 2 : request.
sentence 3 : request.If the optional VariableSignatureRequestStatus is not provided on the Object, this
sentence 4 : parameter is ignored by the Server.
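One way to reason about the difference between the expected and actual output is to normalize the text before it reaches any sentence splitter. The sketch below uses only the standard library and is not Fonduer's or spaCy's segmentation. The line-break positions in raw are an assumption inferred from the actual behavior above, and normalize and naive_split are hypothetical helpers.

```python
import re


def normalize(text):
    """Join hard-wrapped lines and separate sentences fused as 'word.Word'."""
    # Treat a line break (with any surrounding whitespace) as a single space.
    text = re.sub(r"\s*\n\s*", " ", text)
    # Insert the missing space when a period sits between a lowercase
    # letter and an uppercase letter, as in "request.If".
    text = re.sub(r"(?<=[a-z])\.(?=[A-Z])", ". ", text)
    return text


def naive_split(text):
    """Very rough splitter: break after a period followed by whitespace."""
    return re.split(r"(?<=\.)\s+", text.strip())


# Assumed layout: line breaks after "command" and after "this".
raw = (
    "Boolean indicating if a profile /signature should be generated by this move command\n"
    "request.If the optional VariableSignatureRequestStatus is not provided on the Object, this\n"
    "parameter is ignored by the Server."
)

for i, sentence in enumerate(naive_split(normalize(raw)), start=1):
    print(f"sentence {i} : {sentence}")
```

On this input the sketch yields the two sentences listed under the expected behavior. The second pattern is deliberately narrow and would also fire on strings such as dotted identifiers, so it is only a starting point for checking whether the line breaks or the missing space cause the bad splits.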
Environment