I am using the spaCy tokenizer within a Stanza pipeline. In some sentences, the spaCy tokenizer does not split the sentence-ending period '.' into a separate token, which in my case is needed.
Here is my code:
import stanza
from unidecode import unidecode

nlp = stanza.Pipeline('en', processors={'tokenize': 'spacy'})  # initialize the pipeline once, not per sentence

sentence = 'To 10-30mm2 section of stained material in a 2ml microfuge tube, add 600µl Lysis Buffer and 10µl Proteinase K.'
sentence = sentence.rstrip()
doc = nlp(unidecode(sentence))
tokens = [word.text for sent in doc.sentences for word in sent.words]
The result is:
["To","10","-","30mm2","section","of","stained","material","in","a","2ml","microfuge","tube",",","add","600ul","Lysis","Buffer","and","10ul","Proteinase","K."]
I want the last two tokens to be 'K' and '.'. Can I do that?
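One possible workaround, independent of how spaCy or Stanza tokenize internally, is to post-process the token list and split a trailing '.' off the final token. This is only a minimal sketch: it assumes the attached period is always sentence-final, and it would need extra care if the last token were a genuine abbreviation like 'etc.' that you want to keep intact.

```python
def split_final_period(tokens):
    """If the last token ends with '.' but is longer than just '.',
    split the period off into its own token."""
    if tokens and tokens[-1].endswith('.') and len(tokens[-1]) > 1:
        return tokens[:-1] + [tokens[-1][:-1], '.']
    return tokens

tokens = ["add", "600ul", "Lysis", "Buffer", "and", "10ul", "Proteinase", "K."]
print(split_final_period(tokens)[-2:])  # → ['K', '.']
```

This runs after tokenization, so it leaves the rest of the pipeline (POS tagging, etc.) untouched; if you need downstream processors to see the split, the fix would instead have to happen at the tokenizer level.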