Closed. DHoeschele closed this issue 6 years ago.
iFilters are for indexing; there is no option to get exactly the same text as in the Word document. A Word document works with formatting like paragraphs, line breaks, etc., and an iFilter doesn't.
I checked the iFilter, and it is giving me the chunks exactly as you describe them.
Hi Kees, Thank you for the reply. I understand your point. I don’t really care about line breaks, just the words. But I don’t think indexing is very useful when it returns
" Search versus ElasticSearchExtractin" & vbLf & "g words
When the input was
FullText Search versus ElasticSearch Extracting words
Thanks, Dave
It's the iFilter that is returning it that way. iFilterTextReader just returns what a Windows iFilter is returning. I do some cleanup in the code but nothing that gives you " Search versus ElasticSearchExtractin" & vbLf & "g words
OK, thanks. I’ll ask Microsoft about it, though I’m not expecting much of an answer from them. Thanks for your time. You have developed a nice product. Dave
Hi, I'm not sure if this is a problem with IFilterTextReader or the Windows IFilter. I have a docx file with these lines:
FullText Search versus ElasticSearch Extracting words from MS files and PDFs Use IFilters to extract text for ElasticSearch This is the end
The docx file is attached. Test IFilter.docx
This is what FilterReader's ReadToEnd() returns:
"FullText" & vbLf & " Search versus ElasticSearchExtractin" & vbLf & "g words from MS files and PDFsUse IFilters to extract text for ElasticSearch This is the end" & vbLf
It seems the vbLf characters are in the wrong place, and "ElasticSearchExtracting" should be split into two words.
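To make the effect on word extraction concrete, here is a minimal Python sketch (not part of IFilterTextReader) that takes the string reported above, with vbLf written as "\n", and tokenizes it two ways. The helper names `tokens_dropping_lf` and `tokens_spacing_lf` are hypothetical and exist only for this illustration.

```python
# The string reported in this thread, with vbLf written as "\n".
returned = (
    "FullText\n Search versus ElasticSearchExtractin\n"
    "g words from MS files and PDFsUse IFilters to extract text "
    "for ElasticSearch This is the end\n"
)


def tokens_dropping_lf(text):
    """Delete every line feed, then split on whitespace."""
    return text.replace("\n", "").split()


def tokens_spacing_lf(text):
    """Turn every line feed into a space, then split on whitespace."""
    return text.replace("\n", " ").split()


dropped = tokens_dropping_lf(returned)
spaced = tokens_spacing_lf(returned)

# Dropping line feeds keeps "ElasticSearchExtracting" glued together.
print(dropped[:4])
# Replacing line feeds with spaces cuts "Extracting" into "Extractin" and "g".
print(spaced[:6])
```

Neither simple cleanup yields the words "ElasticSearch" and "Extracting" separately, and "PDFsUse" stays glued in both cases, so the misplaced line feeds cannot be repaired by plain whitespace normalization after the fact.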
I'm running Windows 10 and Visual Studio 2017.
Thanks for your help, Dave