Markismus / PocketBookDic

Script to convert dictionaries to pocketbook dictionary dic-format and Koreader optimized dictionaries.
GNU General Public License v3.0

Problems with conversion of ABBYY html, especially with Larousse #4

Closed Markismus closed 1 year ago

Markismus commented 2 years ago

pzack wrote:

Good evening, M Markismus,

I found your pocketbook file. Please disregard my last 2 or 3 messages.

A wonderful job as far as I can tell!

There is one very important item that I would like you to give your attention. There is a collection of linguistic articles concerning grammar and linguistics:

"Grammaire et linguistique". These are dispersed throughout the dictionary and are usually found at the end of the headword definitions. I have found some of these articles; however, one article that I found under the word "ordre", grammaire et linguistique, L'ordre des mots, has been clipped. The article, for some reason, is not being shown in full. I have found several other of these grammar and linguistic articles and they seem to be complete.

Looking at the pdf file you will see that these articles were provided with a separate searchable index, but the articles were integrated under the headwords.

It would be lovely if those articles were complete and I don't understand why there is truncation of the one that I found.

This would be a "fine tuning" but a very important one as this was a distinguishing feature of the dictionary.

What I have seen so far, nevertheless, looks very good. There may, for some reason, still be headwords not being seen; I couldn't find the word "unique".

You have done, despite some items that I am certain can be corrected, a truly superb job. Thank you.

pz

A message he followed up with:

pocketbookDic-2nd message

Good evening M Markismus,

Following up the previous message on the possible truncated linguistic articles.

I put the dictionary in goldendict and the same grammaire et linguistique article under "ordre" was not truncated. It is complete.

I do not know why it is complete in goldendict and not under koreader.

Can you give this a look?

very cordially, ps

Markismus commented 2 years ago

It's cut off at the table Tableau I. The solution is to add (a) line(s) to the lua-file that removes tables from the entry or replaces them with a simple representation of the contents.

Markismus commented 2 years ago

Nice Find!

The missing `unique' is a big problem:

<p><span class="font4" style="font-weight:bold;">unipolaire </span><span class="font29">[ynipoler] adj. (de wm-2etde </span><span class="font29" style="font-style:italic;">polaire ;</span><span class="font29"> 1845, Bescherelle, au sens 1 ; sens 2, 1877, Littré). </span><span class="font4" style="font-weight:bold;">1. </span><span class="font29">Qui n’a qu’un pôle électrique : </span><span class="font29" style="font-style:italic;">Appareil, interrupteur unipolaire. </span><span class="font4" style="font-weight:bold;">|| 2. </span><span class="font29">Se dit d’un neurone dont le corps cellulaire porte un seul prolongement, comme les neurones en T des ganglions spinaux, </span><span class="font4" style="font-weight:bold;">unique </span><span class="font29">[ynik] adj. (lat. </span><span class="font29" style="font-style:italic;">unicus,</span><span class="font29"> unique, seul, sans égal, de </span><span class="font29" style="font-style:italic;">unus,</span><span class="font29"> un [seul] ; fin du xv<sup>e</sup> s., Molinet, au sens 1 </span><span class="font29" style="font-style:italic;">[seul et unique, </span><span class="font29">1751, </span><span class="font29" style="font-style:italic;">Encyclopédie —</span><span class="font29"> discours préliminaire; </span><span class="font29" style="font-style:italic;">...fils... unique,</span><span class="font29"> 1668, Molière] ; sens2, 1876, Larousse [art. </span><span class="font29" style="font-style:italic;">voie —</span><span class="font29"> sur une route, xx<sup>e</sup> s. ; </span><span class="font29" style="font-style:italic;">sens unique,</span><span class="font29"> janv. 1914, </span><span class="font29" style="font-style:italic;">la Science et la Vie,</span><span class="font29"> p. 31] ; sens 3,1640, Corneille ; sens 4, av. 1696, La Bruyère ; sens 5,1758, Diderot).</span></p>

It illustrates that ABBYY sometimes can't be trusted to put each text block into its own p-block. That means that saving it as formatted text without headers and footers, without pictures, and without hyphens and line breaks doesn't work. I'll have to resave it with hyphens and line breaks, analyse the changes to the html, and see how much work it is to adapt the code.

bousnah commented 2 years ago

Hello M Markismus,

I also found that the word "avoir" is not found, in addition to "unique". It seems that the indexing may not be seeing all headwords. Also, goldendict showed the complete linguistic article under the word "ordre", but under koreader the article was truncated, beginning at the table.

bousnah commented 2 years ago

Concerning "avoir" again, and referring to the pdf file, you will see that the first instance of avoir as a headword is preceded by a "1.", thus "1. avoir". This is going to occur with certain very common words where the editors have divided up the definition into distinct sections. There follows a "2. avoir" etc. The pdf file is instructive on this issue as to how you decide to list the definition, either combining or separating the numbered headwords that are the same. I have noticed that certain headwords may have a number, then a period, and then the word. Of course, in the dictionary lookup you would need to know to prefix the headword with the number and period. Looking up "avoir" requires "1. avoir".

The word "abandonner" is not found, and I notice that some a-words, apart from "avoir", are not seen. I am doing some random lookups.

bousnah commented 2 years ago

I am thinking, regarding a headword preceded by a number and period such as "1. avoir" (and I don't know code), that when the code points to a headword, it could point to a string containing the headword. Then a search for "avoir", for example, would look for a string containing "avoir" within the delimited boundaries that you have set.

bousnah commented 2 years ago

Random searches: lanius and lanlaire are not found, but lanoline is found. It seems odd that certain words are not found, apart from the issue of "unique" and "1. avoir".

bousnah commented 2 years ago

Please have a look at the word "temps", which includes a "grammaire et linguistique" article. The formatting breaks down after a number of lines until everything disappears off the page. But in goldendict the formatting of the definition of "temps" is correct.

bousnah commented 2 years ago

Doing some random searches, I am still finding that many headwords are not found. For some reason the ABBYY html conversion is not consistent, aside from the issue of headwords preceded by a numeral and period such as "1. avoir".

bousnah commented 2 years ago

I have found some more headwords that start with "1." followed by the headword. I am thinking that perhaps the code used for the headword search could always search first for "1." plus the word. If there is no match, it then searches for just the headword itself, without the "1." prefix. Sorry that I can't provide code to do this.
headword=x: search "1. x"; if found, display it; otherwise search "x". Something like that, I suppose.

bousnah commented 2 years ago

I think that I was in error concerning the dictionary search protocol in koreader, so my previous suggestions regarding the search for headwords beginning with "1." are probably not applicable. I think that the stardict search cannot be individualised for each dictionary. Therefore, a possible work-around for this dictionary is to modify the text file. Could some code be created to go through the file and eliminate the prefix "1." that is seen on a number of headwords? Eliminating this prefix leaves only the headword, which can then be properly searched. There is also the issue of headwords that are not found; sometimes the headwords just before or just after the missing headword are found. I hope that there is still activity on the ABBYY html conversion of the Larousse, because I have not seen comments in the last several days.
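The prefix-stripping workaround suggested above could look roughly like the following Python sketch. It is illustrative only: the name `strip_number_prefix` and the sample list are hypothetical, the "N. " prefix format is assumed from the examples in this thread, and the project's own conversion script may handle this differently.

```python
import re

# Matches a leading homograph number such as "1. " or "12. " (assumed format).
NUMBER_PREFIX = re.compile(r"^\s*\d+\.\s+")

def strip_number_prefix(headword: str) -> str:
    """Return the headword without a leading "N. " prefix, if one is present."""
    return NUMBER_PREFIX.sub("", headword).strip()

# Hypothetical sample of headwords as they might appear in the text file.
headwords = ["1. avoir", "2. avoir", "1. porter", "lanoline"]
cleaned = [strip_number_prefix(h) for h in headwords]
print(cleaned)  # ['avoir', 'avoir', 'porter', 'lanoline']
```

After stripping, "1. avoir" and "2. avoir" share one headword, so their definitions would still have to be merged or listed as separate entries under the same key.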

Markismus commented 2 years ago

If you look at the repository you can see a nice graph of the activity. If you look at the code, you can follow the changes in the commits.

If I get results that can use further testing, I will post here. For now, you can see whether replacing the content of your lua-file with this content alleviates the table problems:

return function(html)
    -- Remove images.
    html = html:gsub('<img[^>]+>', '')
    -- Strip <p> and </p> tags inside tables. One gsub pass removes only one
    -- such tag per table, because each match consumes the table from its
    -- start tag to </table>, so repeat until nothing is left to remove.
    while html:find("(<table[^>]+>.-)</?p[^>]*>(.-</table>)") do
        html = html:gsub("(<table[^>]+>.-)</?p[^>]*>(.-</table>)", "%1%2")
    end
    -- gsub already replaces every occurrence, so the tags below need no loop.
    html = html:gsub("<table[^>]+>", '<p></p><div style="display:table;">')
    html = html:gsub("<tr[^>]+>", '<p><div style="display:table-row;">')
    -- Koreader doesn't display border-left, -right. Also not if extra styling, s.a. solid, black is given.
    -- html = html:gsub("<td[^>]+>", '<div style="display:table-cell;border-left: 3px;border-right: 3px;">')
    -- Mark the start of each cell with a "|" instead.
    html = html:gsub("<td[^>]+>", '<div style="display:table-cell;">|')
    -- html = html:gsub("<tr[^>]+>", '<div style="width=100%;">')
    -- html = html:gsub("<td[^>]+>", '<div style="display:inline-block;float:left;">')
    html = html:gsub("</td>", '</div>')
    html = html:gsub("</tr>", '</div></p><hr style="height:3px;color:black;" />')
    html = html:gsub("</table>", '</div><p></p>')
    return html
end

bousnah commented 2 years ago

Thank you for your email. Since I do not understand the code, I unfortunately can't follow it. I wish that I could contribute to the code and to the project. You are working on an important issue. However, as I have commented earlier, an issue of equal or greater importance is making sure that all headwords can be found in the Larousse dictionary look-up in koreader. The related issue of headwords prefixed with a number and a period (the editors having decided to do this rather than placing the numerals within the definitions) confuses the stardict/koreader search engine, I think. So I have suggested that one might try to modify the text by suppressing any prefixed numbers before headwords. It is an idea, and you probably have a better idea as to how to find headwords prefixed with numerals. "1. avoir" and "1. porter" are examples.

The other issue is that for some reason some headwords are simply not found, aside from the problem of the numeral + headword.

Markismus commented 2 years ago

[image] https://user-images.githubusercontent.com/5269101/196528933-7ee18e4a-76de-4563-b144-a602220570ee.png

bousnah commented 2 years ago

Thank you for your email. It looks like you may have corrected the problem of missing headwords, since "unique" was one of the headwords not found. There remains the important issue of headwords that have the initial numeral and period, as mentioned before. This would leave out a fair number of important words if the stardict/koreader search cannot find them.

Have you updated the file so that one can re-install the dictionary? If you have, please indicate where the file is found.

cordially

Markismus commented 2 years ago

lanlaire and avoir are still not found, but the other mentioned are. So I'll take a look at them and recreate the dictionary.

lanlaire was not found due to OCR: the lowercase l's were read as capital I's, giving "IanIaire":

<p><span class="font4" style="font-weight:bold;">IanIaire </span><span class="font29">[lâler] (mot de fantaisie, utilisé<br>comme refrain de chanson populaire ; 1832,<br>Balzac </span><span class="font29" style="font-style:italic;">[va te faire lanlaire !

Markismus commented 2 years ago

Words preceded by a number have a double bold span, so I'll have to expand the criterion for a keyword:

<p>
<span class="font4" style="font-weight:bold;">1.</span>
<span class="font4" style="font-weight:bold;"> avoir </span>
<span class="font29">[avwar] v. tr. (lat. </span>
<span class="font29" style="font-style:italic;">habere ;</span>
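One way to treat the number span and the word span as a single headword is sketched below in Python. This is not the project's code: the name `extract_headword` is hypothetical, and the pattern assumes the bold spans look exactly like those in the sample above.

```python
import re

# An optional bold "N." span, followed by the bold span holding the word.
HEADWORD = re.compile(
    r'<p>\s*'
    r'(?:<span class="font\d+" style="font-weight:bold;">\s*(\d+)\.\s*</span>\s*)?'
    r'<span class="font\d+" style="font-weight:bold;">\s*([^<]+?)\s*</span>'
)

def extract_headword(paragraph: str):
    """Return (number, word) for a paragraph, or None if no bold headword starts it."""
    match = HEADWORD.match(paragraph)
    if match is None:
        return None
    number = int(match.group(1)) if match.group(1) else None
    return number, match.group(2)

# The sample is copied from the HTML fragment above.
sample = (
    '<p>\n'
    '<span class="font4" style="font-weight:bold;">1.</span>\n'
    '<span class="font4" style="font-weight:bold;"> avoir </span>\n'
    '<span class="font29">[avwar] v. tr. (lat. </span>'
)
print(extract_headword(sample))  # (1, 'avoir')
```

Keeping the number separate from the word lets the entry be indexed under "avoir" while the number is still available for display.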
bousnah commented 2 years ago

Thank you for the message and kind update on the headword search modifications. Just doing some random searches from my reading, there are headwords that are not found but are found in other dictionary files under koreader. I hope that you can resolve the problem of headwords preceded by a number and period, aside from the general problem of missing headwords, as those headwords are usually key words that command quite lengthy definitions, since these words are used in many different contexts.

Looking forward to the recreated dictionary!

cordially

Markismus commented 2 years ago

And the current version is on pCloud.

bousnah commented 2 years ago

Thank you for your message. I have only just given a cursory look and it looks pretty good! It looks like you solved the numerals prefix problem.

Have you converted the tableaux/tables to a different format, or have you eliminated the tables, keeping only the indication that a tableau/table exists? I have not had a chance to look at this.

cordially

bousnah commented 2 years ago

I also wanted to mention something that I have been meaning to raise, but I thought it better to wait until the current project was near completion: a dictionary scroll feature. I have an old but mint-condition Sony PRS-T2 e-reader with a pretty good dictionary. When I search a word or when, in reading, I highlight a word for look-up, I not only find the word but am able to look at the words and definitions following and preceding the searched word. For those, myself included, who like to read or at least casually peruse a dictionary, this is a nice feature and a way to possibly discover some interesting words.

This function already exists in koreader but it is limited to scrolling the definition of the word searched and found. If it is possible, could this function be expanded to scroll through a dictionary forward or backward from the word searched? This function would apply to any dictionary that the user has under koreader.

I would hope that this would not require a re-format of dictionaries but rather a modification to the search engine under stardict or koreader, and I am not sure where exactly this function is located and executed.

cordially

Markismus commented 2 years ago

Neither do I. You will have to do a lot of work. Stardict isn't suited for it, so you would have to compile a wordlist that yields the nearest neighbours of a word and then add two buttons in the dictionary widget to display them. And it all has to be done in Koreader. It would be better to read the already present PDF file.
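The wordlist idea can be illustrated with a short Python sketch that finds the neighbours of a word in a sorted list of headwords. It only demonstrates the lookup; it is not Koreader code, the name `neighbours` and the sample list are hypothetical, and plain code-point sorting is assumed rather than proper French collation.

```python
import bisect

def neighbours(sorted_words, word, count=2):
    """Return up to `count` headwords before and after `word` in a sorted list."""
    index = bisect.bisect_left(sorted_words, word)
    before = sorted_words[max(0, index - count):index]
    # Skip the word itself if it is present in the list.
    found = index < len(sorted_words) and sorted_words[index] == word
    start = index + 1 if found else index
    after = sorted_words[start:start + count]
    return before, after

# Hypothetical sample of headwords.
words = sorted(["lanoline", "lanlaire", "lanius", "lanterne", "lande"])
print(neighbours(words, "lanlaire"))
# (['lande', 'lanius'], ['lanoline', 'lanterne'])
```

The same lookup works for a word that is missing from the list, in which case the neighbours are the entries on either side of where it would be inserted.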

Markismus commented 2 years ago

The table is displayed awfully. I actually contrived to get a horizontal display of the gutted table, but it again went back to vertical display. No idea what causes it this time. I'm leaving it alone.

bousnah commented 2 years ago

Thank you for your response. I didn't explain myself very clearly, I think; this scrolling function would have applied to any dictionary and not just to the one recently worked on. However, it's a moot point, as it seems that it would be a complicated affair to make this function available. Too bad, but I am thankful for the work that you have done (I think this project started in early September!) to produce a dictionary that functions quite well under koreader.

cordially

bousnah commented 2 years ago

I can't say that I blame you; I mentioned in some previous comments that the tableaux/tables would be nice to have, but the texts in the articles, and not the tableaux/tables, are the most important to retain.

May I ask you if you are reasonably confident that all text, aside from tables, is included in the linguistic articles? In addition, are you confident now, with this recent recreation, that all headwords would be found?

cordially

Markismus commented 2 years ago

No. Such confidence comes from testing, not from design. All extra steps were due to the (bad) OCR or Koreader, not the initial design. So every time things aren't as expected, either the script crashes and I find a new possibility or it slips through.

The main problem was that an entry is a bold word followed by the [pronunciation]. However, it can also be followed by an alternate form, by an explanation, by a specification of the kind of word in braces, or by an alternate ending. The brackets of [pronunciation] can be missing, or odd symbols appear amidst the pronunciation. Basically, it's a messy puzzle.
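The basic criterion described above, a bold word followed by a bracketed pronunciation, can be sketched in Python as follows. This is an assumption-laden illustration, not the script's actual test: `parse_entry_start` is a hypothetical name, the second sample paragraph is made up, and none of the exceptions listed above (alternate forms, missing brackets, stray symbols) are handled.

```python
import re

# Restrictive criterion: the paragraph starts with a bold span that is
# immediately followed by a span opening with a bracketed pronunciation.
ENTRY_START = re.compile(
    r'<p>\s*<span[^>]*font-weight:bold;[^>]*>\s*([^<]+?)\s*</span>\s*'
    r'<span[^>]*>\s*\[([^\]]*)\]'
)

def parse_entry_start(paragraph: str):
    """Return (headword, pronunciation) if the paragraph passes, else None."""
    match = ENTRY_START.match(paragraph)
    return (match.group(1), match.group(2)) if match else None

accepted = parse_entry_start(
    '<p><span class="font4" style="font-weight:bold;">unipolaire </span>'
    '<span class="font29">[ynipoler] adj. (de </span></p>'
)
# Made-up example of bold text that is a title rather than a headword.
rejected = parse_entry_start(
    '<p><span class="font4" style="font-weight:bold;">Tableau I </span>'
    '<span class="font29">Les pronoms</span></p>'
)
print(accepted)  # ('unipolaire', 'ynipoler')
print(rejected)  # None
```

Starting from a strict test like this and loosening it case by case keeps bold titles out of the headword list, at the cost of missing entries whose pronunciation the OCR has mangled.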

bousnah commented 2 years ago

Well, in any case, I certainly see the improvement in the headword look-up, and maybe some words, unfortunately, will fall "through the cracks". But on the whole, it's a pleasure to be able to use the dictionary under koreader. It's just not the same thing under pdf. The pdf will be a backup.

I take it then, that there will be no further re-creations and that this project has ended for you?

cordially

Markismus commented 2 years ago

Good. Then I'll close this issue.

bousnah commented 2 years ago

Hello,

I know that you closed the project on this dictionary; however, I want to ask if you would kindly take another look at something that I have noticed and which is very important. I came across, by accident, some headwords that are contained in the definition of another headword. That is, the headword and its definition follow the definition of the found headword, so the "buried" headword cannot be searched.

Please look at "feudist", where you will see that "feuillage" follows the definition of the headword "feudist". You are unable to find "feuillage" because it is located in (but after) the definition of "feudist". Also, please look at the headword "guise": you will find the headword "guitare" following its definition, and a search for "guitare" finds nothing. I have come across other instances like these.

I think a possible solution, and one that would allow all headwords to be found, is to add to the search code that you wrote some code that would also recognise a word in thick, black type as a headword. The search would need to scan the lines of each definition for any word in a thicker typeface. I think that only headwords appear in thick black type in the text. I don't know the point size of the type. This may not be complicated to do, provided that the change in typeface can be recognised. I would hope that, in any case, headwords contained in the definitions can be separated out, as this would finally ensure that all headwords searched in this dictionary will indeed be found.

very cordially,

Markismus commented 2 years ago

Nice find! This is indeed the same problem that unique had: it was contained in the description of unipolaire, IIRC. I'll extend that test so that it becomes true for these keywords, too. If you find more examples, please don't hesitate to post, because each necessary extension will generate more keywords.

All this fine-tuning comes from the unpredictable step: optical character recognition (OCR). Further training to enhance the results in Finereader mostly shifted the mistakes without real improvement. I am looking into switching from ABBYY Finereader to Tesseract for OCR, which will work better for conversion of Latin and Greek dictionaries. The first step would be to convert the Grand Larousse and see how it impacts the code. Would you also be interested in improving the training of the OCR for Tesseract?

bousnah commented 2 years ago

Hello M Markismus,

Glad that you are willing and able to re-look at this issue. I am reasonably certain that words appearing in that special thick type anywhere in the text of this dictionary are indeed headwords with their following definitions. Any search on a thick-typed word would find a headword. There may be italicised words in the text but only the thick black type word is a headword. The examples that I gave (and I forgot another, which I failed to note) were found strictly by chance, and I am sure that there are many others. This is probably why I cannot find certain words that should be there in the dictionary. If you can correct this problem, I am sure that all headwords will be found, thus making this dictionary truly serviceable and reliable.

I certainly would like to contribute to improving the training of the OCR for Tesseract; however, as I have mentioned several times previously, I don't have any coding experience, so I am not sure if I can be of any help. I certainly do not have your knowledge in these matters! Please alert me when you have revised the dictionary file with the headword search corrections.

cordially

Markismus commented 2 years ago

> Glad that you are willing and able to re-look at this issue. I am reasonably certain that words appearing in that special thick type anywhere in the text of this dictionary are indeed headwords with their following definitions. Any search on a thick-typed word would find a headword. There may be italicised words in the text but only the thick black type word is a headword.

No, you're wrong there. There is a lot of bold text that is a title or just something emphasized. It's quite hard to distinguish these from headwords with multiple forms or endings. So instead I've chosen the restrictive approach: select what is definitely a headword and add to that. However, any typical headword that fails the criterion is put onto a list, so that it can be reviewed. Still, text that is not recognized as a possible headword doesn't make the list.

bousnah commented 2 years ago

Thank you for your message. Well, I thought that the thick type was reserved for headwords; I didn't realise that this in fact is NOT the case. I had some wishful thinking that this would have solved the missing headword issue. It is hard to say how many headwords are "buried" but, hopefully, your solution will uncover most of them.

I look forward to the revised file. And thank you again for your attention to this issue.

cordially

Markismus commented 2 years ago

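Apparently, ABBYY sometimes mistakes a '.' for a ',' and concludes that the paragraph hasn't ended. By looking for a ',' followed by bold text, followed by whatever validates it as a keyword, we've found about 2000 new words.

[image] https://user-images.githubusercontent.com/5269101/199346361-4a21ea83-c332-44f0-9f0a-ecfd8ae60b71.png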
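The search described above can be sketched in Python as follows. It is a rough illustration rather than the script's actual logic: `find_buried_headwords` is a hypothetical name, the validation step is assumed to be a bracketed pronunciation, and the sample paragraph is a shortened version of the unipolaire/unique fragment quoted earlier in this thread.

```python
import re

# A comma closing one span, then a bold span, then a span that opens with a
# bracketed pronunciation (the assumed validation of a keyword).
BURIED = re.compile(
    r',\s*</span>\s*'
    r'<span[^>]*font-weight:bold;[^>]*>\s*([^<]+?)\s*</span>\s*'
    r'<span[^>]*>\s*\['
)

def find_buried_headwords(paragraph: str):
    """Return bold words that follow a comma and precede a "[pronunciation]"."""
    return BURIED.findall(paragraph)

paragraph = (
    '<p><span class="font4" style="font-weight:bold;">unipolaire </span>'
    '<span class="font29">[ynipoler] adj. (de ... des ganglions spinaux, </span>'
    '<span class="font4" style="font-weight:bold;">unique </span>'
    '<span class="font29">[ynik] adj. (lat. </span></p>'
)
print(find_buried_headwords(paragraph))  # ['unique']
```

Bold sense numbers such as "1." inside an entry are not matched here, because they are not followed by a span that opens with "[".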

bousnah commented 2 years ago

Thank you for the message and WOW, that is a lot of headwords that you have recovered! Nice work, thank you. Are you close to having a re-created dict file?

cordially

Markismus commented 2 years ago

It's been on pCloud since yesterday evening.

bousnah commented 2 years ago

Ok, I am going to go fetch it! Let you know how it goes.

Thank you.

cordially,


bousnah commented 2 years ago

Hello, I installed the file, and it is good to think that at least 2000 words, if not more, are now available. I hated to think that these words were locked out of a search, but your modification has hopefully found the rest of the headwords that might have gone missing in the dict.

I would, briefly, like to return again to the issue of "scrolling" a dictionary. Any dictionary. I think that dictionaries should be treated like any book in koreader. I am wondering if you might not apply the koreader handling of books to the dictionaries and allow the reader to "read" a dictionary like any book under koreader. This would not necessarily require re-formatting koreader and adding new arrows and so forth, but rather a "silent" change to the code that would permit the reader to use a dictionary like any book.

Thus, in dictionary mode (while reading any book or explicitly searching a dictionary), rather than terminating at the end of the definitions, one could continue in the dictionary until the dictionary advance arrow (which is always there) is pressed, which moves the reader to the next dictionary. As it stands, you tap the screen to move forward or backward in the definition. This would not change, but there would be no termination: the following headword would appear if one continues tapping (or swiping), and the reader can decide to stop there and move to the next dictionary or continue perusing the dictionary. One should also be able to move backward from a searched headword, as is the case in any book under koreader. So I think, if I am not completely off the mark here, that the koreader code might be applied to dictionaries being read like any book without major code rewrites.

If you have ever used the Oxford English Dictionary, for example, you can appreciate what I mean about the pleasure of reading in a dictionary. And there are many people who enjoy reading dictionaries. I think that this is an important missing function in koreader and would only enhance the reader's experience of using an e-reader under koreader. I think you said that you had a hand in the koreader development, so I thought that you might have a look at this. I would certainly like to participate in any "beta" testing of this function.

I had mentioned that Sony already had this feature, and I found it rewarding to be able to make use of it when my Sony was my only e-reader, in pre-koreader days. Please have a look at this. cordially


Markismus commented 2 years ago

However, a binary dictionary is not a book. Some dictionary formats can be treated as a book. For instance, the old mobi-format could be read as a book. Maybe even the Kindle azw- and azw3-format could be read that way, too. Not by every reader, surely, but at least by mobi-reader.

Your best bet is the XDXF-files or any XML-format files. I've uploaded the XDXF-files to pCloud, but you can convert any Stardict binary dictionary to its text format, which is in XML. These should be easily converted to HTML-format first and then to ePub. The last step can be done with Calibre, but for the first step you should Google it yourself or start here.

I am personally not interested in writing an XML-display plugin for Koreader.
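For the first step, XML to HTML, a rough sketch in Python could look like the following. It assumes the XDXF layout in which each entry is an `<ar>` element whose headword sits in a `<k>` element, and it flattens any markup inside the definition to plain text. The function name and the sample data are made up for illustration; a real Larousse export will likely need more care.

```python
import html
import xml.etree.ElementTree as ET


def xdxf_to_html(xdxf_text: str) -> str:
    """Turn XDXF articles into one simple HTML page (headword + paragraph)."""
    root = ET.fromstring(xdxf_text)
    parts = ["<html><body>"]
    for article in root.iter("ar"):
        keys = ["".join(k.itertext()) for k in article.findall("k")]
        # Everything in the article that is not a <k> counts as definition text.
        body = article.text or ""
        for child in article:
            if child.tag != "k":
                body += "".join(child.itertext())
            body += child.tail or ""
        parts.append(f"<h2>{html.escape(' / '.join(keys))}</h2>")
        parts.append(f"<p>{html.escape(body.strip())}</p>")
    parts.append("</body></html>")
    return "\n".join(parts)


# Made-up two-entry dictionary, only to show the shape of the input.
sample = (
    "<xdxf>"
    "<ar><k>venin</k>Substance toxique.</ar>"
    "<ar><k>venir</k>Se déplacer <i>vers</i> le locuteur.</ar>"
    "</xdxf>"
)
page = xdxf_to_html(sample)
print(page)
```

The resulting HTML file is the kind of input that can then be handed to an ePub converter for the second step.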

bousnah commented 2 years ago

Thank you for your message and for setting me straight about the dictionary file structure; I don't have any knowledge in this area. I am not able to manipulate the files as you have suggested.

I have not found Calibre reliable for epub conversion, so that would not work for me. Too bad that no one in the community has come up with a browsing function for dictionaries under koreader. Perhaps that is for lack of interest, or because no one has suggested it before. But, oh well, Sony had it and must have thought it important enough to provide it.

very cordially,


bousnah commented 2 years ago

Having just responded to your message, I thought, upon reflection, to ask whether there would really be a need to convert dictionary files. We stay with the stardict files; koreader has a search protocol when it searches headwords in the stardict-formatted dictionaries. It knows to terminate at the end of a word's definition. What determines this termination? I imagine that it is another headword, so that there is an "enddo" when it encounters the following (unsearched) headword. I am suggesting that the termination be open-ended. Let the end of the dictionary be the termination, the "enddo" (I am using and thinking in terms of old Dbase 2 code that I knew how to write years ago), rather than the end of the definition of the particular word. Reaching the end of a definition (unwinding the database) then only leads to the next headword, and so on. The reader can "browse", exit, or move on to the next dictionary if desired, as is currently done in koreader. Each dictionary behaves the same. It is a modification, and I think a slight one, to the koreader search code and not to the dictionaries. We're leaving the stardict format alone, not touching the dictionaries or the general dictionary function under koreader and how it looks, except for expanding within a search. In a sense, the whole dictionary becomes part of the definition, or at least that part of the dictionary that follows the found headword. Graphically, the reader will see the next headword indicating the end of the previous definition.

cordially


bousnah commented 2 years ago

Hello, I had installed goldendict in Linux and was installing the dictionary. Searching on "que" just gave me " que" listed 4 times without definitions. This is probably the one headword commanding the longest definition. I thought it was a problem with the goldendict Linux install until I checked it on my ereader and koreader and saw the same thing.

I can't understand why the headword is found but without definitions. Would you kindly have a look at this? Cordially,

bousnah commented 2 years ago

On the heels of the message just sent to you: I should have checked the pdf file of the dictionary, and I see that it is one of those words prefixed by "1." and so on. Koreader found the headword "que" and listed the 4 instances of "que" with the number prefixed; however, no definitions are included.

cordially,

bousnah commented 2 years ago

Please also note that "venir", without a number prefix, is not found. It is another important headword with a long definition.

cordially


Markismus commented 2 years ago

I can reproduce your results:

[mark@debiel PocketbookDic]$ sdcv venir
Found 11 items, similar to venir.
0)TableTest-->tenir
1)Grand Larousse 1989-->tenir
2)Grand Larousse 1989-->vendre
3)Grand Larousse 1989-->vener
4)Oxford Dictionary of English 2ndEd 2010-->venery
5)Oxford Dictionary of English 2ndEd 2010-->venial
6)Grand Larousse 1989-->veniat
7)Grand Larousse 1989-->venin
8)Grand Larousse 1989-->Venise
9)Grand Larousse 1989-->ventre
10)Grand Larousse 1989-->ventru
Your choice[-1 to abort]: 
[mark@debiel PocketbookDic]$ sdcv que
Found 1 items, similar to que.
-->Grand Larousse 1989
-->que

<sup>i.</sup><p><span class="font4" style="font-weight:bold;">1. que </span><span class="font29">[ka].</span>
<sup>ii.</sup><p><span class="font4" style="font-weight:bold;">2. que </span><span class="font29">[ko].</span>
<sup>iii.</sup><p><span class="font4" style="font-weight:bold;">3. que </span><span class="font29">[ko].</span>
<sup>iv.</sup><p><span class="font4" style="font-weight:bold;">4. que </span><span class="font29">[ko].</span>

[mark@debiel PocketbookDic]$ 

venir is contained in the description of venin.

Markismus commented 2 years ago

<span class="font29"> (Gide).<br></span>
<span class="font4" style="font-variant:small-caps;">• Syn.</span>
<span class="font20" style="font-weight:bold;"> : 3 </span>
<span class="font29" style="font-style:italic;">calomnie, fiel; 4 noirceur,perfidie.<br></span>
<span class="font4" style="font-weight:bold;">venir </span>
<span class="font29">[vonir] v. intr. (lat. </span>
<span class="font29" style="font-style:italic;">ventre,</span>
<span class="font29"> venir [dans<br>l’espace ou le temps], se présenter, parve-<br>nir à, en venir à ; fin du ix<sup>e</sup> s., </span>
<span class="font29" style="font-style:italic;">Cantilène<br>de sainte Eulalie,</span><span class="font29"> au sens I, 1 [aussi </span>

Markismus commented 2 years ago

I removed the breaks in the sub asHTML, which shouldn't have happened. Wordcount is increased 77568 -> 80045.
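As an illustration only, removing the breaks could be sketched in Python as below. The function name `remove_breaks` and the three rules are assumptions and not the actual code of the sub asHTML: a hyphen directly before a break is treated as an end-of-line word split, a break directly before a closing tag is dropped, and any other break becomes a single space.

```python
import re


def remove_breaks(html: str) -> str:
    """Remove <br> tags and rejoin words that were hyphenated at a line end."""
    # "parve-<br>nir" -> "parvenir": the hyphen is assumed to be a line-end split.
    html = re.sub(r"(?<=\w)-<br\s*/?>(?=\w)", "", html)
    # "(Gide).<br></span>" -> "(Gide).</span>": nothing follows inside the span.
    html = re.sub(r"<br\s*/?>(?=</)", "", html)
    # "dans<br>l'espace" -> "dans l'espace": the break separated two words.
    html = re.sub(r"\s*<br\s*/?>\s*", " ", html)
    return html


print(remove_breaks("venir [dans<br>l’espace ou le temps], se présenter, parve-<br>nir à"))
```

Note that such a naive first rule also drops a genuine hyphen that happens to fall at a line end, so compounds may come out fused.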

$ sdcv venir
Found 1 items, similar to venir.
-->Grand Larousse 1989
-->venir

<span class="font4" style="font-weight:bold;">venir </span><span class="font29">[vonir] v. intr. (lat. </span><span class="font29" style="font-style:italic;">ventre,</span><span class="font29"> venir [dans l’espace ou le temps], se présenter, parvenir à, en venir à ; fin du ix<sup>e</sup> s., </span><span class="font29" style="font-style:italic;">Cantilène de sainte Eulalie,</span><span class="font29"> au sens I, 1 [aussi </span><span class="font29" style="font-style:italic;">venir à quelqu’un ;</span><span class="font29"> impers., xn<sup>e</sup> s. </span><span class="font29" style="font-style:italic;">; faire venir quelqu’un,</span><span class="font29"> 1549, R. Estienne ; </span><span class="font29" style="font-style:italic;">faire venir quelque chose,</span><span class="font29"> av. 1654, Guez de Balzac ; </span><span class="font29" style="font-style:italic;">venir... au secours de quelqu’un,</span><span class="font29"> 1538, d’après le </span><span class="font29" style="font-style:italic;">FEW,</span><span class="font29"> XII, 383 </span><span class="font29" style="font-style:italic;">b ; venir en aide à quelqu’un,</span><span class="font29"> 1863, Littré, art. </span><span class="font29" style="font-style:italic;">aide ; venir avec quelqu’un,</span><span class="font29"> xn<sup>e</sup> s., </span><span class="font29" style="font-style:italic;">Roncevaux ; aller et venir,</span><span class="font29"> xn<sup>e</sup> s., d’après le </span><span class="font29" style="font-style:italic;">FEW,</span><span class="font29"> XIV, 241 </span><span class="font29" style="font-style:italic;">b ; ne faire qu’aller et venir,</span><span class="font29"> 1671, Pomey —aussi « s’éloigner... et revenir très vite » ; </span><span class="font29" style="font-style:italic;">s’en aller comme on est venu,</span><span class="font29"> v. 
1675, La Fontaine ; </span><span class="font29" style="font-style:italic;">ça vient ?,</span><span class="font29"> 1964, Larousse </span><span class="font29" style="font-style:italic;">;y venir,</span><span class="font29"> 1798, Acad.] ; sens 1,2, v. 1050, </span><span class="font29" style="font-style:italic;">Vie de saint Alexis</span><span class="font29"> [« être bien accueilli », v. 1175, Chr. de Troyes] ; sens I, 3, xn<sup>e</sup> s. ; sens 1,4, 1904, Larousse [d’abord ... </span><span class="font29" style="font-style:italic;">à bâbord, à tribord,</span><span class="font29"> 1876, Larousse ; </span><span class="font29" style="font-style:italic;">venir au vent..., </span><span class="font29">1872, Littré] ; sens I, 5, 1690, Furetière ; sens I, 6, v. 1167, Gautier d’Arras ; sens I, 7, v. 1770, Diderot [en parlant de l’image photographique, 1872, Littré] ; sens I, 8, fin du xii<sup>e</sup> s., Châtelain de Coucy [aussi impers.] ; sens I, 9, v. 1175, Chr. de Troyes[« être apporté de... », 1606, d’après le </span><span class="font29" style="font-style:italic;">FE W, </span><span class="font29">XIV, 240 </span><span class="font29" style="font-style:italic;">b ; « ...</span><span class="font29"> tirer son ascendance de », XV<sup>e</sup> s., Du Cange ; </span><span class="font29" style="font-style:italic;">venir de haut lieu,</span><span class="font29"> 1876, Larousse — d’abord... </span><span class="font29" style="font-style:italic;">de bon lieu,</span><span class="font29"> av. 1696, La Bruyère ; </span><span class="font29" style="font-style:italic;">venir de bas lieu,</span><span class="font29"> 1872, Littré] ; sens I, 10, v. 1175, Chr. 
de Troyes [« tirer étymologiquement son origine de », 1606, d’après le </span><span class="font29" style="font-style:italic;">FEW,</span><span class="font29"> XIV, 240 </span><span class="font29" style="font-style:italic;">b]</span><span class="font29"> ; sens 1,11,1655, Pascal </span><span class="font29" style="font-style:italic;">[de là vient que] ;</span><span class="font29"> sens II, 1, v. 980, </span><span class="font29" style="font-style:italic;">Passion du Christ [... qui vient,</span><span class="font29"> « ... qui suit immédiatement le moment où on parle », av. 1549, Marguerite de Navarre ; </span><span class="font29" style="font-style:italic;">venir après,</span><span class="font29"> v. 1791, C. Desmoulins — dans une classification logique ou conventionnelle, av. 1778, Voltaire] ; sens II, 2,1275, Adenet </span><span class="font29" style="font-style:italic;">[venir;</span><span class="font29"> impers., 1690, Furetière] ; sens II, 3, 1560, </span><span class="font29" style="font-style:italic;">Bible Rebul [un enfant qui vient bien, </span><span class="font29">1690, Furetière ; « prendre forme... », au fig., xn<sup>e</sup> s., </span><span class="font29" style="font-style:italic;">Roncevaux] ;</span><span class="font29"> sens II, 4, 1672, Sacy [au fig., xn<sup>e</sup> s., </span><span class="font29" style="font-style:italic;">Roncevaux]</span><span class="font29"> ; sens II, 5, v. 980, </span><span class="font29" style="font-style:italic;">Passion du Christ [laisser venir,</span><span class="font29"> 1798, Acad. ; </span><span class="font29" style="font-style:italic;">tout vient à point à qui sait attendre, </span><span class="font29">1868, Littré, </span><span class="font29" style="font-style:italic;">art.point</span><span class="font29"> 1 — d’abord... 
</span><span class="font29" style="font-style:italic;">à qui peut attendre,</span><span class="font29"> 1690, Furetière (sans </span><span class="font29" style="font-style:italic;">à</span><span class="font29"> devant </span><span class="font29" style="font-style:italic;">qui,</span><span class="font29"> 1552, d’après le </span><span class="font29" style="font-style:italic;">FEW,</span><span class="font29"> IX, 588 </span><span class="font29" style="font-style:italic;">b)] ; </span><span class="font29">sens II, 6, v. 1360, Froissart [d’abord écrit </span><span class="font29" style="font-style:italic;">advenir,</span><span class="font29"> 1295, Runkewitz] ; sens II, 7, 1656, Brébeuf ; sens III, 1, 1080, </span><span class="font29" style="font-style:italic;">Chanson de Roland [venir à bien,</span><span class="font29"> xm<sup>e</sup> s., Rutebeuf ; </span><span class="font29" style="font-style:italic;">venir à bout de quelque chose,</span><span class="font29"> fin du XIV<sup>e</sup> s., E. Deschamps] ; sens III, 2, fin du xn<sup>e</sup> s., Châtelain de Coucy </span><span class="font29" style="font-style:italic;">[venir à — en venir à, </span><span class="font29">1690, Bossuet ; </span><span class="font29" style="font-style:italic;">en venir à</span><span class="font29"> et l’infin., 1640, Corneille — </span><span class="font29" style="font-style:italic;">venir à,</span><span class="font29"> 1549, R. Estienne] ; sens III, 3, 1080, </span><span class="font29" style="font-style:italic;">Chanson de Roland ; </span><span class="font29">sens III, 4, v. LÀ ; sens III, 5, 1665, La Fontaine ; sens IV, 1,1080, </span><span class="font29" style="font-style:italic;">Chanson de Roland</span><span class="font29"> [pour marquer une... intervention... fortuite, av. 1526, J. Marot ; pour insister... 
sur l’action, fin du xiT s., Châtelain de Coucy] ; sens IV, 2, 1553, </span><span class="font29" style="font-style:italic;">Bible Gérard ; </span><span class="font29">sens IV, 3,1580, Montaigne).</span></p><p><span class="font4" style="font-weight:bold;">I. </span><span class="font4" style="font-variant:small-caps;">Sens spatial.</span><span class="font4" style="font-weight:bold;"> 1. </span><span class="font29">En parlant d’un être animé ou d’un véhicule, se déplacer dans la direction du locuteur ou de l’interlocuteur : </span><span class="font29" style="font-style:italic;">Je l’ai appelée, elle est venue </span><span class="font29">(Claudel). </span><span class="font29" style="font-style:italic;">Alors c’est toi qui viendras à Argelouse ?</span><span class="font29"> (Mauriac). </span><span class="font29" style="font-style:italic;">Il se leva, vint s’asseoir près d’elle et lui prit la main</span><span class="font29"> (Sartre). </span><span class="font29" style="font-style:italic;">Venez donc chez moi un instant. Ils viendront dimanche dans notre maison de campagne. La voiture vint droit sur moi. Le bateau vient à quai ;</span><span class="font29"> et impers. : </span><span class="font29" style="font-style:italic;">Il est venu quelqu’un pour toi cette après-midi.|| Faire venir quelqu’un,</span><span class="font29"> lui demander de se rendre auprès de soi ; le mander : </span><span class="font29" style="font-style:italic;">Faire venir le médecin, le géomètre. || Faire venir quelque chose,</span><span class="font29"> le faire apporter, le faire livrer : </span><span class="font29" style="font-style:italic;">Faire venir son vin de la Bourgogne. || Venir à quelqu’un,</span><span class="font29"> s’approcher de lui : </span><span class="font29" style="font-style:italic;">La chèvre vient à l’homme avec confiance</span><span class="font29"> (Buffon). 
</span><span class="font29" style="font-style:italic;">Venir à lui, c’était venir sur lui</span><span class="font29"> (Hugo). </span><span class="font29" style="font-style:italic;">Le Christ disait : « Laissez venir à moi les petits enfants. » || Venir à l’aide, au secours de quelqu’un,</span><span class="font29"> l’aider, le secourir matériellement. || </span><span class="font29" style="font-style:italic;">Venir en aide à quelqu’un,</span><span class="font29"> lui apporter une aide financière ou morale. || </span><span class="font29" style="font-style:italic;">Venir avec quelqu’un, </span><span class="font29">l’accompagner : </span><span class="font29" style="font-style:italic;">Si vous ne savez pas où aller, venez donc avec nous ! || Aller et venir,</span><span class="font29"> ou </span><span class="font29" style="font-style:italic;">aller, venir,</span><span class="font29"> v. </span><span class="font4" style="font-variant:small-caps;">aller. || </span><span class="font29" style="font-style:italic;">Ne faire qu’aller et venir,</span><span class="font29"> s’agiter vainement, être toujours en mouvement </span><span class="font29" style="font-style:italic;">;fam.,</span><span class="font29"> s’éloigner pour peu de temps et revenir très vite. </span><span class="font29" style="font-style:italic;">|| S’en aller comme on est venu,</span><span class="font29"> se retirer, partir ou mourir sans avoir rien acquis, sans avoir retiré aucun bénéfice personnel, ou sans avoir apporté aucune amélioration, aucun changement à la situation.|| Fam. </span><span class="font29" style="font-style:italic;">Ça vient</span><span class="font29"> ? dépêche-toi, dépêchezvous. || Fam. et fig. </span><span class="font29" style="font-style:italic;">Voir venir quelqu’un, </span><span class="font29">v. </span><span class="font4" style="font-variant:small-caps;">voir.</span><span class="font29"> || Fam. 
</span><span class="font29" style="font-style:italic;">Venir aux oreilles de quelqu’un,</span><span class="font29"> v. </span><span class="font4" style="font-variant:small-caps;">oreille</span><span class="font29"> (§ I, n. 2). || Pop. </span><span class="font29" style="font-style:italic;">Y venir,</span><span class="font29"> se risquer à l’attaque (formule de défi) : </span><span class="font29" style="font-style:italic;">Viens-y pour voir ! Qu’il y vienne !|| A beau mentir qui vient de</span><span class="font29"> Zom(prov.), v. </span><span class="font4" style="font-variant:small-caps;">loin.</span><span class="font20" style="font-weight:bold;"> || 2. </span><span class="font29">Class. </span><span class="font29" style="font-style:italic;">Venir bien,</span><span class="font29"> convenir : </span><span class="font29" style="font-style:italic;">Un peu de son esprit | Nous viendrait bien pour polir chaque écrit</span><span class="font29"> (La Fontaine) ; être bien accueilli : </span><span class="font29" style="font-style:italic;">Voilà un jeune gentilhomme qui vient bien dans le monde </span><span class="font29">(Molière). || Auj. </span><span class="font29" style="font-style:italic;">Se faire bien venir,</span><span class="font29"> v. </span><span class="font4" style="font-variant:small-caps;">bienvenir</span><span class="font29"> à l’ordre alphab. || 3. Aller souvent, habituellement chez quelqu’un, fréquenter : </span><span class="font29" style="font-style:italic;">Mon père mort, elle commença à venir aussi chez nous</span><span class="font29"> (Van der Meersch). </span><span class="font29" style="font-style:italic;">Ce retraité vient tous les dimanches au cercle.</span><span class="font20" style="font-weight:bold;"> || 4. </span><span class="font29" style="font-style:italic;">Venir sur bâbord, sur tribord,</span><span class="font29"> diriger le navire de manière qu’il se dirige sur la gauche, sur la droite. 
</span><span class="font29" style="font-style:italic;">|| Venir au vent, au lof,</span><span class="font29"> gouverner plus près du vent qu’on ne le faisait. || 5. Arriver jusqu’à un certain niveau, atteindre une certaine limite : À </span><span class="font29" style="font-style:italic;">marée basse la mer vient jusqu’ici. Ce petit homme me vient juste à l’épaule. Le terrain vient jusqu’à cette borne.</span><span class="font20" style="font-weight:bold;"> || 6. </span><span class="font29">En parlant de liquides, parvenir jusqu’à un orifice et s’écouler au-dehors : </span><span class="font29" style="font-style:italic;">L’eau ne venait plus au robinet</span><span class="font29"> (Giono). </span><span class="font29" style="font-style:italic;">Des larmes me vinrent aux yeux</span><span class="font29"> (Van der Meersch). </span><span class="font29" style="font-style:italic;">Quand on presse un abcès, le pus vient tout doucement.|| Faire venir l’eau à la bouche, les larmes aux yeux,</span><span class="font29"> v. </span><span class="font4" style="font-variant:small-caps;">eau</span><span class="font29"> (§ m, n. 1), </span><span class="font4" style="font-variant:small-caps;">larme. </span><span class="font29">Il </span><span class="font20" style="font-weight:bold;">7. </span><span class="font29" style="font-style:italic;">Venir bien, venir mal,</span><span class="font29"> sortir bien tiré, mal tiré de la presse, en parlant d’une feuille, d’une estampe, d’une épreuve ; se présenter bien, mal, au cours du développement, en parlant de l’image photographique : </span><span class="font29" style="font-style:italic;">Les grandes capitales viennent mal dans cette page. Les noirs viennent trop lentement sur cette photo.</span><span class="font20" style="font-weight:bold;"> || 8. </span><span class="font29">Fig. 
</span><span class="font29" style="font-style:italic;">Venir à quelqu’un, à l’idée, à l’esprit de quelqu’un,</span><span class="font29"> ou, absol., </span><span class="font29" style="font-style:italic;">venir,</span><span class="font29"> apparaître dans l’esprit ; être conçu : </span><span class="font29" style="font-style:italic;">Le mouvement et le rythme me viennent en vers</span><span class="font29"> (Renan). </span><span class="font29" style="font-style:italic;">Ce matin, travaillé à mon livre avec ces éternelles difficultés : les idées viennent, les mots refusent de se montrer</span><span class="font29"> (Green). </span><span class="font29" style="font-style:italic;">Franchement, il ne me serait jamais venu à l’idée de m’adresser à lui pour ce genre de chose. Cette pensée lui était venue brusquement à l’esprit;</span><span class="font29"> et impers. : </span><span class="font29" style="font-style:italic;">Il me vient l’idée de vous interroger sur ce mystère. </span><span class="font20" style="font-weight:bold;">|| 9. </span><span class="font29" style="font-style:italic;">Venir de,</span><span class="font29"> arriver en provenance de : </span><span class="font29" style="font-style:italic;">Il est venu de Paris en quelques heures. De quel côté vient le vent ;</span><span class="font29"> en parlant d’une chose, être apporté de, provenir par héritage de : </span><span class="font29" style="font-style:italic;">Ce thé vient de Chine. Un bijou qui nous vient d’une lointaine aïeule ;</span><span class="font29"> en parlant d’une personne, être originaire de ; tirer son ascendance de : </span><span class="font29" style="font-style:italic;">On croit que les Gitans viennent des Indes. Notre ami vient d’une vieille souche normande. 
Elle vient de la petite bourgeoisie de province, d’une famille relativement aisée.</span><span class="font29"> || Vx.</span></p><p><span class="font29" style="font-style:italic;">Venir de haut lieu, de bas lieu,</span><span class="font29"> avoir une haute, une basse naissance. || </span><span class="font20" style="font-weight:bold;">10. </span><span class="font29">Fig. </span><span class="font29" style="font-style:italic;">Venir de,</span><span class="font29"> avoir son point de départ, son origine, sa cause dans : </span><span class="font29" style="font-style:italic;">Elle</span><span class="font29"> [la liberté] </span><span class="font29" style="font-style:italic;">vient du droit naturel</span><span class="font29"> (Chateaubriand). </span><span class="font29" style="font-style:italic;">D’où viendrait la conscience, si elle pouvait« venir » de quelque chose ? Des limbes de l’inconscient ou du physiologique</span><span class="font29"> (Sartre). </span><span class="font29" style="font-style:italic;">Tout cela vient de ce que vous ne savez pas garder un secret ; spécialem.,</span><span class="font29"> tirer étymologiquement son origine de : </span><span class="font29" style="font-style:italic;">Mot qui vient du latin, du grec.</span><span class="font20" style="font-weight:bold;"> || 11. </span><span class="font29" style="font-style:italic;">De là vient que, d’où vient que</span><span class="font29"> (suivi de l’indicatif), c’est pourquoi : </span><span class="font29" style="font-style:italic;">Les voisins ont enfin déménagé. D’où vient que nous sommes à présent si tranquilles.</span></p><p><span class="font20" style="font-weight:bold;">IL </span><span class="font4" style="font-variant:small-caps;">Sens temporel.</span><span class="font20" style="font-weight:bold;"> 1. 
</span><span class="font29">Se présenter à un certain moment dans la succession temporelle : </span><span class="font29" style="font-style:italic;">Il viendra</span><span class="font29"> [l’Antéchrist] </span><span class="font29" style="font-style:italic;">quand viendront les dernières ténèbres </span><span class="font29">(Hugo). </span><span class="font29" style="font-style:italic;">Un malheur ne vient jamais seul.|| Le jour, l’année,</span><span class="font29"> etc., </span><span class="font29" style="font-style:italic;">qui vient,</span><span class="font29"> le jour, l’année, etc., qui suit immédiatement le moment où on parle. || </span><span class="font29" style="font-style:italic;">L’appétit vient en mangeant, la fortune vient en dormant, </span><span class="font27" style="font-weight:bold;">v. APPÉTIT, DORMIR 1. || </span><span class="font29" style="font-style:italic;">Venir après,</span><span class="font27" style="font-weight:bold;"> se </span><span class="font29">classer à un moment postérieur dans le temps : </span><span class="font29" style="font-style:italic;">Fais ton travail, les jeux viendront après ;</span><span class="font29"> dans une classification logique ou conventionnelle, succéder à : </span><span class="font29" style="font-style:italic;">La première partie vient après une brève introduction. </span><span class="font29">À </span><span class="font29" style="font-style:italic;">l’écarté, l’as vient après le valet.</span><span class="font20" style="font-weight:bold;"> || 2. </span><span class="font29" style="font-style:italic;">Venir au jour, au monde,</span><span class="font29"> (vx) </span><span class="font29" style="font-style:italic;">à la lumière </span><span class="font29">ou simplem. 
</span><span class="font29" style="font-style:italic;">venir,</span><span class="font29"> naître : </span><span class="font29" style="font-style:italic;">Dire que je suis entré dans le monde, « venu au monde », ou qu’il y a un monde ou que j’ai un corps, c’est une seule et meme chose</span><span class="font29"> (Sartre). </span><span class="font29" style="font-style:italic;">Tu es venue grosse comme le poing</span><span class="font29"> (Van der Meersch). </span><span class="font29" style="font-style:italic;">La fille de mon frère vint quand j’avais huit ans</span><span class="font29"> (Colette) ; et impers. : </span><span class="font29" style="font-style:italic;">À force de prier Dieu il lui vint une fille </span><span class="font29">(Flaubert). || 3. Pousser, croître : </span><span class="font29" style="font-style:italic;">Et l’oiseau des bois vit aussi, et les chenilles sous la feuille, et le genêt qui vient dans les grès </span><span class="font29">(Claudel). </span><span class="font29" style="font-style:italic;">Le pin vient bien sur les terrains sableux. || Un enfant qui vient bien, </span><span class="font29">v. </span><span class="font4" style="font-variant:small-caps;">bien</span><span class="font29"> 1. || </span><span class="font29" style="font-style:italic;">Fig.</span><span class="font29"> Prendre forme, atteindre un certain développement : </span><span class="font29" style="font-style:italic;">L’affaire commence à bien venir. Alors, ça vient, ce roman</span><span class="font20" style="font-weight:bold;"> ? || 4. </span><span class="font29">Se former, naître et apparaître à la vue : </span><span class="font29" style="font-style:italic;">Des furoncles lui sont venus au visage. La fine pellicule qui vient sur le thé refroidi. || Fig.</span><span class="font29"> Apparaître et se constituer peu à peu : </span><span class="font29" style="font-style:italic;">La sagesse vient avec l’âge. </span><span class="font20" style="font-weight:bold;">|| 5. 
</span><span class="font29">Arriver, survenir, en parlant d’un événement de caractère ponctuel : </span><span class="font29" style="font-style:italic;">À cinquante ans, l’heure étant venue, je vendis tout</span><span class="font29"> (Gide). </span><span class="font29" style="font-style:italic;">La nuit était tout à fait venue </span><span class="font29">(Vercel). </span><span class="font29" style="font-style:italic;">Et puis cette putain de guerre est venue, et c’est comme si la vie avait perdu d’un seul coup son épaisseur</span><span class="font29"> (Merle). </span><span class="font29" style="font-style:italic;">|| Laisser venir,</span><span class="font29"> demeurer dans une prudente expectative, en laissant les choses se préciser peu à peu d’elles-mêmes. </span><span class="font29" style="font-style:italic;">|| Voir venir,</span><span class="font29"> v. </span><span class="font4" style="font-variant:small-caps;">voir. || </span><span class="font29" style="font-style:italic;">Venir à son heure, </span><span class="font29">v. </span><span class="font4" style="font-variant:small-caps;">heure</span><span class="font29"> (§ II, n. 4). || </span><span class="font29" style="font-style:italic;">Tout vient à point(à) qui sait attendre,</span><span class="font29"> avec du temps et de la patience on réussit, on obtient ce qu’on désire. || Fam. </span><span class="font29" style="font-style:italic;">Venir comme un cheveu </span><span class="font29">(ou </span><span class="font29" style="font-style:italic;">des cheveux) sur la soupe,</span><span class="font29"> v. </span><span class="font4" style="font-variant:small-caps;">cheveu. </span><span class="font20" style="font-weight:bold;">|| 6. 
</span><span class="font29">À </span><span class="font29" style="font-style:italic;">venir,</span><span class="font29"> qui doit arriver, se produire ; futur : </span><span class="font29" style="font-style:italic;">Nous ne sommes rien, Myrtil, que dans l’instantané de la vie ; tout le passé s’y meurt avant que rien d’à venir y soit né </span><span class="font29">(Gide). </span><span class="font29" style="font-style:italic;">Les siècles à venir. Notre existence à venir.</span><span class="font20" style="font-weight:bold;"> || 7. </span><span class="font29">Parvenir, subsister jusqu’à un moment donné (en général le présent que nous vivons) : </span><span class="font29" style="font-style:italic;">C’est par Théophraste que sont venus jusqu’à nous les ouvrages du philosophe</span><span class="font29"> [Aristote] (La Bruyère).</span></p><p><span class="font20" style="font-weight:bold;">III. </span><span class="font4" style="font-variant:small-caps;">Idée d’aboutissement.</span><span class="font20" style="font-weight:bold;"> 1. </span><span class="font29">Atteindre un résultat conforme à la nature, ou qui répond aux efforts qu’on a déployés : </span><span class="font29" style="font-style:italic;">Les dernières fleurs viennent à fruit plus souvent que les premières</span><span class="font29"> (Gide). </span><span class="font29" style="font-style:italic;">Ces tomates sont venues à maturité. Ce chercheur obstiné est venu à ses fins. || Venir à bien,</span><span class="font29"> réussir : </span><span class="font29" style="font-style:italic;">Ces projets sont venus à bien. || Venir à bout de quelque chose, v. </span><span class="font4" style="font-variant:small-caps;">bout</span><span class="font29"> (§ II, n. 7). || </span><span class="font20" style="font-weight:bold;">2. 
</span><span class="font29" style="font-style:italic;">Venir à,</span><span class="font29"> ou </span><span class="font29" style="font-style:italic;">en venir à,</span><span class="font29"> arriver à un sujet, à une partie d’un développement : </span><span class="font29" style="font-style:italic;">Venons à notre affaire. Venons-en aux faits. Voilà où je veux en venir. Nous allons en venir au passage le plus important de ce texte. || En venir à, </span><span class="font29">ou (class.) </span><span class="font29" style="font-style:italic;">venir à</span><span class="font29"> (et l’infinitif), se mettre finalement à : </span><span class="font29" style="font-style:italic;">Si nous en venons maintenant à examiner la situation, nous nous apercevons... Il fallut venir malgré moi à agir</span><span class="font29"> (Retz). || 3. </span><span class="font29" style="font-style:italic;">En venir à</span><span class="font29"> (suivi d’un nom ou d’un infinitif), arriver à un certain stade, un certain état extrême après une évolution plus ou moins longue : </span><span class="font29" style="font-style:italic;">Ces fanatiques en sont venus aux injures, puis aux coups. Ils s’apostrophèrent violemment, et en vinrent à se battre. On en est venu à se traiter de tous les noms. || En venir aux mains,</span><span class="font29"> v. </span><span class="font4" style="font-variant:small-caps;">main (a,</span><span class="font29"> § I, n. 4).|| 4. Class. </span><span class="font29" style="font-style:italic;">En venir là que de,</span><span class="font29"> v. </span><span class="font4" style="font-variant:small-caps;">là</span><span class="font29"> (§ I, n. 10). || 5. 
</span><span class="font29" style="font-style:italic;">Y venir,</span><span class="font29"> en arriver à admettre quelque chose ou à se rallier à quelque chose ; et aussi se résigner, se résoudre à accepter quelque chose : </span><span class="font29" style="font-style:italic;">Un jour il y viendra</span><span class="font29"> (Voltaire). </span><span class="font29" style="font-style:italic;">Vous me parlez de Laforgue. Quel bonheur si vous y venez ! </span><span class="font29">(Gide). </span><span class="font29" style="font-style:italic;">Que cela lui plaise ou non, il faudra bien qu’il y vienne ! || Venir à composition,</span><span class="font29"> v. COMPOSITION.</span></p><p><span class="font20" style="font-weight:bold;">IV. </span><span class="font4" style="font-variant:small-caps;">Emploi comme auxiliaire d’aspect suivi d’un infinitif.</span><span class="font20" style="font-weight:bold;"> 1. </span><span class="font29">Avec l’infinitif seul, marque l’aboutissement du mouvement au terme duquel se fait l’action exprimée par l’infinitif : </span><span class="font29" style="font-style:italic;">Je viens, Monsieur, lui dis-je, vous demander les conseils de votre expérience</span><span class="font29"> (France). </span><span class="font29" style="font-style:italic;">Ceux qui viennent me voir me font honneur, ceux qui ne viennent pas me voir me font plaisir </span><span class="font29">(Claudel). </span><span class="font29" style="font-style:italic;">Alors, que venait maintenant réclamer cette indigente ?</span><span class="font29"> (H. Bazin) ; marque une simple intervention possible ou plus ou moins fortuite : </span><span class="font29" style="font-style:italic;">On viendra objecter que... Aucun contretemps ne venait jamais faire obstacle à ses projets ; </span><span class="font29">s’emploie pour insister simplement sur l’action : </span><span class="font29" style="font-style:italic;">Une obsédante inquiétude venait me torturer. 
Ne venez pas me dire que vous n’avez rien vu.</span><span class="font20" style="font-weight:bold;"> || 2. </span><span class="font29">Avec la préposition </span><span class="font29" style="font-style:italic;">à</span><span class="font29"> suivie de l’infinitif, marque une éventualité, insiste sur l’idée de hasard fortuit, d’occasion : </span><span class="font29" style="font-style:italic;">Si les vergues venaient à toucher l’eau...</span><span class="font29"> (Vercel). </span><span class="font29" style="font-style:italic;">De peur quelle vienne à me relancer</span><span class="font29"> (Flaubert). </span><span class="font29" style="font-style:italic;">S’il vient à être découvert, il sera fusillé. Vint à passer un régiment...</span><span class="font20" style="font-weight:bold;"> || 3. </span><span class="font29">Avec la préposition </span><span class="font29" style="font-style:italic;">de</span><span class="font29"> suivie de l’infinitif, marque l’achèvement tout récent de l’action exprimée par celui-ci : </span><span class="font29" style="font-style:italic;">Le bruit que cette porte venait de faire en s’ouvrant</span><span class="font29"> (Barbey d’Aurevilly). </span><span class="font29" style="font-style:italic;">Je viens de trouver votre petit mot chez le concierge et me voici à votre service</span><span class="font29"> (Pagnol). 
</span><span class="font29" style="font-style:italic;">Il vient juste de partir, vous pouvez encore le rattraper !</span></p><p><span class="font4" style="font-variant:small-caps;">♦ Syn.</span><span class="font29"> : 1,</span><span class="font20" style="font-weight:bold;">3 </span><span class="font29" style="font-style:italic;">se rendre ;</span><span class="font20" style="font-weight:bold;"> 8 </span><span class="font29" style="font-style:italic;">se présenter, surgir ; </span><span class="font20" style="font-weight:bold;">9 </span><span class="font29" style="font-style:italic;">provenir ; être issu, sortir ;</span><span class="font20" style="font-weight:bold;"> 10 </span><span class="font29" style="font-style:italic;">découler, résulter; dériver.</span><span class="font20" style="font-weight:bold;"> || II, 1 </span><span class="font29" style="font-style:italic;">apparaître, naître, survenir;</span><span class="font20" style="font-weight:bold;"> 3 </span><span class="font29" style="font-style:italic;">se développer, grandir;</span><span class="font20" style="font-weight:bold;"> 4</span><span class="font29" style="font-style:italic;">pousser ;</span><span class="font20" style="font-weight:bold;"> 5 </span><span class="font29" style="font-style:italic;">intervenir, tomber.</span><span class="font20" style="font-weight:bold;"> || III, 2 </span><span class="font29" style="font-style:italic;">aborder, passer à ;</span><span class="font20" style="font-weight:bold;"> 3 </span><span class="font29" style="font-style:italic;">finir par.</span></p><p><span class="font20" style="font-weight:bold;">♦ s’en venir </span><span class="font29">v. pr. (sens 1, v. 1155, Wace ; sens 2, fin du xm<sup>e</sup> s., Joinville). </span><span class="font20" style="font-weight:bold;">1. </span><span class="font29" style="font-style:italic;">Littér. </span><span class="font29">Syn. 
vieilli de </span><span class="font4" style="font-variant:small-caps;">venir,</span><span class="font29"> au sens spatial : </span><span class="font29" style="font-style:italic;">Tu t’en viens me traiter de bête carnassière </span><span class="font29">(La Fontaine). </span><span class="font29" style="font-style:italic;">Les fleuves suspendent leurs cours, et [...] la mer à leur rencontre s’en vient tout entière à leurs bouches</span><span class="font29"> (Claudel). </span><span class="font29" style="font-style:italic;">Tout là-bas sur le Rhin s’en vient une nacelle </span><span class="font29">(Apollinaire). || </span><span class="font20" style="font-weight:bold;">2. </span><span class="font29" style="font-style:italic;">Fam.</span><span class="font29"> et </span><span class="font29" style="font-style:italic;">vx.</span><span class="font29"> Retourner au lieu d’où l’on était parti : </span><span class="font29" style="font-style:italic;">Après avoir vagabondé, il s’en vint chez lui tout crotté.</span></p>
Markismus commented 2 years ago

Que is followed by an extra form, qu', which was recognized as a new keyword and started a new article. That's now prevented. However, qu' is not yet indexed as an extra form. The word count dropped from 80045 to 79972.
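
A minimal sketch of the idea in Python, not the script's actual code. It assumes the splitter has already produced (headword, body) pairs; the names `Article`, `is_elided_form_of`, `merge_elided_forms` and `build_index` are hypothetical. An entry whose headword looks like the elided spelling of the preceding headword (qu' after que) is folded into that article and remembered as an extra form, so it can be indexed later without starting a new article.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

# Straight and typographic apostrophes both occur in OCR output.
APOSTROPHES = ("'", "\u2019")


@dataclass
class Article:
    headword: str
    body: str
    extra_forms: List[str] = field(default_factory=list)


def is_elided_form_of(candidate: str, headword: str) -> bool:
    """Return True if candidate looks like the elided spelling of headword.

    Assumption: an elided form is the headword minus its last letter,
    followed by an apostrophe (que -> qu', le -> l').
    """
    if not candidate.endswith(APOSTROPHES):
        return False
    stem = candidate[:-1]
    return (
        bool(stem)
        and len(headword) == len(stem) + 1
        and headword.lower().startswith(stem.lower())
    )


def merge_elided_forms(raw_entries: Iterable[Tuple[str, str]]) -> List[Article]:
    """Fold elided forms into the preceding article instead of splitting."""
    articles: List[Article] = []
    for headword, body in raw_entries:
        if articles and is_elided_form_of(headword, articles[-1].headword):
            previous = articles[-1]
            previous.extra_forms.append(headword)
            previous.body = f"{previous.body} {body}".strip()
            continue
        articles.append(Article(headword, body))
    return articles


def build_index(articles: List[Article]) -> Dict[str, int]:
    """Map every headword and extra form to the position of its article."""
    index: Dict[str, int] = {}
    for position, article in enumerate(articles):
        for key in (article.headword, *article.extra_forms):
            index.setdefault(key, position)
    return index
```

In this sketch the article count drops by one for each merged form, while the index still gets a key for the extra form that points at the main article.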

Markismus commented 2 years ago

Fixed that, and the word count is back up to 80044.

Markismus commented 2 years ago

@bousnah Could you take a look at the current version at pCloud?