Open mMerlin opened 8 years ago
Mr. Duby : THANK YOU!
This is rather embarrassing to me, but I am very glad you have pointed this out. I was unsure of a good way to do it on my LaTeX originals; I will now start copying your method and will make these changes immediately. Thank you! I will acknowledge you on the website soon. I feel very lucky that you took the time to do this!
Ah, LaTeX. I did not look at the other files in the doc folder. I am only marginally familiar with the raw content structure, but had a quick look. It might be possible to feed the LaTeX file directly to hunspell, possibly with a custom dictionary to ignore some of the LaTeX commands and keywords, plus another dictionary holding the glussbot contextual terms that are not in the standard English dictionary. Something like:
hunspell -d en_US,latex,en_glussbot «latex_file»
Another option is to find a printer definition that will output a standard text (ASCII or UTF-8) file, and run hunspell against the output of that.
If neither of those works well, or is practical, another option would be to make the process I used more automated. There are various programs available that will extract text content from a PDF file. I just did a select all, copy from the viewer program, and pasted to a text file. Google for "extract text from pdf", maybe adding keywords for your operating system. I have used a Windows program for that before, but am using Linux now, and have "pdftotext", which is directly available on many distros. See http://www.cyberciti.biz/faq/converter-pdf-files-to-text-format-command/
With that reminder, I just downloaded a fresh copy of Gluss.pdf, and did:
pdftotext Gluss.pdf gluss.txt
hunspell -d en_US gluss.txt
That worked much better than the cut and paste I did previously, though the corrections still need to be merged back into the LaTeX file. Using hunspell to fix some of the errors in the extract, then doing a diff against the original extract file, might make that work easier.
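The diff idea could look something like the minimal sketch below. It assumes the untouched extract is kept as gluss.txt and the fixes are made in a working copy (gluss.fixed.txt is a hypothetical name, as are the helper functions). Both files are split into one word per line so that each changed word shows up on its own, giving a short list to carry back into the LaTeX source by hand.

```shell
#!/bin/sh
# Sketch: list word-level changes between an original text extract and a
# corrected copy. File and function names are placeholders, not part of
# the Gluss repository.

# Print one whitespace-separated word per line.
words() {
    tr -s '[:space:]' '\n' < "$1" | sed '/^$/d'
}

# Print "old ==> new" for every word that differs between two files.
# Assumes corrections replace words one-for-one (no words added or removed).
changed_words() {
    orig_words=$(mktemp) && fixed_words=$(mktemp) || return 1
    words "$1" > "$orig_words"
    words "$2" > "$fixed_words"
    paste "$orig_words" "$fixed_words" |
        awk -F '\t' '$1 != $2 { print $1 " ==> " $2 }'
    rm -f "$orig_words" "$fixed_words"
}

# Typical use, after fixing a copy of the extract with hunspell:
#   cp gluss.txt gluss.fixed.txt
#   hunspell -d en_US gluss.fixed.txt
#   changed_words gluss.txt gluss.fixed.txt
```

If a correction splits or joins words, the pairing drifts out of step from that point on, so a plain diff -u of the two files is the safer fallback in that case.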
I mentioned that I was only marginally familiar with LaTeX, so I had never looked up how to spell check the files. Checking man hunspell for something else, I noticed:
-t The input file is in TeX or LaTeX format.
So:
hunspell -t Gluss.tex
is the way to do this cleanly, optionally with -d «dictionary». The dictionary only needs to be specified if you want to use something that is not the default for your locale, like I did for the original check that started this thread. My default is en_CA, so I had to specify en_US to get checking that (seemed) to match the document content. There are still a lot of things not passing the spell checker. Some, but not all, are valid words that are not in the dictionary.
hunspell -t -l -d en_US Gluss.tex | sort | uniq
(after wrapping, so this is not one word per line here) gives:
12V 3D 3Tet 3TetGlussBot 3x2 400mm 5TetGlussBot Actuonix anisotropy Arduino Boerdijk broadwalk Buckminster clays CMS coformability compressive consiting Coxeter cuboctahedron Cuevas customizer embodiments emodiment Expr Firgelli Geomag Gluss gluss GlussBot glussbot glussbots glusses glussion Gnomon Hirose Instructable js JSON Kwon libre Lipson locomote Mechatronics MegaShield microcontroller modelling mollusc multi NTetGlussBot octahedral octahedron OpenSCAD OshPark performant PLA pseudopod pseudopods pushrod Sanderson Sanderson's scalable Shigoe snakebot snakebots stator Tensegrities Tensegrity tensegrity tet TetGlussBot tetrahedra Tetrahelix tetrahelix TETROBOT tremoendous trianngular trinagular unweighting v0 Valero vanishingly workperson
After verifying, add the specialty words to your private dictionary to streamline any future checks.
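One way to do that, as a rough sketch: hunspell takes a -p option naming a personal dictionary, which is a plain list with one word per line. The file names gluss_words.txt and gluss_personal.dic and the add_words helper are made up for illustration. An absolute path is passed to -p in the usage notes because how a bare file name is resolved may differ between versions; man hunspell has the details for yours.

```shell
#!/bin/sh
# Sketch: merge a hand-reviewed word list into a hunspell personal
# dictionary. gluss_words.txt and gluss_personal.dic are hypothetical names.

# add_words REVIEWED_LIST PERSONAL_DICT
# Adds the reviewed words to the dictionary, keeping one unique word per line.
add_words() {
    touch "$2"
    merged=$(mktemp) || return 1
    cat "$2" "$1" | sed '/^$/d' | sort -u > "$merged"
    cat "$merged" > "$2"    # overwrite in place, keeping the file's permissions
    rm -f "$merged"
}

# Typical use:
#   hunspell -t -l -d en_US Gluss.tex | sort -u > gluss_words.txt
#   (edit gluss_words.txt by hand and delete the real typos, e.g. "trinagular")
#   add_words gluss_words.txt gluss_personal.dic
#   hunspell -t -l -d en_US -p "$PWD/gluss_personal.dic" Gluss.tex
```

The manual review step matters: anything left in the list is treated as correct from then on, so genuine misspellings must be removed before merging.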
Thanks again. I had some trouble installing the dictionary, but I have it working now, thank you.
On Thu, Sep 15, 2016 at 5:49 PM, Phil Duby notifications@github.com wrote:
Robert L. Read, PhD Twitter: @RobertLeeRead The Gluss Project: https://pubinv.github.io/gluss/ YouTube: https://www.youtube.com/channel/UCJQg_dkDY3KTP1ybugYwReg Blog: http://publicinvention.blogspot.com
Really need to run a spell checker against Gluss.pdf, or the document that was used to generate it.
A couple of non-spelling typos:
I think should be:
Also:
is better as:
I noticed a couple of typos when I started reading, so grabbed what raw text I could easily, and ran it through
hunspell -d en_US
This will not be complete, due to my copy paste not really matching the document, and hyphenation messing up my check.
recently invented sperical joint ==> recently invented spherical joint
truss-like apporach to providing ==> truss-like approach to providing
are possibile with the same material ==> are possible with the same material
It moves like a mollusc or amoeba ==> It moves like a mollusk or amoeba
is a portmaneau of ==> is a portmanteau of
be utlized by an ideal turret ==> be utilized by an ideal turret
using a trinagular section ==> using a triangular section
nylon trinagular rotor ==> nylon triangular rotor
green part is a Tetrahlix lock ==> green part is a Tetrahelix lock
Octet Trusss geometry ==> Octet Truss geometry
capable of acheiving ==> capable of achieving
configure acutators and joints ==> configure actuators and joints
both the tetrahexlix and ==> both the tetrahelix and
reflects the geomtry called ==> reflects the geometry called
not support infinte revolution ==> not support infinite revolution
diamter steel ball ==> diameter steel ball
achieves the intetion of making ==> achieves the intention of making
The comptuer and the ==> The computer and the
is very analagous to ==> is very analogous to
An Arudino Mega Shield ==> An Arduino Mega Shield
capable of locomtion ==> capable of locomotion
handle rugose terrain ==> handle rugged terrain
the 3TestGlussBot gait ==> the 3TetGlussBot gait
A quadripedal robot ==> A quadrupedal robot
book by Shigoe Hirose[9] substituing ==> book by Shigoe Hirose[9] substituting
distinction betwen the two ==> distinction between the two
at $80 make construcitng ==> at $80 make constructing
Applications: (Philosphy, Design ==> Applications: (Philosophy, Design
would be analagous to analysis ==> would be analogous to analysis
Finite elment analysis ==> Finite element analysis
are dissassembled so that ==> are disassembled so that
linear actators can be ==> linear actuators can be
uS Patent ==> US Patent (multiple occurrences)
(Delta Defintition) ==> (Delta Definition)
we have constaints ==> we have constraints
all of thse constaints ==> all of ~these~ ~constraints~
a mononotonically decreasing ==> a monotonically decreasing
set of transofrmations ==> set of transformations
to rewrite out constaint as ==> to rewrite ~our~ ~constraint~ as
rotors physcially bumping ==> rotors physically bumping
reader is familar with ==> reader is familiar with
cannot use Delta Defintion ==> cannot use Delta Definition
acute trinagle we assert ==> acute triangle we assert
to be rigourously shown ==> to be rigorously shown
algebraic simplifcation we obtain ==> algebraic simplification we obtain
Cross mulitplying and simplifying ==> Cross multiplying and simplifying
Substuting back into ==> Substituting back into
same algebaric result ==> same algebraic result