corinabioinformatic closed this issue 5 years ago
So part of this is confusion about what word_probability is for and what it means. The word probability is the number of times the word occurs divided by the total number of words in the corpus. It lets us choose which of two words that are both possible answers should be selected. Generally you will never need to call that function unless you want to inspect a word's value. You are likely getting 0.0 because those words are not in the result set. That could be a bug that should be resolved (likely by throwing an exception).
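To make the idea concrete, here is a minimal, self-contained sketch of what a word probability amounts to. It is not pyspellchecker's actual code: the toy corpus and the helper names word_probability and rank_candidates are assumptions made up for illustration.

```python
from collections import Counter

# Toy corpus standing in for the spell checker's word-frequency dictionary
# (an assumption for illustration, not the library's real data).
corpus = "diet diet diet diabetes diabetes insulin".split()
frequencies = Counter(corpus)
total_words = sum(frequencies.values())


def word_probability(word):
    """Return the word's share of all word occurrences in the corpus."""
    # Counter returns 0 for unknown words, so they get a probability of 0.0.
    return frequencies[word] / total_words


def rank_candidates(candidates):
    """Sort candidate corrections from most to least probable."""
    return sorted(candidates, key=word_probability, reverse=True)


if __name__ == "__main__":
    for candidate in rank_candidates(["diabetes", "diet"]):
        print(candidate, word_probability(candidate))
    print("diabet", word_probability("diabet"))
```

In this sketch a word that never appears in the corpus, such as 'diabet', gets 0.0, which matches the behaviour described above for words outside the dictionary.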
As for your other questions:
I hope this is helpful!
Thank you very much Barrust. I will take a deeper look at how the program works and at the link to the documentation on building a new dictionary. Very interesting!
Hi, I am not sure how to use the word_probability(word) function. I am currently using pyspellchecker to complete a list of misspelled words, but it gives me an output of 0. What I need is the probability of each candidate in the word list printed before. Here is the code:
Why am I doing that? In the code you can see that the word 'diabet' returns 'diet' instead of 'diabetes'.
I would like to find an accurate correction related to my topic. As far as I know, my options are:
1) Passing the "distance = 1" argument to the 'correction' function -> does not fix the problem with the word 'diabet'.
2) Providing a text file dictionary with all the words of interest to me, as you suggested here (load_text_file). Question: what is the expected format for this txt file? Could you share an example?
3) Adding a new function that adapts the algorithm to the terminology I am using (health-related terminology), by means of a new argument (topic = "Health") that biases the spell corrections towards the terminology related to that topic. Are you already developing anything like that in the module?
Could you please give me a clue about how to do this (questions 2 & 3)? Many thanks!
UPDATE: I am using the txt file of medical terms provided by @glutanimate & @dgreuel here. I think it partially solved the issue for my purposes.
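For anyone wondering why a domain word list changes the result, the following is a rough sketch of a simplified Norvig-style corrector. It is not pyspellchecker's implementation. The two sample texts and the helper names word_counts, edits1 and correction are hypothetical and chosen only for illustration. The word counts are built from plain free-form text, which is the kind of input a text-file dictionary is assumed to provide here.

```python
import re
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def word_counts(text):
    """Count lowercase alphabetic words in free-form text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correction(word, counts):
    """Pick the most frequent known word within edit distance 2."""
    if word in counts:
        return word
    # Prefer known words one edit away; fall back to two edits.
    near = {w for w in edits1(word) if w in counts}
    if not near:
        near = {w2 for w1 in edits1(word) for w2 in edits1(w1) if w2 in counts}
    return max(near, key=counts.__getitem__) if near else word


# Hypothetical sample texts: a general one and a health-related one.
general_text = "diet diet diet plan diabetes"
medical_text = "diabetes diabetes diabetes mellitus insulin diet"

if __name__ == "__main__":
    print(correction("diabet", word_counts(general_text)))
    print(correction("diabet", word_counts(medical_text)))
```

In this toy setup both 'diet' and 'diabetes' are two edits away from 'diabet', so a distance of 1 finds neither, and at distance 2 the choice comes down to which word is more frequent in the loaded text. A health-related corpus tips that choice towards 'diabetes'.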