GatorEducator / gatorgrader

Automated Grading Tool that Checks the Work of Writers and Programmers
GNU General Public License v3.0

Support Spellchecking of Technical Writing #174

Open gkapfham opened 5 years ago

gkapfham commented 5 years ago

Is your feature request related to a problem? Please describe.

The current version of GatorGrader cannot spellcheck submitted technical writing. While it might be possible to leverage existing spellcheckers on different Unix-like platforms, it is probably a good idea to implement these checks in a platform-independent fashion. If technical writing contains spelling mistakes that can be reliably detected, then that is a suitable reason to fail a build.

Describe the solution you'd like

It should be possible to specify a file in a directory and enable a new option called --spellcheck, following the current approach to command-line arguments. This option would then run a reliable, Python-based spellchecker. Here is an example of a project to consider: https://pypi.org/project/pyspellchecker/.
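As a rough illustration of what the check itself might do, here is a minimal sketch that tokenizes text and reports words missing from a known-word list. The names `KNOWN_WORDS`, `tokenize`, and `find_misspelled_words` are hypothetical, and the tiny word set stands in for a real dictionary so that the sketch runs without dependencies. pyspellchecker bundles a full English dictionary, and its `SpellChecker.unknown()` method serves a similar purpose.

```python
import re

# Hypothetical stand-in for a real dictionary such as the one bundled with pyspellchecker.
KNOWN_WORDS = {"the", "grader", "checks", "technical", "writing", "for", "errors"}

# Lowercase alphabetic tokens; internal apostrophes are kept, as in "don't".
WORD_PATTERN = re.compile(r"[a-z]+(?:'[a-z]+)*")


def tokenize(text):
    """Split text into lowercase word tokens."""
    return WORD_PATTERN.findall(text.lower())


def find_misspelled_words(text, known_words=KNOWN_WORDS):
    """Return the sorted, de-duplicated tokens that are not in the known-word list."""
    return sorted({word for word in tokenize(text) if word not in known_words})


if __name__ == "__main__":
    sample = "The grader checks techincal writing for erors."
    print(find_misspelled_words(sample))  # ['erors', 'techincal']
```

A real implementation would swap `KNOWN_WORDS` for a complete dictionary and would probably need to skip code blocks and URLs in Markdown files before tokenizing.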

Let me know what you think, @corlettim and @schultzh and @Michionlion. If you like this idea, then I can work with one of you to implement it or try to implement it on my own. That said, it might be advisable to wait to implement this until after we have introduced the linting interface.

Michionlion commented 5 years ago

I also think this can probably wait, but it could be a good check for the future! However, we should ensure that there is some way to ignore certain output, because spellcheckers sometimes flag words that cannot reasonably be checked, such as someone's name.
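One possible way to ignore such output is an allowlist of words that the author explicitly accepts. The sketch below assumes a hypothetical plain-text ignore file (one word per line, `#` for comments) and hypothetical helper names; it is not an existing GatorGrader feature.

```python
def load_ignore_words(lines):
    """Build a lowercase ignore set from lines of a hypothetical ignore file.

    Blank lines and lines starting with '#' are skipped.
    """
    return {
        line.strip().lower()
        for line in lines
        if line.strip() and not line.strip().startswith("#")
    }


def remove_ignored(flagged_words, ignore_words):
    """Drop flagged words that the author has explicitly allowed (case-insensitive)."""
    return [word for word in flagged_words if word.lower() not in ignore_words]


if __name__ == "__main__":
    ignore = load_ignore_words(["# names and project terms", "Kapfham", "GatorGrader", ""])
    flagged = ["kapfham", "gatorgrader", "teh"]
    print(remove_ignored(flagged, ignore))  # ['teh']
```

Keeping the list in a file inside the repository would let each assignment declare its own names and technical terms.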

schultzh commented 5 years ago

This sounds like a great check to implement in the future! It would certainly encourage well-written and professional technical writing from students.

gkapfham commented 5 years ago

Hi @Michionlion and @schultzh, thanks for your feedback. You have both made good points. If there is time after implementing the linting interface, then I will also try to add this feature. As of now, this feature is also open for others to implement!

Jordan-A commented 4 years ago

Hello everyone!

I'm new to working on the GatorGrader tool, but I'm interested in implementing this suggested feature! One of several suggestions @gkapfham shared with me is the ability to set a threshold for the number of misspelled words, to compensate for the tool incorrectly flagging students' names or words from source code. There are also other potential concerns that come with a spellchecking feature, such as detecting "garbage words".
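A threshold check could be as simple as comparing the number of distinct misspellings against a tolerated maximum. This is a minimal sketch under that assumption; `SpellingResult` and `check_spelling_threshold` are hypothetical names, not part of GatorGrader.

```python
from dataclasses import dataclass


@dataclass
class SpellingResult:
    """Hypothetical outcome of a spelling check, reported back to the grader."""

    misspelled: list
    threshold: int

    @property
    def passed(self):
        # The check passes while the number of distinct misspellings stays within the threshold.
        return len(self.misspelled) <= self.threshold


def check_spelling_threshold(misspelled_words, threshold=0):
    """Build a result that tolerates up to `threshold` distinct misspelled words."""
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    return SpellingResult(sorted(set(misspelled_words)), threshold)


if __name__ == "__main__":
    result = check_spelling_threshold(["Kapfham", "teh", "Kapfham"], threshold=2)
    print(result.passed, result.misspelled)  # True ['Kapfham', 'teh']
```

Counting distinct words rather than occurrences means that a repeated name only uses up one slot of the threshold; counting occurrences would be an equally reasonable design choice.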

What suggestions do you have for implementing this feature? Are there any use cases or problems that have not been mentioned so far that the spellchecking feature should handle? You can find future work on this feature in the new branch issue-174-spellchecker.

gkapfham commented 3 years ago

Hello @Jordan-A, do you have any updates on your progress towards completing this feature? Please let us know what steps you are currently considering and how I and other members of the team can help you complete this feature. If you have a timeline by which you think that you can complete an implementation of this feature, then please share those details as well! Thanks!

Jordan-A commented 3 years ago

Hi @gkapfham! I have implemented an initial version of the check_Spelling.py file, which defines the command-line arguments: the file, the directory, and an optional ignore argument that tolerates a certain number of misspelled words. I would commit and push my current work, but an error prevents the program from running successfully when I run the pipenv run python3 gatorgrader.py Spelling --file ... --directory ... --ignore ... command. The error is TypeError: 'NoneType' object is not subscriptable on line 64 of the report.md file. I think this is because the "supporting" functions in the invoke.py file and a standalone spellcheck.py file do not yet exist to read the input, perform the spellchecking, and return the result. I would like to spend at most one more day on this problem before pushing my work for review.
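That TypeError typically means some function returned None (for example, a helper with no return statement on some path) and its result was then indexed. The sketch below uses hypothetical names, not GatorGrader's actual invoke.py or report API, to show an argparse layout for the Spelling command and a check helper that returns a value on every path.

```python
import argparse


def build_parser():
    """Parse a hypothetical `Spelling` subcommand with --directory, --file, and --ignore."""
    parser = argparse.ArgumentParser(prog="gatorgrader.py")
    subparsers = parser.add_subparsers(dest="check", required=True)
    spelling = subparsers.add_parser("Spelling", help="check the spelling of a file")
    spelling.add_argument("--directory", required=True, help="directory containing the file")
    spelling.add_argument("--file", required=True, help="name of the file to check")
    spelling.add_argument(
        "--ignore",
        type=int,
        default=0,
        help="number of misspelled words to tolerate before failing",
    )
    return parser


def invoke_spelling_check(directory, filename, ignore, misspelled):
    """Return a (passed, message) tuple on every path.

    A helper that falls off the end returns None, and indexing that None
    later raises "TypeError: 'NoneType' object is not subscriptable".
    """
    count = len(misspelled)
    passed = count <= ignore
    message = f"{directory}/{filename}: {count} misspelled word(s), {ignore} allowed"
    return passed, message


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["Spelling", "--directory", "writing", "--file", "README.md", "--ignore", "1"]
    )
    # The misspelled list is hard-coded here; a real check would compute it from the file.
    result = invoke_spelling_check(args.directory, args.file, args.ignore, ["teh"])
    print(result[0], result[1])
```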

Once I have pushed my work, I would appreciate initial advice on the mistakes I made while implementing the spellcheck CLI. I would also like feedback on which kind of implementation the final version of this feature should use. Specifically, should we use a tool like SymSpell, or should I try to develop a machine learning approach to spellchecking? I suspect the answer depends on how much time is left.
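For comparison with a machine learning approach, SymSpell's speed comes from its symmetric delete idea: it precomputes delete-only variants of every dictionary word, then generates delete variants of the input and looks for shared variants. The sketch below is a simplified, dependency-free illustration with a maximum edit distance of 1. It is not the API of SymSpell or its Python port, and it omits word frequencies, candidate verification, and ranking.

```python
from collections import defaultdict


def deletes(word):
    """Return the word plus every string formed by deleting one character."""
    return {word} | {word[:i] + word[i + 1:] for i in range(len(word))}


def build_index(dictionary):
    """Map each delete variant to the dictionary words that produce it."""
    index = defaultdict(set)
    for word in dictionary:
        for variant in deletes(word):
            index[variant].add(word)
    return index


def candidates(term, index):
    """Return dictionary words that share a delete variant with the input term.

    With one delete on each side, matches can be up to two edits apart
    (for example, "abc" and "bcd" share "bc"), so a full implementation
    verifies candidates with an edit-distance computation before ranking them.
    """
    found = set()
    for variant in deletes(term):
        found |= index.get(variant, set())
    return found


if __name__ == "__main__":
    index = build_index(["gator", "grader", "writing"])
    print(candidates("gatr", index))  # {'gator'}
```

Because the expensive work happens once while building the index, each lookup only touches a handful of variants, which is why this approach is much faster than comparing the input against every dictionary word.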

Ideally, I would like to finish this feature in the next two weeks or so. I think it is feasible to have a pull request ready for review in late December or early January.

Thank you!