JMSLab / LaroplanOCR

Swedish primary school curricula (Läroplaner för grundskolan) in digital format.
MIT License
2 stars 0 forks source link

Test repository #6

Closed santiagohermo closed 2 years ago

santiagohermo commented 2 years ago

The goal of this issue is to make sure that the code works, the requirements are up-to-date, and the explanations on how to use it are clear. For that we'll get an external reviewer (who hasn't used the repository before) to

1) Install requirements and run the main code, following the explanation in the main README, 2) Run example following steps in the README in the /example/ folder.

The review should not include going through the code in detail, as this was already done in #2.

The deliverable will be a set of recommendations for improvements of those things that are unclear or incorrect.

santiagohermo commented 2 years ago

fyi @jmshapir @miikapaal

jmshapir commented 2 years ago

@rcalvo12 welcome to LaroplanOCR! We've brought you here in the hopes that you can take a fresh look at this repository and help us with some testing. More details in https://github.com/JMSLab/LaroplanOCR/issues/6#issue-1122005421. Thanks!

rcalvo12 commented 2 years ago

I have completed my review of the requirements and run of the main code as well as my review of the example. Overall, it was very clear but here are two items to consider for the main code:

From README.md (root) Item 1:

(If using conda, run conda install --file requirements.txt.) (Line 76)

Installing requirements with this line doesn't work automatically for pytesseract, layoutparser, and pdf2image:

Recommendation: Either just recommend pip install -r requirements.txt or modify requirements so that issues with conda installation don’t occur.

Item 2:

  1. Make sure that all the required program executables are in your systems path. (Line 78)

“Required program executables” are not explicitly stated or listed anywhere so the reader doesn’t know what they are checking for. While it can be pretty easily inferred (if I could infer it, I think most could) if something did go wrong for someone they might struggle to figure out what happened.

Recommendation: Explicitly list what executables are required so that reader can double check for themselves without having to guess.

///

I have no comments on the example. It ran smoothly and the instructions were clear.

@santiagohermo and @jmshapir would it be useful for me to open a branch and push my logs / changed files?

jmshapir commented 2 years ago

Thanks @rcalvo12!

@santiagohermo and @jmshapir would it be useful for me to open a branch and push my logs / changed files?

I defer to @santiagohermo on this point, thanks!

santiagohermo commented 2 years ago

Thanks @rcalvo12! Great to hear that you could run everything without major issues. Your recommendations on the README make sense to me, and I think it would be useful to take a look at how the files changed in your compilation. Could you open a branch, commit the changes, and implement the suggestions in the README? I think we need @jmshapir to give you write access.

Once you are done we can open a PR with me as a reviewer.

jmshapir commented 2 years ago

Thanks @rcalvo12! Great to hear that you could run everything without major issues. Your recommendations on the README make sense to me, and I think it would be useful to take a look at how the files changed in your compilation. Could you open a branch, commit the changes, and implement the suggestions in the README? I think we need @jmshapir to give you write access.

Once you are done we can open a PR with me as a reviewer.

Write access enabled!

rcalvo12 commented 2 years ago

Created issue branch issue6_test_repo for work on this issue.

rcalvo12 commented 2 years ago

@santiagohermo, Results from the run were pushed in 6fa343b.

Can you confirm which executables are required?

santiagohermo commented 2 years ago

@santiagohermo, Results from the run were pushed in 6fa343b.

Can you confirm which executables are required?

Sure! On second thoughts, it's only the Tesseract-OCR installation folder that needs to be in the path. So, because we already ask people to add it to the path in 'Prerequisites', I would say that we drop point 3) of the 'Quick start' section of the README altogether.

The other thing would be to drop the suggestion to install requirements using conda.


Thanks for adding the results! Two thoughts:

rcalvo12 commented 2 years ago

On second thoughts, it's only the Tesseract-OCR installation folder that needs to be in the path. So, because we already ask people to add it to the path in 'Prerequisites', I would say that we drop point 3) of the 'Quick start' section of the README altogether.

@santiagohermo, thanks for that clarification, that makes sense to me. I will be pushing the edited README.md shortly.

  • Changes seem to be mostly on strings that happen once. I think this is to be expected given the machine differences.
  • There are some differences in the text file that look like nothing changed, eg here. However, before there was before a weird invisible symbol that now disappeared. I was getting rid of the symbol here. I wonder why this happened. Are you running the OCR on a Mac?

That is correct, I ran the OCR on a Mac.

  • It looks like your computer produced some .DS_Store files, like this one. Can you add this file extension to the gitignore and delete them?

Will do this now!

  • Finally, I noticed no change in the R figure. Did you get a change to run it? Just want to make sure that this part also works

Yes, I was able to run the /example/build.R, and the png was produced. Not sure why this was not tracked, but it definitely worked (I just double checked).

rcalvo12 commented 2 years ago

Ok @santiagohermo, all the changes we have discussed so far have been pushed.

santiagohermo commented 2 years ago

Thanks @rcalvo12! Things look good. We can move to PR so that I go through the changes quickly once again

rcalvo12 commented 2 years ago

Thread continues in its PR #7.

rcalvo12 commented 2 years ago

Summary: In this issue @rcalo12 tested the repository without any prior knowledge of it to test that the code ran smoothly and that instructions were clear. Main changes were edits made to the README.md for the sake of clarity outlined here and here and implemented 24d58f6.

Branch merged into master in 8c0136a.

Final state of the issue branch can be found here.