Closed PatrickNLT closed 9 years ago
Thanks for the feedback.
I updated the doc, and added a flag to allow the user to copy the contents of the application bundle to the custom directory or not. I also added a test.
Does that sound good? :)
Done!
Thanks again for your contribution! Merging and closing.
@PatrickNLT, thank you a lot for the feature. I have a few questions and comments about the feature branch.
With this flag we do already have some cases, which return nil tesseract. That's from your test:
tesseract = [[G8Tesseract alloc] initWithLanguage:kG8Languages configDictionary:nil configFileNames:nil absoluteDataPath:customDirectoryPath engineMode:G8OCREngineModeTesseractOnly copyFilesFromResources:NO];
[[tesseract should] beNil];
Why do you think returning a nil in a such case is a good idea? nil data path means tesseract cannot continue to work from the app bundle? We tried to write code in a such way, that tesseract could work with any input parameters. As I already mentioned above, your copyFilesFromResources adds an ambiguity and wrong expectations. If I specify a nil cachesDataPath, tesseract starts and works. If I use your initializer and specifies nil absoluteDataPath, I don't receive tesseract instance at all. So, as I think a new initializer should be rewritten to support a hole concept of this repo. It should do as much work for user as possible and return nil only if there is really nothing to deal with input parameters.
@kevincon, I'm really sorry that I've missed this branch. Unfortunately, I was not watching for the repo notifications. So I thought nothing is happening here. But I've changed the settings to watch for all of them, since I'm really interested to participate in its development.
@ws233 Thanks for the extensive feedback.
Caches
folder) and that they would have to re-download them, which can be quite a pain if they have no/bad service. So unless I missed the option to use any directory as the source for the language files, I needed this. Also, the documentation states:Generally speaking, the application does not require cache data to operate properly
But we do need these files to have a good OCR. So I don't consider language files as cache files.
copyFilesFromResources
parameter was previously inferred from the presence of the cachesRelatedPath
parameter. If cachesRelatedPath
was nil, absoluteDataPath
was set to the path of the resource bundle, so it does not work with the new initializer and I added this parameter to add this information. But we could just check that the absoluteDataPath
is equal to the path of the bundle. Is that something you would prefer?absoluteDataPath
+ copyFilesFromResources = YES
makes no sense, we could assert that it does not happen, but currently we just ignore the boolean parameter.To sum up, I think we could get rid of the boolean parameters as it can be confusing and we could have a default behavior in this case, but the point of this new initializer was to give more flexibility to the developer, so it can also be seen as a nice advanced option. And the tests can of course be updated if you think it's a good idea. :)
Yup, I agree that we need to allow to safe tessdata to the 'AppSupport' folder. That could be necessary in some situations.
Regarding the form of the initializer. Could you explain more details about this advanced option? How it should work? If it's YES, files should be copied and shouldn't otherwise, right? If that's an idea, the current code doesn't satisfy these expectations in all the possible parameters values, that what I worry about. The code should work in every cases exactly like it's declared.
@kevincon, what is your vision? What do you think about a huge number of cases, where initializing tesseract returns nil?
Yes we're on the same page about the behavior of the boolean parameter. But in which case is it not working as intended?
I've mentioned this above. Sorry, if it was not clear. b. not nil absoluteDataPath is specified together with copyFilesFromResources = YES parameter, while absoluteDataPath folder already has tessdata files in place.
In such case, symbolic link are not being recreated, but should be, shouldn't it?
Imagine the case, an app creates the links first. Than the user downloads a new version of some language file and rewrite the link. After some time tesseract starts with copyFilesFromResources = YES, because those downloaded files should be again rewritten with symbolic links back. But that will not happen due to conditions inside the 'moveTessdataToDirectoryIfNecessary' function. That's a line to look at:
if (![fileManager fileExistsAtPath:destinationFileName]) {
OK now I see what you meant, thanks.
Well, I don't think symbolic links should be re-created in this case, because it goes against the logic you previously described for the cache: if the user updates the language files, you don't want to overwrite them. But you may want not to copy the files at all (for example if the user needs different languages than the one you embedded in the bundle), and that's when the option is useful.
I really think it's just a matter of clarifying the doc.
I don't agree here. In such case that's one more ambiguity. From one side you say that YES parameter will copy files, from other side you say, that it will not in some situations. You see, too many different and specific cases. And many people will soon ask questions, why it works in a such way and not in another.
Perhaps, that's just a matter of clarifying the doc, but I think it's better to hide this parameter from the user at all.
Anyway, let's wait for @kevincon.
copyFilesFromResources
should overwrite filesI agree with @PatrickNLT that setting the copyFilesFromResources
parameter to YES should not overwrite any existing language files because I think the use case he describes (where an app has downloaded newer versions of language files) is a common one and the expected behavior is that the library should only copy files from the app bundle that do not already exist in the target directory.
However I can see @ws233's point of view about the parameter being ambiguous since copyFilesFromResources
by itself sounds like it should overwrite any existing files (e.g. the Unix cp
command overwrites by default when copying).
One solution is to just rename copyFilesFromResources
to something like copyFilesFromResourcesWithoutOverwriting
or copyAbsentFilesFromResources
, but I think I agree with both of you that copyFilesFromResources
can be removed altogether (see below).
From a bigger picture point of view, I just think the library should support the following (let me know if you disagree or think more configurations should be supported):
(1) is satisifed by the convenience initializers, (2) is satisfied by the initializer with the cachesRelatedDataPath
parameter, and (3) is satisfied by the designated initializer, so I think the library satisfies these configurations currently.
But I agree that it is unnecessary to have the copyFilesFromResources
parameter in the designated initializer since only (2) involves copying files from the app bundle.
So I agree with the proposal to remove copyFilesFromResources
from the designated initializer and refactor the code so that the copying of the "tessdata" folder in the app bundle occurs exclusively in the initializer with the cachesRelatedDataPath
parameter. How does that sound?
Looks good to me.
Actually with your proposal 2. is not satisfied if we need another path than the cache path (my case precisely). We may want to seed the tessdata folder with maybe one language and the data necessary to use OSD. I would change it into:
remove
copyFilesFromResources
from the designated initializer and refactor the code so that the copying of the "tessdata" folder in the app bundle occurs if a folder (cache or any other location) is specified.
Ah you're right, I forgot cachesRelatedDatapath
roots in the Caches folder. I agree with your modification:
"Remove copyFilesFromResources
from the designated initializer and refactor the code so that the copying of the "tessdata" folder in the app bundle occurs if a folder (cache or any other location) is specified."
And I guess (3) can be satisfied if the developer does not have a "tessdata" folder in the app bundle? I think the code should recognize that case and not try to copy anything from the app bundle to the specified folder.
Great, looks good to me too.
I'll create a new issue for this task.
Having language files stored into the cache directory is rather restrictive as the system may clean this directory from time to time. My initializer allows the developer to store the language files somewhere else (typically in the Application Support directory).