CLD2Owners / cld2

Compact Language Detector 2
Apache License 2.0
840 stars 128 forks source link

Post-dynamic-mode cleanup #7

Closed ghost closed 9 years ago

ghost commented 9 years ago

Originally reported on Google Code with ID 7

There are a few things to clean up in r151:
* Use the newly-added constants in the table classes to avoid hardcoding sizes
* Ensure cld2_generated_quadchrome0122_16.cc works with both active tables in dynamic
mode
* Add the ability to use an already-extant mmap to load the data from (rather than
managing the mmap directly). This is necessary for systems (such as Chromium) where
the security model forbids direct access to the filesystem in some contexts where CLD2
might be used

Should all be pretty straightforward. Remove all FIXME and TODO comments added by andrewhayden@google.com
as well.

Reported by andrewhayden@google.com on 2014-03-03 15:25:27

ghost commented 9 years ago
Manually verified that all unit tests pass with cld2_generated_quadchrome0122_16.cc.

Reported by andrewhayden@google.com on 2014-03-03 15:32:23

ghost commented 9 years ago
Also TBD: Null-out all kScoringtables members in the call to unloadData.

Reported by andrewhayden@google.com on 2014-03-03 15:42:49

ghost commented 9 years ago
Attaching a patch for dealing with the data in an externally-managed mmap. Unit tests
have been updated to re-run in mmap-based dynamic mode as well as file-based dynamic
mode, and all pass.

Reported by andrewhayden@google.com on 2014-03-03 17:10:13


ghost commented 9 years ago
Patch committed as r153

Reported by andrewhayden@google.com on 2014-03-04 10:30:43

ghost commented 9 years ago
The patch takes care of nulling out the fields in kScoringtables.

Still TODO:
* Use the newly-added constants in the table classes to avoid hardcoding sizes
* Remove all FIXME and TODO comments added by andrewhayden@google.com as well

Reported by andrewhayden@google.com on 2014-03-04 10:33:16

ghost commented 9 years ago
Here we go, this patch switches over to use the constants. I also added the constants
into the 0122 files that were missing them, and I've compiled and run all tests in
all the shell scripts. Everything works, and the data files are perfect binary matches
before and after the patch. This looks correct to me, and since all tests are passing
I'm going to go ahead and submit (no functional changes here for CLD2).

There are still a few FIXMEs but they revolve around different things, like not relying
on bitpacking of underlying structures. This isn't a problem right now, and we can
push it off indefinitely (possibly forever).

Reported by andrewhayden@google.com on 2014-03-11 12:40:58


ghost commented 9 years ago
Patch committed as r154

Reported by andrewhayden@google.com on 2014-03-11 12:42:56