FooSoft / yomichan-import

External dictionary importer for Yomichan.
https://foosoft.net/projects/yomichan-import/
MIT License

Request: Separate Daijirin's J-J and J-E versions #16

Open anonymouse333 opened 6 years ago

anonymouse333 commented 6 years ago

The Daijirin EPWING dictionary comes with both J-J and J-E definitions. Ideally, Yomichan Import should split these into two separate dictionaries so users can choose to add either only the J-J version or only the J-E version to Yomichan.

Alternatively, if the dictionary can't be converted into two separate versions at once, the user should be given the option to strip one version out during the conversion process, leaving them with either only a J-J version or only a J-E version.

rnpnr commented 3 years ago

Here is a hacky diff that does just that. Reverse the condition to get a dictionary containing only the J-E definitions.

Note that this does not remove the English-only entries, but in my experience those aren't the ones that show up when you don't want them to. As far as I know, it doesn't remove any entries incorrectly, but the diff between the (pretty-printed) JSON files is 400K lines long, so I didn't look at the whole thing.

diff --git a/daijirin.go b/daijirin.go
index 5983918..46b11a1 100644
--- a/daijirin.go
+++ b/daijirin.go
@@ -29,6 +29,7 @@ import (
 )

 type daijirinExtractor struct {
+   engGlossExp  *regexp.Regexp
    partsExp     *regexp.Regexp
    readGroupExp *regexp.Regexp
    expVarExp    *regexp.Regexp
@@ -39,6 +40,7 @@ type daijirinExtractor struct {

 func makeDaijirinExtractor() epwingExtractor {
    return &daijirinExtractor{
+       engGlossExp:  regexp.MustCompile(`→英和`),
        partsExp:     regexp.MustCompile(`([^(【〖]+)(?:【(.*)】)?(?:〖(.*)〗)?(?:((.*)))?`),
        readGroupExp: regexp.MustCompile(`[-・]+`),
        expVarExp:    regexp.MustCompile(`\(([^\)]*)\)`),
@@ -49,6 +51,10 @@ func makeDaijirinExtractor() epwingExtractor {
 }

 func (e *daijirinExtractor) extractTerms(entry zig.BookEntry, sequence int) []dbTerm {
+   if e.engGlossExp.MatchString(entry.Text) {
+       return nil
+   }
+
    matches := e.partsExp.FindStringSubmatch(entry.Heading)
    if matches == nil {
        return nil