gnames / gnfinder

GNfinder finds scientific names in UTF8 texts, PDF files, MS Word/Excel documents, URLs etc.
MIT License

Black / grey dictionaries #43

Closed Adafede closed 4 years ago

Adafede commented 4 years ago

Hi,

First of all, thank you for your wonderful work!

I am using the binary executable and I was wondering if it was possible to "bypass" the black/grey dictionaries, or to modify them, or to allow "piper", as an example.

Thank you very much in advance.

P.S.: I'm using a macOS device and I wasn't able to build it from scratch using make deps.

dimus commented 4 years ago

If you are able to build from scratch, yes, you can modify the dictionaries. Your changes will be compiled into the binary. How about make install? Does it work for you?

dimus commented 4 years ago

Did not get feedback, closing until I get any.

Adafede commented 4 years ago

My apologies, I indeed did not answer...

I didn't try building from scratch...not familiar with Go...I'll try and let you know.

Sorry again

dimus commented 4 years ago

After you install Go 1.14 and set the GOPATH environment variable, you should be able to clone the gnfinder repo, cd to it, and run make or make install (they both do the same thing).

Adafede commented 4 years ago

Hi, so I did both:

1) Installed Go and followed the steps you have in the README file:

go get github.com/gnames/gnfinder
cd $GOPATH/src/github.com/gnames/gnfinder
make install

When you run go get github.com/gnames/gnfinder, you obtain an archive file that you must uncompress first. But then, when uncompressing it, you only get two files, __.PKGDEF and _go_.o... I don't think that is what I should get. So I simply cloned the repo and cd'ed to it as you mentioned, but neither make nor make install works:

cd protob; \
    protoc -I . ./protob.proto --go_out=plugins=grpc:.;
/bin/sh: protoc: command not found
make: *** [grpc] Error 127

Any ideas? Thanks a lot for your help!

dimus commented 4 years ago

Try executing this script to install protoc: https://github.com/gnames/gnfinder/blob/master/scripts/protoc-install.sh (although protoc may also exist as a brew package on Mac).

dimus commented 4 years ago

On Mac you can use:

brew install protobuf
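
The "protoc: command not found" error above means make could not find a protoc binary on PATH. Here is a minimal sketch of a pre-flight check that could be run before make, assuming the only tools involved are the ones named in this thread (go, make, protoc). The helper name missing_tools is made up for this example and is not part of gnfinder.

```shell
#!/bin/sh
# Sketch only: the tool list (go, make, protoc) comes from this thread,
# not from an authoritative list of gnfinder build requirements.

# missing_tools prints each named command that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

missing=$(missing_tools go make protoc)
if [ -n "$missing" ]; then
    echo "Missing build tools:"
    echo "$missing"
else
    echo "go, make and protoc are all on PATH"
fi
```

If protoc is listed as missing even after installing it, the directory it was installed into is probably not on PATH for the shell that runs make.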

Adafede commented 4 years ago

Hi again,

Thank you very much for your help.

I tried what you suggested but kept getting stuck at the same point...

cd protob; \
    protoc -I . ./protob.proto --go_out=plugins=grpc:.;
/bin/sh: protoc: command not found
make: *** [grpc] Error 127

By searching a bit I found something that did the trick (at least for me) here. After it...it ran...was happy...and...

cd protob; \
    protoc -I . ./protob.proto --go_out=plugins=grpc:.;
2020/05/10 18:12:31 WARNING: Missing 'go_package' option in "protob.proto", please specify:
    option go_package = ".;protob";
A future release of protoc-gen-go will require this be specified.
See https://developers.google.com/protocol-buffers/docs/reference/go-generated#package for more information.

cd fs; \
    GO111MODULE=on CGO_ENABLED=0 GOARCH=amd64 go run -tags=dev assets_gen.go
writing files_vfsdata.go
go generate
cd gnfinder; \
    GO111MODULE=on CGO_ENABLED=0 GOARCH=amd64 go install -ldflags "-X github.com/gnames/gnfinder.Build=2020-05-10_16:12:33UTC -X github.com/gnames/gnfinder.Version=v0.10.1";
# github.com/gnames/gnfinder/protob
../protob/protob.pb.go:1157:7: undefined: grpc.ClientConnInterface
../protob/protob.pb.go:1161:11: undefined: grpc.SupportPackageIsVersion6
../protob/protob.pb.go:1173:5: undefined: grpc.ClientConnInterface
../protob/protob.pb.go:1176:27: undefined: grpc.ClientConnInterface
make: *** [install] Error 2

I tried figuring out how to deal with it by myself for a while but sadly couldn't find any solution...sorry to bother you again with annoying technical issues.

dimus commented 4 years ago

Let's try a different approach:

  1. cd to the gnfinder repository directory. It should have another gnfinder directory in it.
  2. cd gnfinder
  3. go build

Adafede commented 4 years ago

# github.com/gnames/gnfinder/protob
../protob/protob.pb.go:1157:7: undefined: grpc.ClientConnInterface
../protob/protob.pb.go:1161:11: undefined: grpc.SupportPackageIsVersion6
../protob/protob.pb.go:1173:5: undefined: grpc.ClientConnInterface
../protob/protob.pb.go:1176:27: undefined: grpc.ClientConnInterface

dimus commented 4 years ago

I made a new commit with all dependencies upgraded. I suspect it should fix your problem.

https://github.com/gnames/gnfinder/commit/af357f2c17361de767533a9b0cff54adc1fbe21a

Try now

git pull
make

Adafede commented 4 years ago

(base) Adriano:gnfinder rutza$ cd gnfinder
(base) Adriano:gnfinder rutza$ go build
go: downloading github.com/spf13/viper v1.7.0
go: downloading github.com/spf13/cobra v1.0.0
go: downloading gitlab.com/gogna/gnparser v0.14.1
go: downloading github.com/pkg/errors v0.9.1
go: downloading github.com/json-iterator/go v1.1.9
go: downloading google.golang.org/grpc v1.29.1
go: downloading github.com/abadojack/whatlanggo v1.0.1
go: downloading github.com/spf13/pflag v1.0.5
go: downloading github.com/mitchellh/mapstructure v1.3.0
go: downloading github.com/subosito/gotenv v1.2.0
go: downloading gopkg.in/yaml.v2 v2.2.8
go: downloading github.com/magiconair/properties v1.8.1
go: downloading github.com/spf13/afero v1.2.2
go: downloading github.com/spf13/cast v1.3.1
go: downloading gopkg.in/ini.v1 v1.56.0
go: downloading github.com/pelletier/go-toml v1.7.0
go: downloading github.com/fsnotify/fsnotify v1.4.9
go: downloading golang.org/x/net v0.0.0-20200506145744-7e3656a0809f
go: downloading github.com/spf13/jwalterweatherman v1.1.0
go: downloading golang.org/x/sys v0.0.0-20200509044756-6aff5f38e54f
go: downloading github.com/gnames/uuid5 v0.1.1
go: downloading golang.org/x/text v0.3.2
go: downloading google.golang.org/genproto v0.0.0-20200507105951-43844f6eee31
go: downloading github.com/satori/go.uuid v1.2.0
(base) Adriano:gnfinder rutza$ 

Looks like it worked <3

dimus commented 4 years ago

I updated the Development section of the documentation (https://github.com/gnames/gnfinder#development) using the link you found. Closing this ticket.

Adafede commented 4 years ago

Hi,

Thanks for the doc update, wonderful! Thank you for all your hard work and the nice recent versions you developed. Now that I am able to build from scratch, I had the opportunity to take a more careful look at the dictionaries and related files.

I therefore have a question: Is there a precise reason why you chose to put piper in your black uninomials dictionary?

I am asking because I am interested in this species and cannot afford to lose it during text recognition. Since I was not able to build from scratch until now, I used the R taxize package, which in a way finds more results than GNfinder does, although it uses your tool! I think this is because they removed some entries from your black dictionaries, but I'm not sure about it. @sckott

I could build my own version on my side and modify it accordingly (commenting out piper, for example), but I think it is more useful to discuss this point publicly and maybe find common solutions, so that not everyone ends up with their own version.

If you are interested, I have already manually retrieved some problematic entries that are found in some biological databases and lead to aberrant results, such as (in alphabetical order):

japanese yew (Taxus cuspidata)
anaerobic (is already in your list, but probably taxize ignores it)
candidatus (same as anaerobic)
chinensis
green (same as anaerobic)
megaleia
ootheca
peripatoides
red (same as anaerobic)
sinensis
tasmanian
uncultured

Many thanks again
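
One way to reconcile a shared black dictionary with a local need such as keeping piper is an allow-list that overrides it. The sketch below is a hypothetical illustration only, not gnfinder's actual code or dictionary format. The names filter_names, black.txt, and allow.txt are invented for this example, and each file is assumed to hold one lowercase word per line.

```shell
#!/bin/sh
# Hypothetical sketch: NOT gnfinder's implementation or file layout.

# filter_names BLACK_FILE ALLOW_FILE
# Reads candidate names from stdin (one per line) and prints those that are
# either explicitly allowed or absent from the black list.
# Matching is case-insensitive and on whole lines only.
filter_names() {
    black_file=$1
    allow_file=$2
    while IFS= read -r name; do
        lower=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
        if grep -qxF -- "$lower" "$allow_file"; then
            printf '%s\n' "$name"        # the allow-list wins over the black list
        elif ! grep -qxF -- "$lower" "$black_file"; then
            printf '%s\n' "$name"        # not blacklisted at all
        fi
    done
}

# Demo with throwaway files.
tmpdir=$(mktemp -d)
printf 'piper\nanaerobic\nuncultured\n' > "$tmpdir/black.txt"
printf 'piper\n' > "$tmpdir/allow.txt"
printf 'Piper\nAnaerobic\nTaxus\n' | filter_names "$tmpdir/black.txt" "$tmpdir/allow.txt"
rm -r "$tmpdir"
```

In the demo, Piper and Taxus are printed while Anaerobic is dropped. Keeping the override in a separate file would let the shared black list stay identical for everyone, which is the concern raised above about each user maintaining their own build.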

dimus commented 4 years ago

I am making a new ticket for dictionaries