FeatureBaseDB / featurebase

A crazy fast analytical database, built on bitmaps. Perfect for ML applications. Learn more at: http://docs.featurebase.com/. Start a Docker instance: https://hub.docker.com/r/featurebasedb/featurebase
https://www.featurebase.com
Apache License 2.0

Failure to import tutorial data #719

Closed gnuvince closed 7 years ago

gnuvince commented 7 years ago

For bugs, please provide the following:

Expected behavior

The pilosa import command given in the tutorial (https://www.pilosa.com/docs/getting-started/#sample-project) completes correctly.

Actual behavior

The import command fails to complete and gives an error message saying that there are too many files opened. Attempting to re-execute the command gives the same error. lsof | grep stargazer | wc -l reports 20,280 files opened by Pilosa. Afterwards, Pilosa fails to start with the same error message.

$ ./pilosa import -i repository -f stargazer stargazer.csv

[ elided ]
2017/07/10 14:55:53 fragment: snapshotting repository/stargazer/inverse_20150823/0
2017/07/10 14:55:53 fragment: snapshot complete repository/stargazer/inverse_20150823/0 took 15.461138ms
2017/07/10 14:55:53 fragment: error flushing cache on close: err=open /home/vfoley/.pilosa/repository/stargazer/views/standard_20150825/fragments/0.cache: too many open files, path=/home/vfoley/.pilosa/repository/stargazer/views/standard_20150825/fragments/0
2017/07/10 14:55:53 import error: index=repository, frame=stargazer, slice=0, bits=50000, err=open cache: open /home/vfoley/.pilosa/repository/stargazer/views/standard_20150825/fragments/0.cache: too many open files

~$ lsof | grep stargazer | wc -l
20280

$ ./pilosa server
Pilosa v0.4.0, build time 2017-06-08T15:46:41+0000
Using data from: /home/vfoley/.pilosa
2017/07/10 15:04:03 opening index: repository
2017/07/10 15:04:03 fragment: error flushing cache on close: err=open /home/vfoley/.pilosa/repository/stargazer/views/inverse_20160314/fragments/0.cache: too many open files, path=/home/vfoley/.pilosa/repository/stargazer/views/inverse_20160314/fragments/0
Error: error running server: server.Open: opening Holder: open index: name=repository, err=open frame: name=stargazer, err=open view: view=inverse_20160314, err=open fragment: slice=%!s(uint64=0), err=open cache: open /home/vfoley/.pilosa/repository/stargazer/views/inverse_20160314/fragments/0.cache: too many open files
Usage:
  pilosa server [flags]

Flags:
      --anti-entropy.interval duration       Interval at which to run anti-entropy routine. (default 10m0s)
  -b, --bind string                          Default URI on which pilosa should listen. (default ":10101")
      --cluster.gossip-seed string           Host with which to seed the gossip membership.
      --cluster.hosts stringSlice            Comma separated list of hosts in cluster.
      --cluster.internal-hosts stringSlice   Comma separated list of hosts in cluster used for internal communication.
      --cluster.internal-port string         Port to which pilosa should bind for internal state sharing.
      --cluster.poll-interval duration       Polling interval for cluster. (default 1m0s)
      --cluster.replicas int                 Number of hosts each piece of data should be stored on. (default 1)
      --cluster.type string                  Determine how the cluster handles membership and state sharing. Choose from [static, http, gossip] (default "static")
  -d, --data-dir string                      Directory to store pilosa data files. (default "~/.pilosa")
      --log-path string                      Log path
      --max-writes-per-request int           Number of write commands per request. (default 5000)
      --metric.host string                   Default URI to send metrics.
      --metric.poll-interval duration        Polling interval metrics.
      --metric.service string                Default URI on which pilosa should listen. (default "nop")
      --plugins.path string                  Path to plugin directory.
      --profile.cpu string                   Where to store CPU profile.
      --profile.cpu-time duration            CPU profile duration. (default 30s)

Global Flags:
  -c, --config string   Configuration file to read from.

error running server: server.Open: opening Holder: open index: name=repository, err=open frame: name=stargazer, err=open view: view=inverse_20160314, err=open fragment: slice=%!s(uint64=0), err=open cache: open /home/vfoley/.pilosa/repository/stargazer/views/inverse_20160314/fragments/0.cache: too many open files

Steps to reproduce the behavior

Install Pilosa v0.4.0 and follow the tutorial.

Information about your environment (OS/architecture, CPU, RAM, cluster/solo, configuration, etc.)

jaffee commented 7 years ago

Thanks for the report, @gnuvince. We've got a note in the docs about setting open file limits, and running ulimit -n 262144 in the terminal right before you start Pilosa will probably solve your immediate issue. But this keeps coming up, and I think I have a better solution.

We can use Go's syscall package like so:

    // Print the current open-file limit, try to change it, then print it again.
    lim := &syscall.Rlimit{}
    syscall.Getrlimit(syscall.RLIMIT_NOFILE, lim)
    fmt.Printf("lim: %v\n", lim)
    err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &syscall.Rlimit{Cur: 10000, Max: 10000})
    fmt.Println("syscall.Setrlimit: ", err)
    lim = &syscall.Rlimit{}
    syscall.Getrlimit(syscall.RLIMIT_NOFILE, lim)
    fmt.Printf("lim: %v\n", lim)

to set open file limits from within Pilosa. This will have to be tested on multiple platforms since syscall is OS-specific, although the above works on my Mac. If we can't set the limit high enough, we should log a warning so that the user knows she'll need elevated privileges (or a higher hard limit) to raise it further.

jaffee commented 7 years ago

I created https://github.com/pilosa/pilosa/issues/722 to track implementation of the long-term fix. @gnuvince, if the ulimit fix I described works for you, please go ahead and close this ticket. If not, let us know and we'll dig in further.

gnuvince commented 7 years ago

Yep, the ulimit workaround fixes the immediate issue, thanks!

ggicci commented 7 years ago

Use ulimit -a to list the current settings. Under some versions of OS X, ulimit -n 262144 just produces an error: cannot modify limit: Invalid argument. A solution is described in "How to persist ulimit settings in OSX Mavericks?".