dedupeio / dedupe-examples

Examples for using the dedupe library
MIT License

Segmentation Fault #21

Closed paulshannon closed 9 years ago

paulshannon commented 9 years ago

I'm just trying to get this up and running in a VM (using vagrant) and am running into a Segmentation Fault. Any ideas what might be causing this?

Vagrantfile:

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "ubuntu/trusty64"
  config.vm.provision "shell", path: "provision.sh", privileged: false
  config.vm.provider "virtualbox" do |v|
    v.memory = 1024
  end
end

provision.sh:

sudo apt-get update
sudo apt-get install -q -y git libpq-dev python-dev python-virtualenv

cd ~
git clone https://github.com/datamade/dedupe-examples.git
virtualenv --no-site-packages venv

. venv/bin/activate
pip install numpy
pip install dedupe
pip install psycopg2 dj-database-url unidecode

Shell session:

vagrant@vagrant-ubuntu-trusty-64:~$ source venv/bin/activate
(venv)vagrant@vagrant-ubuntu-trusty-64:~$ cd dedupe-examples/csv_example/
(venv)vagrant@vagrant-ubuntu-trusty-64:~/dedupe-examples/csv_example$ python csv_example.py -vv
importing data ...
/home/vagrant/venv/local/lib/python2.7/site-packages/dedupe/sampling.py:35: UserWarning: 75000 blocked samples were requested, but only able to sample 74750
  % (sample_size, len(blocked_sample)))
Segmentation fault
(venv)vagrant@vagrant-ubuntu-trusty-64:~/dedupe-examples/csv_example$

fgregg commented 9 years ago

I suspect it's numpy segfaulting because it's run out of memory.
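
One way to check this suspicion is to log the process's peak memory use at a few points (for example after the data import and before sampling) and compare it with the VM's memory setting. The following is a minimal sketch using only the standard library's `resource` module; the helper name `peak_rss_mb` is hypothetical and not part of dedupe or numpy.

```python
import resource
import sys


def peak_rss_mb(usage=None, platform=sys.platform):
    """Return the peak resident set size of this process in MB.

    Hypothetical helper for illustration; not part of dedupe.
    """
    if usage is None:
        usage = resource.getrusage(resource.RUSAGE_SELF)
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024.0 * 1024.0 if platform == "darwin" else 1024.0
    return usage.ru_maxrss / divisor


if __name__ == "__main__":
    print("peak memory so far: %.1f MB" % peak_rss_mb())
```

A segfault ends the process immediately, so the value has to be printed before the crashing step. If the last reading is already close to the VM's `v.memory` value, running out of memory is a plausible cause.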

(Sent as an email reply to Paul Shannon's message of Mon Jan 19 2015 at 1:48:15 PM.)

paulshannon commented 9 years ago

I increased v.memory from 1024 to 2048 in the Vagrantfile and that got rid of the segfault. Thanks!
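
For anyone hitting the same crash, a pre-flight check can flag an undersized VM before the example runs. This is a sketch under stated assumptions: the function names are hypothetical, and the 1024 MB figure is only the size shown failing in this thread, not a documented minimum for `csv_example.py`. A VM reports slightly less total memory than its configured size, so the check compares against the failing size rather than requiring a full 2048 MB.

```python
import os

# Assumption: 1024 MB is the only size this thread shows failing; the true
# minimum for csv_example.py is not established here.
KNOWN_FAILING_MB = 1024


def total_memory_mb():
    """Total physical memory visible to this machine or VM, in MB (Linux)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    page_count = os.sysconf("SC_PHYS_PAGES")
    return page_size * page_count / (1024.0 * 1024.0)


def memory_looks_too_small(total_mb, failing_mb=KNOWN_FAILING_MB):
    """True if total memory is at or below the size known to segfault."""
    return total_mb <= failing_mb


if __name__ == "__main__":
    total = total_memory_mb()
    if memory_looks_too_small(total):
        print("Warning: only %.0f MB of memory; consider raising v.memory" % total)
    else:
        print("Memory check passed: %.0f MB available in total" % total)
```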