d-chambers / Detex

A Python package for subspace detection and waveform similarity clustering

too many columns, fail to write .index.db #32

Open quapity opened 8 years ago

quapity commented 8 years ago

Detex fails to write the indkey table in .index.db. This happens with the latest version of detex I have, and I also tried an older version. I deleted the index and re-indexed both through clustering and with getdata; same issue, too many columns. When I do get it to write a .index.db, I get an error because there is no 'indkey' table, so that still seems like a related error.

Traceback and associated files below.

File "", line 1, in detex.getdata.makeDataDirectories(getContinuous=False)

File "/home/linville/Applications/anaconda/lib/python2.7/site-packages/detex/getdata.py", line 174, in makeDataDirectories

File "/home/linville/Applications/anaconda/lib/python2.7/site-packages/detex/getdata.py", line 202, in _getTemData

File "/home/linville/Applications/anaconda/lib/python2.7/site-packages/detex/getdata.py", line 936, in indexDirectory

File "/home/linville/Applications/anaconda/lib/python2.7/site-packages/detex/util.py", line 880, in saveSQLite DF, Tablename, con=conn, flavor='sqlite', if_exists='append')

File "/home/linville/Applications/anaconda/lib/python2.7/site-packages/detex/pandas_dbms.py", line 83, in write_frame cur.execute(schema)

OperationalError: too many columns on indkey

Archive.zip

d-chambers commented 8 years ago

OK, I am downloading the data now. I will let you know sometime tomorrow whether I can reproduce the error.

d-chambers commented 8 years ago

I couldn't reproduce the error on my workstation. I will log into the UUSS and see if I can reproduce it there.

kpankow commented 8 years ago

I am also trying to replicate Lisa's error. At the end of getdata I got:

detex.getdata.makeDataDirectories(getContinuous=False)

Getting template waveforms

indexing, or updating index for EventWaveForms

/home/pankow/anaconda/lib/python2.7/site-packages/obspy/io/mseed/core.py:733: UserWarning: The encoding specified in trace.stats.mseed.encoding does not match the dtype of the data. A suitable encoding will be chosen.

warnings.warn(msg, UserWarning)

Traceback (most recent call last):

File "", line 1, in detex.getdata.makeDataDirectories(getContinuous=False)

File "/home/pankow/anaconda/lib/python2.7/site-packages/detex/getdata.py", line 198, in makeDataDirectories fetcher, timeBeforeOrigin, timeAfterOrigin)

File "/home/pankow/anaconda/lib/python2.7/site-packages/detex/getdata.py", line 226, in _getTemData indexDirectory(temDir)

File "/home/pankow/anaconda/lib/python2.7/site-packages/detex/getdata.py", line 961, in indexDirectory detex.util.saveSQLite(dfInd,os.path.join(dirPath, '.index.db'),'indkey')

File "/home/pankow/anaconda/lib/python2.7/site-packages/detex/util.py", line 870, in saveSQLite import ipdb; ipdb.set_trace()

File "/home/pankow/anaconda/lib/python2.7/site-packages/ipdb/__init__.py", line 7, in from ipdb.__main__ import set_trace, post_mortem, pm, run, runcall, runeval, launch_ipdb_on_exception

File "/home/pankow/anaconda/lib/python2.7/site-packages/ipdb/__main__.py", line 66, in ipapp = TerminalIPythonApp.instance()

File "/home/pankow/anaconda/lib/python2.7/site-packages/IPython/config/configurable.py", line 365, in instance '%s are being created.' % cls.__name__

MultipleInstanceError: Multiple incompatible subclass instances of TerminalIPythonApp are being created.

Not sure if this is related to Lisa's error or my personal one. Kris



d-chambers commented 8 years ago

Looks like it might be. If you run the commands that caused the error in a plain python console (not ipython), then when execution hits the ipdb.set_trace() on line 870 of util.py it will drop you into a debugging session so you can take a look around. I will also try to diagnose the problem remotely.

d-chambers commented 8 years ago

OK, the issue is with writing the "indkey" table to the .index.db database. The short story is that I was trying to be too clever in saving some space in the index file (see issue 31), and the method I used doesn't scale past the SQLite column limit (I didn't realize there was a limit when I wrote the function). So you can't have more files in any directory level than some number less than 3886.
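For readers who want to see the limit in isolation, here is a minimal sketch using only Python's standard sqlite3 module. It is not Detex code: the function name try_create_wide_table and the table name wide are made up for illustration. Stock SQLite builds default to a limit of 2000 columns per table (SQLITE_MAX_COLUMN), and no build can be configured above 32767, so the 40000-column attempt should be rejected on any build; the exact threshold on a given machine may differ.

```python
import sqlite3


def try_create_wide_table(n_columns):
    """Try to create a table with n_columns columns in an in-memory database.

    Returns None on success, or the SQLite error message on failure.
    The function and table names are hypothetical, for illustration only.
    """
    # Untyped column names (c0, c1, ...) are valid in SQLite.
    columns = ", ".join("c%d" % i for i in range(n_columns))
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE wide (%s)" % columns)
    except sqlite3.OperationalError as err:
        return str(err)
    finally:
        conn.close()
    return None


if __name__ == "__main__":
    for n in (10, 40000):
        print(n, try_create_wide_table(n))
```

Any schema that needs one column per file will eventually hit this error as a directory grows, which is why nesting the files deeper works around it.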

Once I downloaded the entire data set (I only had a subset previously) I was able to reproduce this error.

I wrote a quick and dirty script (attached) that nests each of the files deeper. After running it and deleting the current index, detex works. I will work on a more permanent solution, but you can use the following as a workaround.

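import os
import shutil

import obspy

# Top-level directory holding the template event waveforms.
base_directory = 'EventWaveForms'


def get_utc_path(utc):
    """Return base_directory/YYYY/MM/DD for an obspy UTCDateTime."""
    year = '%04d' % utc.year
    month = '%02d' % utc.month
    day = '%02d' % utc.day
    return os.path.join(base_directory, year, month, day)


# Walk bottom-up and move every readable waveform file into a
# year/month/day subdirectory based on the start time of its first trace.
for directory, subdirectories, files in os.walk(base_directory, topdown=False):
    for fil in files:
        path = os.path.join(directory, fil)
        try:
            st = obspy.read(path)
        except Exception:
            # Not a file obspy can read (e.g. the index database); skip it.
            continue
        utc = st[0].stats.starttime
        new_path = get_utc_path(utc)
        if not os.path.exists(new_path):
            os.makedirs(new_path)
        new_file = os.path.join(new_path, fil)
        # Skip files that already sit in the right place.
        if os.path.abspath(new_file) != os.path.abspath(path):
            shutil.move(path, new_file)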
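
To check whether a directory tree is still at risk after reorganizing it, something like the following sketch may help. It simply counts the entries at each directory level and reports the levels above a chosen limit. The function name crowded_directories and the limit value are assumptions: the thread only says the ceiling is "some number less than 3886", and exactly how Detex counts entries per level is not shown here, so pick a conservative limit.

```python
import os


def crowded_directories(base_directory, limit):
    """Return (directory, entry_count) pairs whose entry count exceeds limit.

    Both files and subdirectories are counted. The name and the counting
    rule are illustrative assumptions, not part of Detex.
    """
    crowded = []
    for directory, subdirectories, files in os.walk(base_directory):
        count = len(subdirectories) + len(files)
        if count > limit:
            crowded.append((directory, count))
    return crowded
```

For example, crowded_directories('EventWaveForms', 2000) lists every level holding more than 2000 entries; an empty list suggests the tree is nested deeply enough.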

kpankow commented 8 years ago

I ran the script to fix the indexing problem, but now get a different error when I try to run cluster:

cl = detex.createCluster(trim=[10,60])

Cannot remove response without a valid inventoryArg, setting removeResponse to False

Starting IO operations and data checks

EventWaveForms is not currently indexed, indexing now

indexing, or updating index for EventWaveForms

Traceback (most recent call last):

File "", line 1, in cl = detex.createCluster(trim=[10,60])

File "/home/pankow/anaconda/lib/python2.7/site-packages/detex/construct.py", line 117, in createCluster dtype, enforceOrigin=enforceOrigin, phases=phases)

File "/home/pankow/anaconda/lib/python2.7/site-packages/detex/construct.py", line 621, in _loadEvents phases=phases)

File "/home/pankow/anaconda/lib/python2.7/site-packages/detex/construct.py", line 837, in _loadStream returnName=True, phases=phases):

File "/home/pankow/anaconda/lib/python2.7/site-packages/detex/getdata.py", line 445, in getTemData st = self.getStream(start, end, net, sta, chan, '??')

File "/home/pankow/anaconda/lib/python2.7/site-packages/detex/getdata.py", line 578, in getStream st = self._getStream(self, start, end, net, sta, chan, loc)

File "/home/pankow/anaconda/lib/python2.7/site-packages/detex/getdata.py", line 623, in _loadDirectoryData dfind = _loadIndexDb(fet.directoryName, net+ '.' +sta, t1 - buf, t2 + buf)

File "/home/pankow/anaconda/lib/python2.7/site-packages/detex/getdata.py", line 1001, in _loadIndexDb indexDirectory(dirPath)

File "/home/pankow/anaconda/lib/python2.7/site-packages/detex/getdata.py", line 961, in indexDirectory detex.util.saveSQLite(dfInd,os.path.join(dirPath, '.index.db'),'indkey')

File "/home/pankow/anaconda/lib/python2.7/site-packages/detex/util.py", line 870, in saveSQLite import ipdb; ipdb.set_trace()

File "/home/pankow/anaconda/lib/python2.7/site-packages/ipdb/__init__.py", line 7, in from ipdb.__main__ import set_trace, post_mortem, pm, run, runcall, runeval, launch_ipdb_on_exception

File "/home/pankow/anaconda/lib/python2.7/site-packages/ipdb/__main__.py", line 66, in ipapp = TerminalIPythonApp.instance()

File "/home/pankow/anaconda/lib/python2.7/site-packages/IPython/config/configurable.py", line 365, in instance '%s are being created.' % cls.__name__

MultipleInstanceError: Multiple incompatible subclass instances of TerminalIPythonApp are being created.

d-chambers commented 8 years ago

Did you delete the empty directories in the event waveforms directory?
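The workaround script moves files but leaves the old, now empty, directories behind. A small sketch for clearing them out is shown below. The function name remove_empty_directories is made up for illustration, and it only removes directories that contain nothing at all, so a directory holding a hidden file is left alone.

```python
import os


def remove_empty_directories(base_directory):
    """Delete empty directories under base_directory, deepest first.

    Returns the list of removed paths. base_directory itself is kept.
    Illustrative helper, not part of Detex.
    """
    removed = []
    for directory, _subdirs, _files in os.walk(base_directory, topdown=False):
        if directory == base_directory:
            continue
        # Re-list the directory: its children may have just been removed.
        if not os.listdir(directory):
            os.rmdir(directory)
            removed.append(directory)
    return removed
```

Calling remove_empty_directories('EventWaveForms') and then deleting the existing .index.db should leave a clean tree for re-indexing.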

d-chambers commented 8 years ago

This error happens when you call the ipython debugger (ipdb) from within an ipython console. If you run your code in a plain python terminal (rather than ipython), then when the code reaches line 870 in util.py it will drop you into a debugging session so you can look around and see what is going wrong.
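One way to avoid this clash in debugging code is to check for a running IPython session before reaching for ipdb, and to fall back to the standard library debugger otherwise. The sketch below is an assumption about how that could look, not what Detex does: running_under_ipython and choose_debugger are hypothetical names, and the check relies on IPython making get_ipython available as a builtin inside its sessions.

```python
import pdb


def running_under_ipython():
    """Return True if the code is executing inside an IPython session."""
    try:
        get_ipython  # noqa: F821 (only defined when IPython is running)
    except NameError:
        return False
    return True


def choose_debugger():
    """Return a set_trace callable suited to the current shell.

    Inside IPython, or when ipdb is not installed, use the stdlib pdb.
    """
    if running_under_ipython():
        return pdb.set_trace
    try:
        import ipdb
    except ImportError:
        return pdb.set_trace
    return ipdb.set_trace
```

A call such as choose_debugger()() would then take the place of a hard-coded ipdb.set_trace().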

kpankow commented 8 years ago

I had not deleted the empty directories. It is now re-indexing, and I will follow up in the morning. The indexing seems to be taking quite a while.

I was running the code within Anaconda.