d-chambers / Detex

A Python package for subspace detection and waveform similarity clustering

writing new detections to template key #34

Closed: kpankow closed this issue 8 years ago

kpankow commented 8 years ago

When I use res.writeDetections(eventDir='DetectedEvents',updateTemKey=True) to update the TemplateKey, the TIME column is written in the wrong format. At the bottom of the TemplateKey (head and tail shown below), TIME appears to be in epoch seconds instead of a date-time stamp. Later I tried to use this template key to get lag times with cl = detex.createCluster(CCreq=0.68,trim=[5,30],fetch_arg='../EventWaveForms',fileName='clustDD.pkl',enforceOrigin=True). Memory filled up and Python crashed. However, I can run cl = detex.createCluster(CCreq=0.68,trim=[5,30],fetch_arg='../EventWaveForms') and get cluster results.

[brewster:DATA/CIRCLEVILLE/Detections_1wk] pankow% head TemplateKey.csv_det
,Unnamed: 0,TIME,NAME,LAT,LON,MAG,DEPTH,STMP
0,0.0,2010-09-29T15-48-59.63,2010-09-29T15-48-59.63,38.202,-112.251833333,1.29,4.32,1285775339.63
1,1.0,2011-01-03T12-06-36.88,2011-01-03T12-06-36.88,38.2473333333,-112.33983333299999,4.56,5.4,1294056396.88
2,2.0,2011-01-03T12-10-08.66,2011-01-03T12-10-08.66,38.2491666667,-112.30616666700001,2.92,2.03,1294056608.66
3,3.0,2011-01-03T12-23-19.05,2011-01-03T12-23-19.05,38.248666666700004,-112.320333333,0.96,1.68,1294057399.05

[brewster:DATA/CIRCLEVILLE/Detections_1wk] pankow% tail TemplateKey.csv_det
397,,1294325117.3600001,d2011-01-06T14-45-17,,,0.12130792350080775,,
398,,1294328589.025,d2011-01-06T15-43-09,,,0.42995299945604737,,
399,,1294330616.4850001,d2011-01-06T16-16-56,,,0.05488828309071697,,
400,,1294331688.385,d2011-01-06T16-34-48,,,-0.1258472051222964,,
401,,1294332071.1999998,d2011-01-06T16-41-11,,,-0.06266347145329888,,
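Rows like the tail entries above can be found by scanning the TIME column for values that parse as bare numbers (epoch seconds) rather than date strings. Below is a minimal standard-library sketch. numeric_time_rows is a hypothetical helper, not part of detex. It assumes the key is plain CSV with the header shown above, where the first column has an empty name.

```python
import csv
import io

# Two rows copied from the head and tail output above (header included).
SAMPLE_KEY = """,Unnamed: 0,TIME,NAME,LAT,LON,MAG,DEPTH,STMP
0,0.0,2010-09-29T15-48-59.63,2010-09-29T15-48-59.63,38.202,-112.251833333,1.29,4.32,1285775339.63
397,,1294325117.3600001,d2011-01-06T14-45-17,,,0.12130792350080775,,
"""


def numeric_time_rows(csv_text, column="TIME"):
    """Return (row_label, value) pairs whose `column` entry is a bare number.

    A bare number here is most likely a Unix timestamp, not a date string
    such as '2010-09-29T15-48-59.63'.
    """
    flagged = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        value = (row.get(column) or "").strip()
        if not value:
            continue  # empty cell: nothing to check
        try:
            float(value)
        except ValueError:
            continue  # not numeric, so assume it is already a date string
        flagged.append((row[""], value))  # first column has an empty header
    return flagged


if __name__ == "__main__":
    print(numeric_time_rows(SAMPLE_KEY))  # [('397', '1294325117.3600001')]
```

For a file on disk, the same function can be applied to the contents of open("TemplateKey.csv_det", newline="").read().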

d-chambers commented 8 years ago

When reading the template and station keys, detex passes every value in a time column to the obspy.UTCDateTime constructor. A timestamp in that column is therefore probably not what causes the crash, especially since you can create a cluster from the same key. Can you give me a sense of the size of the problem? How many stations and templates are you using?

kpankow commented 8 years ago

There are 408 events in total in the TemplateKey and three stations. I had assumed the problem was related to the timestamps because the only difference between the two runs is whether enforceOrigin is True or False. With enforceOrigin=True it will not work with this TemplateKey. If I delete the detections from the TemplateKey so that it has the standard format, there is no problem running with enforceOrigin=True.

d-chambers commented 8 years ago

One of the predicted origin times is listed as May 7th, 1294. When enforceOrigin tries to pad the data with zeros back to that origin, it creates an array too large to fit into memory. I will look into how the predicted origin got so far off. I will also add a check to the _loadStream function so that detex raises a ValueError rather than trying to create such a large array.
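The arithmetic shows why the array cannot fit. At an assumed 100 Hz sampling rate, zero-padding from 1294 to 2011 needs on the order of 10^12 samples, which is over 10 TB as float64. Below is a minimal sketch of the kind of guard described above. check_fill_samples, MAX_FILL_SAMPLES, the one-day cap, and the 100 Hz rate are all hypothetical and are not detex's actual _loadStream code.

```python
from datetime import datetime, timezone

# Hypothetical cap: refuse to zero-pad more than one day of 100 Hz data.
MAX_FILL_SAMPLES = 100 * 86400


def check_fill_samples(origin, data_start, sampling_rate, max_samples=MAX_FILL_SAMPLES):
    """Return how many zero samples are needed to pad from `origin` to `data_start`.

    Raises ValueError instead of returning an absurd size, so a bad origin time
    fails fast rather than exhausting memory.
    """
    # If the origin is after the data start there is nothing to pad.
    gap_seconds = max(0.0, (data_start - origin).total_seconds())
    n_samples = round(gap_seconds * sampling_rate)
    if n_samples > max_samples:
        raise ValueError(
            f"padding back to {origin.isoformat()} would need {n_samples:,} samples "
            f"(~{n_samples * 8 / 1e12:.1f} TB as float64); the origin time is likely wrong"
        )
    return n_samples


if __name__ == "__main__":
    data_start = datetime(2011, 1, 6, 14, 45, 17, tzinfo=timezone.utc)
    # A sane origin 10 s before the data: 10 s * 100 Hz = 1000 samples.
    print(check_fill_samples(datetime(2011, 1, 6, 14, 45, 7, tzinfo=timezone.utc), data_start, 100.0))
    # The bogus origin from this issue (May 7th, 1294) is rejected.
    try:
        check_fill_samples(datetime(1294, 5, 7, tzinfo=timezone.utc), data_start, 100.0)
    except ValueError as err:
        print(err)
```

This design raises the error before allocation, so the failure names the offending origin time rather than surfacing as a memory crash.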

d-chambers commented 8 years ago

You were right.

obspy.UTCDateTime('1294056736.705') returns a date in the year 1294 rather than raising an error, so the logic that interprets the value as a timestamp never executes. This can happen when date strings and floats are in the same column, because the whole column is then read as strings. I will change detex to output only date strings rather than timestamps. I added a template key with "_fixed" in its name that works fine. You can look at the script debug.py to see what I did.
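The "date strings only" approach above can be sketched as a normalization pass over the TIME column before the key is written. Any entry that parses as a number is converted from epoch seconds to a date string in the same style as the existing rows. Checking for a number first means such a value never reaches a date parser that might read its leading digits as a year. to_date_string and normalize_time_column are hypothetical helpers, not detex functions. The sketch assumes epoch values are UTC and that hundredths of a second match the existing keys.

```python
from datetime import datetime, timezone


def to_date_string(value):
    """Normalize one TIME entry to a date string like '2010-09-29T15-48-59.63'.

    Bare numbers are treated as epoch seconds (UTC) and converted; anything
    else, including empty cells, is returned unchanged.
    """
    text = str(value).strip()
    try:
        seconds = float(text)
    except ValueError:
        return text  # already a date string (or empty)
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # Keep hundredths of a second, matching the existing template key rows.
    return dt.strftime("%Y-%m-%dT%H-%M-%S") + f".{dt.microsecond // 10000:02d}"


def normalize_time_column(rows, column="TIME"):
    """Return copies of dict rows with `column` rewritten as date strings."""
    return [{**row, column: to_date_string(row[column])} for row in rows]


if __name__ == "__main__":
    print(to_date_string("1294325117.3600001"))      # 2011-01-06T14-45-17.36
    print(to_date_string("2010-09-29T15-48-59.63"))  # 2010-09-29T15-48-59.63
```

As a consistency check, the STMP value 1285775339.63 from the first row converts to 2010-09-29T15-48-59.63, which matches that row's TIME entry.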