crotwell / seisFile

A library for reading and writing seismic file formats in java.
GNU Lesser General Public License v3.0

Unable to read MSEED file #19

Closed dlnorgaard closed 3 years ago

dlnorgaard commented 3 years ago

Hi. When I try to read a particular MSEED file I get the following stack trace:

    edu.iris.dmc.seedcodec.UnsupportedCompressionType: Type 0 is not supported at this time.
        at edu.iris.dmc.seedcodec.Codec.decompress(Codec.java:141)
        at edu.sc.seis.seisFile.mseed.DataRecord.decompress(DataRecord.java:138)
        at gov.usgs.volcanoes.core.data.file.SeedDataFile.extract(SeedDataFile.java:181)
        at gov.usgs.volcanoes.core.data.file.SeedDataFile.read(SeedDataFile.java:119)
        at gov.usgs.volcanoes.swarm.data.FileDataSource$1.construct(FileDataSource.java:153)
        at gov.usgs.volcanoes.swarm.SwingWorker$2.run(SwingWorker.java:113)
        at java.base/java.lang.Thread.run(Thread.java:832)

I am using version 2.0.0 of seisFile.

It opens in other programs. I would attach the file, but it's pretty big.

This is the file info:

    BLOCKETTE 1000: (Data Only SEED)
      next blockette: 56
      encoding: STEIM 1 Compression (val:10)
      byte order: Big endian (val:1)
      record length: 512 (val:9)

I think full MSEED is not supported, so I wasn't sure whether this file is one of the unsupported cases.
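For readers unfamiliar with the raw values in the file info above (val:10, val:1, val:9), here is a minimal sketch of how the 8 bytes of a SEED Blockette 1000 map to those fields. The class `Blockette1000Sketch` and its methods are hypothetical names made up for illustration and are not part of seisFile; the sketch also assumes the header fields are stored big-endian.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative sketch only; this class is hypothetical and is not part of seisFile. */
final class Blockette1000Sketch {
    final int nextBlocketteOffset; // byte offset of the next blockette, 0 if none
    final int encoding;            // data encoding format code, e.g. 0 or 10
    final boolean bigEndianData;   // word order flag: 1 means big endian
    final int recordLength;        // record length in bytes, stored as a power of two

    private Blockette1000Sketch(int next, int encoding, boolean bigEndianData, int recordLength) {
        this.nextBlocketteOffset = next;
        this.encoding = encoding;
        this.bigEndianData = bigEndianData;
        this.recordLength = recordLength;
    }

    /** Decodes the 8 bytes of a Blockette 1000, assuming big-endian header fields. */
    static Blockette1000Sketch parse(byte[] raw) {
        if (raw.length < 8) {
            throw new IllegalArgumentException("Blockette 1000 is 8 bytes long");
        }
        ByteBuffer buf = ByteBuffer.wrap(raw).order(ByteOrder.BIG_ENDIAN);
        int type = buf.getShort(0) & 0xFFFF;
        if (type != 1000) {
            throw new IllegalArgumentException("not a Blockette 1000: " + type);
        }
        int next = buf.getShort(2) & 0xFFFF;
        int encoding = buf.get(4) & 0xFF;
        boolean bigEndian = buf.get(5) == 1;
        // The length field is an exponent: 9 means 2^9 = 512 bytes.
        int recordLength = 1 << (buf.get(6) & 0xFF);
        return new Blockette1000Sketch(next, encoding, bigEndian, recordLength);
    }

    /** Names for the common encoding codes listed in the SEED manual. */
    static String encodingName(int encoding) {
        switch (encoding) {
            case 0:  return "ASCII text";
            case 1:  return "16-bit integers";
            case 2:  return "24-bit integers";
            case 3:  return "32-bit integers";
            case 4:  return "IEEE float";
            case 5:  return "IEEE double";
            case 10: return "Steim 1";
            case 11: return "Steim 2";
            default: return "other (" + encoding + ")";
        }
    }
}
```

Note that the encoding is a per-record property, so a summary taken from one record (here Steim 1) does not guarantee that every record in the file uses the same encoding.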

crotwell commented 3 years ago

It is always possible that this is a bug in seisFile, but from the looks of it I would say it is the data.

My guess is that the data has some non-timeseries data records, which are legal but a bit unusual. The SEED manual defines compression type 0 as "ASCII text, byte order as specified in field 4", but I did not write extraction code for this type, and my suspicion is that this may not be real data anyway. Often data records that have no data and consist only of blockettes will have the compression set to 0, but they usually also have numSamples=0 in the header. My code handles that case, so this is a little weird.

Is SeedDataFile.java your code?

You might try the mseedlh client that is part of the seisFile release tarball. It can print all the headers and attempt decompression, which makes it useful as a debugging tool.

The right thing to do would probably be for SeedDataFile to either catch the exception or to check the compression type before calling decompress(). If the exception is caught, or if the type is not decompressable, then skip that data record.

You can use edu.iris.dmc.seedcodec.Codec.isDecompressable(int type) to check if decompression is supported. Type 0 as well as other more unusual types are not.
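The check-before-decompress approach could be sketched roughly as below. To keep the example self-contained, it uses hypothetical stand-in types rather than the real seisFile classes: `Rec` stands in for a parsed data record, and the local `isDecompressable` stands in for the role of `Codec.isDecompressable(int type)`. The set of supported encodings listed here is an assumption for illustration; in real code the library's own answer should be used.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; these types are hypothetical stand-ins, not seisFile classes. */
final class SkipUndecodableSketch {

    /** Stand-in for a parsed data record: channel name, Blockette 1000 encoding, sample count. */
    static final class Rec {
        final String channel;
        final int encoding;
        final int numSamples;

        Rec(String channel, int encoding, int numSamples) {
            this.channel = channel;
            this.encoding = encoding;
            this.numSamples = numSamples;
        }
    }

    /**
     * Stand-in for the library check. The list of encodings is assumed for
     * illustration; type 0 (ASCII text) is deliberately not in it.
     */
    static boolean isDecompressable(int encoding) {
        switch (encoding) {
            case 1: case 3: case 4: case 5: case 10: case 11:
                return true;
            default:
                return false;
        }
    }

    /** Returns only the records that are worth passing on to decompression. */
    static List<Rec> decodableOnly(List<Rec> records) {
        List<Rec> keep = new ArrayList<>();
        for (Rec r : records) {
            if (r.numSamples == 0) {
                continue; // header-only record, nothing to decompress
            }
            if (!isDecompressable(r.encoding)) {
                continue; // e.g. ASCII text (type 0): skip instead of throwing
            }
            keep.add(r);
        }
        return keep;
    }
}
```

The alternative is to wrap the decompress call in a try/catch for UnsupportedCompressionType and skip the record in the catch block; the explicit check simply avoids using exceptions for an expected case.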

If you can give me a way to download the data file, I can take a look, but without being able to reproduce it, it is hard to fix.

dlnorgaard commented 3 years ago

Sorry, I forgot to mention yesterday that the file can be viewed by other means and read by tools like mseed2sac. Here is the data file: https://vdap.org:5001/sharing/5PtG3pS9I (link is valid until Monday night).

crotwell commented 3 years ago

The data records that are causing this are SOH channels. For example, for the very first data record in the file, mseedlh shows:

    DataRecord      seq=280086 type=D cont=false
      VV.BAY.00.SOH start=2021,025,00:00:00.0000 numPTS=445 sampFac=0 sampMul=0 ac=0 io=0 qual=0 numBlockettes=2 blocketteOffset=48 dataOffset=64 tcor=0
        Blockette1000 encod=0 wOrder=1 recLen=9
        Blockette1001 tQual=0 microsec=0 frameC=7

As I said, seisFile does not support "decompressing" type 0 encoded data records, as this is ASCII text. The calling function should not try to decompress a data record with this encoding, as it is not "compressed". Alternatively, it should catch the UnsupportedCompressionType exception and handle it gracefully.

The fact that the file can be viewed by other means does not indicate that seisFile has a bug, but it may indicate that the calling code in SeedDataFile.java is failing to use the seisFile library correctly. If gov.usgs.volcanoes.core.data.file.SeedDataFile.extract(SeedDataFile.java:181) is not code that you are maintaining, I would suggest you file a bug report with whoever maintains the gov.usgs.volcanoes.core.data.file package.

dlnorgaard commented 3 years ago

Ok, thanks!

crotwell commented 3 years ago

A little more info: I have added better support to the mseedlh tool in seisFile to at least print out the ASCII text in these types of records. Here is the output for that first data record in your file:

    # data as text
    2021  1 24 23:50:00 Mass Positions  -3% -4% -3%
    2021  1 24 23:51:00 o/s=    280 drift=     0 pwm= 8901  Auto 3D
    2021  1 24 23:52:00 o/s=    280 drift=     0 pwm= 8900  Auto 3D
    2021  1 24 23:53:00 o/s=    280 drift=     0 pwm= 8899  Auto 3D
    2021  1 24 23:54:00 o/s=    258 drift=   -22 pwm= 8899  Auto 3D
    2021  1 24 23:55:00 o/s=    220 drift=   -38 pwm= 8900  Auto 3D
    2021  1 24 23:55:00 External supply : 12.9V Temperature  38.12'C

I think you can see why this is not decompressable.
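For anyone who wants the text rather than skipping it, pulling it out of a type 0 record is mostly a matter of slicing bytes. The sketch below is not the mseedlh implementation; `AsciiPayloadSketch` and `extractText` are hypothetical names. It assumes the sample count in the header is the number of text bytes, which is consistent with the header shown earlier (numPTS=445, with 448 bytes available after dataOffset=64 in a 512-byte record).

```java
import java.nio.charset.StandardCharsets;

/** Illustrative sketch only; this class is hypothetical and is not part of seisFile. */
final class AsciiPayloadSketch {

    /**
     * Returns the ASCII payload of a raw data record.
     *
     * @param record     the full raw record, e.g. 512 bytes
     * @param dataOffset offset of the data section from the start of the record
     * @param numSamples sample count from the header, assumed here to be the text length in bytes
     */
    static String extractText(byte[] record, int dataOffset, int numSamples) {
        if (dataOffset < 0 || numSamples < 0 || dataOffset + numSamples > record.length) {
            throw new IllegalArgumentException("payload does not fit inside the record");
        }
        return new String(record, dataOffset, numSamples, StandardCharsets.US_ASCII);
    }
}
```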