cloudera / seismichadoop

System for performing seismic data processing on a Hadoop cluster.

Processed file not getting re-loaded #2

Open kaliyugantagonist opened 11 years ago

kaliyugantagonist commented 11 years ago

Hi,

I'm using Apache Hadoop cluster + seismic-0.1.0-job.jar.

A SEG-Y file gets loaded properly, and I'm also able to perform seismic operations on it, e.g. Hilbert Transform and Whitening.

When I unloaded this processed file to the local file system under the name AGC_Hilbert.segy (signifying that a Hilbert Transform has been performed on it) and tried to reload it, I got an error:

# Loading the processed file
./suhdp load -input /home/hd/omkar/AGC_Hilbert.segy -output /sufiles/AGC_Hilbert.su /home/hd/seismicunix

Reading input file: /home/hd/omkar/AGC_Hilbert.segy BIG_ENDIAN :BIG_ENDIAN
/home/hd/seismicunix/bin/segyread: format not SEGY standard (1, 2, 3, 5, or 8)
1+0 records in
6+1 records out
3200 bytes (3.2 kB) copied, 0.000103235 s, 31.0 MB/s
Bytes read: 0
Callback list size 1
Bytes written: 0
path: /sufiles/AGC_Hilbert.su
path: hdfs://172.25.38.87:9000/user/hd
path: hdfs://172.25.38.87:9000/user/hd
parent: /sufiles
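For what it's worth, that segyread message means the data sample format code in the SEG-Y binary reel header was not one of the accepted values. A minimal sketch for inspecting it, assuming a standard SEG-Y layout (3200-byte EBCDIC text header followed by a 400-byte binary header, with the format code stored as a big-endian short at byte offset 3224); the class name and path argument are mine:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class SegyFormatCheck {
  // In a standard SEG-Y file the binary reel header occupies bytes
  // 3200-3599, and the data sample format code is a big-endian short
  // at byte offset 3224. segyread accepts codes 1, 2, 3, 5, or 8.
  static int formatCode(String path) throws IOException {
    try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
      f.seek(3224);
      return f.readShort(); // RandomAccessFile reads big-endian
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println("data sample format code: " + formatCode(args[0]));
  }
}
```

If this prints something other than 1, 2, 3, 5, or 8 for AGC_Hilbert.segy, the file's reel header (or the bytes where segyread expects one) is not what segyread wants.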

I flipped through the code and I think (not sure!) the issue exists because, in SegyUnloader.java, the part files produced by processing are written directly through a DataOutputStream to a local file WITHOUT enforcing BIG_ENDIAN byte order, unlike SUReader.java.

Thanks and regards !!!

jwills commented 11 years ago

That sounds plausible-- could you send me a pull request w/the fix?

kaliyugantagonist commented 11 years ago

I made some small changes to the SegyUnloader.write(...) method to enforce the BIG_ENDIAN format, but the issue persists (the file opens in SeiSee but doesn't get reloaded). Worse, after that change the size of the unloaded SEG-Y file from the Hilbert/Whitening operations actually increased!

private void write(Path path, FileChannel out, Configuration conf) throws Exception {
  System.out.println("Reading: " + path);
  SequenceFile.Reader reader = new SequenceFile.Reader(FileSystem.get(conf), path, conf);
  BytesWritable value = new BytesWritable();
  while (reader.next(NullWritable.get(), value)) {
    out.write(ByteBuffer.wrap(value.getBytes()).order(ByteOrder.BIG_ENDIAN));
  }
  reader.close();
}
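One thing worth noting about that patch: ByteBuffer.order(...) only changes how multi-byte accessors like getInt/putInt interpret the buffer; it does not reorder bytes that are already in the wrapped array, so out.write(...) emits the same bytes either way. A minimal demonstration:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class OrderDemo {
  public static void main(String[] args) {
    byte[] raw = {0x00, 0x00, 0x00, 0x01};

    // order() changes how multi-byte values are *interpreted*...
    int big    = ByteBuffer.wrap(raw).order(ByteOrder.BIG_ENDIAN).getInt();
    int little = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN).getInt();

    // ...but the underlying bytes are untouched: writing the wrapped
    // buffer to a channel emits raw[] exactly as-is in both cases.
    System.out.println(big + " " + little); // prints "1 16777216"
  }
}
```

Separately, a possible explanation for the size increase: BytesWritable.getBytes() returns the padded backing array, which can be longer than the valid data. Wrapping only the valid portion, e.g. ByteBuffer.wrap(value.getBytes(), 0, value.getLength()), might be worth trying.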

I'm afraid my first suspicion is confirmed: while loading the file we use the segyread command, but during unload the part files (SequenceFiles) are simply written to the local file system WITHOUT USING the segywrite command.

jwills commented 11 years ago

Hrm-- that might be b/c the output files are SU files, not SEGY files.
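If that's the case, the unloaded file would be an SU stream: SU traces are essentially SEG-Y traces without the 3600-byte reel header (3200-byte EBCDIC text header plus 400-byte binary header), which is exactly the header segyread looks for. A rough heuristic sketch, assuming the usual convention that the EBCDIC text header lines begin with 'C' (0xC3 in EBCDIC); the class name is mine:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class SegyOrSu {
  // Heuristic only: a standard SEG-Y file starts with a 3200-byte EBCDIC
  // text header whose lines conventionally begin with 'C' (0xC3 in EBCDIC).
  // An SU file starts directly with a 240-byte trace header, so its first
  // byte is very unlikely to be 0xC3.
  static boolean looksLikeSegy(String path) throws IOException {
    try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
      if (f.length() < 3600) return false; // too small for a reel header
      return (f.read() & 0xFF) == 0xC3;
    }
  }
}
```

Running this on AGC_Hilbert.segy would show quickly whether the "SEG-Y" file that segyread rejects actually carries a reel header at all.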