Gagravarr / VorbisJava

A library for working with Ogg Vorbis files
Apache License 2.0

ClassCastError on opening OGG video #3

Closed CodingFabian closed 10 years ago

CodingFabian commented 10 years ago
Exception in thread "main" java.lang.ClassCastException: org.gagravarr.vorbis.VorbisAudioData cannot be cast to org.gagravarr.vorbis.VorbisInfo
    at org.gagravarr.vorbis.VorbisFile.<init>(VorbisFile.java:78)
    at org.gagravarr.vorbis.VorbisFile.<init>(VorbisFile.java:55)
    at OggBug.main(OggBug.java:10)

This can be reproduced by downloading http://mirror.bigbuckbunny.de/peach/bigbuckbunny_movies/big_buck_bunny_720p_stereo.ogg and using this code to load it:

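import java.io.FileInputStream;

import org.gagravarr.ogg.OggFile;
import org.gagravarr.vorbis.VorbisFile;

public class OggBug {
  public static void main(String[] args) throws Exception {
    FileInputStream fin = new FileInputStream("/Users/fabian/Downloads/big_buck_bunny_720p_stereo.ogg");
    OggFile ogg = new OggFile(fin);
    // The ClassCastException is thrown from this constructor
    VorbisFile vorbis = new VorbisFile(ogg);
    System.out.println(vorbis);
  }
}

Gagravarr commented 10 years ago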

Are you able to find / produce a much smaller (sub 1mb) video file that reproduces the problem? We'll really need a test file to go with any fix + unit test, but I don't really fancy committing a ~200mb file to the repo to try to test against...

CodingFabian commented 10 years ago

I understand. But you aren't saying that you will ignore the problem until you have a smaller file, are you? It should be possible to find and fix the issue even if we cannot find a smaller video that also fails. Fabian

CodingFabian commented 10 years ago

Here is a 400k file which also has the problem: http://techslides.com/demos/sample-videos/small.ogv

CodingFabian commented 10 years ago

This also doesn't work: http://commons.wikimedia.org/wiki/File:Xacti-AC8EX-Sample_video-001.ogg

CodingFabian commented 10 years ago

http://playground.html5rocks.com/samples/html5_misc/chrome_japan.ogv doesn't work either. Do you actually have any Ogg video that works this way?

CodingFabian commented 10 years ago

Maybe a VorbisFile is not supposed to be created from it? We got there from the Tika parser:

Caused by: java.lang.ClassCastException: org.gagravarr.vorbis.VorbisAudioData cannot be cast to org.gagravarr.vorbis.VorbisInfo
    at org.gagravarr.vorbis.VorbisFile.<init>(VorbisFile.java:78) ~[vorbis-java-core-0.1.jar:na]
    at org.gagravarr.vorbis.VorbisFile.<init>(VorbisFile.java:55) ~[vorbis-java-core-0.1.jar:na]
    at org.gagravarr.tika.VorbisParser.parse(VorbisParser.java:58) ~[vorbis-java-tika-0.1.jar:na]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) ~[tika-core-1.5.jar:na]

Maybe the Vorbis parser should not be used?

CodingFabian commented 10 years ago

Looks like it originates from here:

else if(streams > 0) {
    // Something else...
    // TODO Detect video
}

The detector reports it as generic application/ogg. Then Tika's MimeTypes detector comes along and says "oh, I know it's audio/ogg, and that's better than application/ogg", so Tika goes with that. Would it be OK to return OGG_VIDEO in the TODO part? That would prevent Tika from overruling it.

Gagravarr commented 10 years ago

Rather than blindly returning OGG_VIDEO, it should probably be updated to detect the various kinds of video streams above, so that it could then (say) have a check like if (theora_streams > 0 || dirac_streams > 0) { return OGG_VIDEO; }. That'd also want a unit test or two, hence the need for some very small test files!
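
The kind of check described above could be sketched roughly as follows. This is only an illustration, not the VorbisJava or Tika detector: the class `OggKindSketch`, its methods, and the returned type strings are hypothetical names chosen for this example. It assumes the caller has already collected the first packet of each logical stream, and it recognises codecs by the magic bytes at the start of that packet (`0x01 "vorbis"`, `0x80 "theora"`, `"BBCD"` for Dirac).

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch: classify an Ogg file from the first packet of each stream.
final class OggKindSketch {
    // True if 'packet' contains the ASCII string 'magic' starting at 'offset'.
    static boolean hasMagic(byte[] packet, int offset, String magic) {
        byte[] m = magic.getBytes(StandardCharsets.US_ASCII);
        if (packet.length < offset + m.length) {
            return false;
        }
        for (int i = 0; i < m.length; i++) {
            if (packet[offset + i] != m[i]) {
                return false;
            }
        }
        return true;
    }

    // Vorbis identification header: packet type 0x01 followed by "vorbis".
    static boolean isVorbis(byte[] p) {
        return p.length > 0 && p[0] == 0x01 && hasMagic(p, 1, "vorbis");
    }

    // Theora identification header: packet type 0x80 followed by "theora".
    static boolean isTheora(byte[] p) {
        return p.length > 0 && (p[0] & 0xFF) == 0x80 && hasMagic(p, 1, "theora");
    }

    // Dirac streams start with the parse code prefix "BBCD".
    static boolean isDirac(byte[] p) {
        return hasMagic(p, 0, "BBCD");
    }

    // Any video stream wins over audio; unknown content stays generic.
    static String detect(List<byte[]> firstPackets) {
        int theoraStreams = 0;
        int diracStreams = 0;
        int vorbisStreams = 0;
        for (byte[] p : firstPackets) {
            if (isTheora(p)) {
                theoraStreams++;
            } else if (isDirac(p)) {
                diracStreams++;
            } else if (isVorbis(p)) {
                vorbisStreams++;
            }
        }
        if (theoraStreams > 0 || diracStreams > 0) {
            return "video/ogg";
        }
        if (vorbisStreams > 0) {
            return "audio/ogg";
        }
        return "application/ogg";
    }
}
```

Counting per codec, rather than returning on the first match, keeps the door open for reporting per-stream totals later; whether the real detector does it this way is an assumption.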

CodingFabian commented 10 years ago

Is the 400k file good enough? I also proposed to Tika that the MIME magic detection be fixed so that application/ogg does not incorrectly get overridden by audio/ogg.

Gagravarr commented 10 years ago

Do you know what license the small.ogv or chrome_japan.ogv files are under? A 400kb file is probably alright, as long as it's under a license where we can distribute it!

CodingFabian commented 10 years ago

Maybe take them from here: https://wiki.xiph.org/TheoraTestsuite (they are explicitly intended for testing).

CodingFabian commented 10 years ago

If in doubt, why not include an automatic download in the pom? That way you are not distributing the file with your source.

Gagravarr commented 10 years ago

OK, I'll have a play over the weekend, and see what I can manage. (I've got a plan now, just need the time to implement it!)

If you have a spare little bit of time, any chance you could review the Ogg Checksum code, and see if you can work out why the code is generating warnings? The spec says you should ignore packets which don't have a valid checksum, but I'm reluctant to do that until I'm sure the code calculates them correctly! (Your TIKA-1112 will need this fix)

CodingFabian commented 10 years ago

I don't care about TIKA-1112, but I have a bit of time today, so I will look into the checksumming.

CodingFabian commented 10 years ago

I had limited time. What I noticed is that the checksum is a "long" while the CRC value is an "int". I am also puzzled by the Ogg documentation, which says "LSb of LSB first.", but the value for the sequence number seems to be OK.
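
To illustrate the long/int concern, here is a minimal sketch of an Ogg-style page CRC. It is not the VorbisJava code, and the class and method names (`OggCrcSketch`, `crc`, `toUnsigned`, `matches`) are hypothetical. It assumes the usual Ogg parameters: generator polynomial 0x04c11db7, initial value 0, no bit reflection, no final XOR, computed over the whole page with the checksum field zeroed. The point is the comparison at the end: a checksum held in a `long` only matches a CRC held in an `int` if the `int` is widened as an unsigned value.

```java
// Hypothetical sketch of an Ogg-style CRC and a sign-safe comparison.
final class OggCrcSketch {
    private static final int POLY = 0x04C11DB7;
    private static final int[] TABLE = new int[256];

    static {
        // Build the lookup table, processing bits MSB first (no reflection).
        for (int i = 0; i < 256; i++) {
            int r = i << 24;
            for (int bit = 0; bit < 8; bit++) {
                r = (r & 0x80000000) != 0 ? (r << 1) ^ POLY : (r << 1);
            }
            TABLE[i] = r;
        }
    }

    // CRC over 'data', which is assumed to be a page with its CRC field zeroed.
    static int crc(byte[] data) {
        int crc = 0; // initial value 0
        for (byte b : data) {
            crc = (crc << 8) ^ TABLE[((crc >>> 24) ^ b) & 0xFF];
        }
        return crc;
    }

    // Widen without sign extension, so 0x80000000 becomes 2147483648L.
    static long toUnsigned(int crc) {
        return crc & 0xFFFFFFFFL;
    }

    // Compare a checksum read as an unsigned 32-bit value (stored in a long)
    // with a computed int CRC.
    static boolean matches(long storedChecksum, int computedCrc) {
        return storedChecksum == toUnsigned(computedCrc);
    }
}
```

A plain `(long) computedCrc` cast sign-extends, so any CRC with the top bit set would compare unequal to the stored value. Whether that is what causes the warnings here is not established in this thread.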

Gagravarr commented 10 years ago

Any chance you could grab the latest code from git, build, bump the dependency in tika parsers to 0.4-snapshot, and test?

I believe it's now fixed, and with a sample theora file I'm seeing:

$ java -jar tika-app-1.6-SNAPSHOT.jar --metadata chrome_japan.ogv
Content-Length: 7868057
Content-Type: video/theora
resourceName: chrome_japan.ogv
streams-annodex: 1
streams-audio: 1
streams-metadata: 1
streams-theora: 1
streams-total: 3
streams-video: 1
streams-vorbis: 1

CodingFabian commented 10 years ago

All my Ogg video files now parse correctly. The checksumming is still somehow broken, but as it only produces parse warnings, I am happy with it so far. I tried to look at the checksumming, but apart from possible type conversion problems I could not find anything that deviates from the spec.

Gagravarr commented 10 years ago

OK, I've released v0.4, and upgraded Tika to use it, so I believe we're now all sorted for this

(Issue 5 has been opened to track the checksum problem)