Wcoyo / google-video-subtitles-parser

Automatically exported from code.google.com/p/google-video-subtitles-parser
GNU Lesser General Public License v3.0

Error reading the URL #10

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
$ java -jar ./google-video-subtitles-parser.jar http://video.google.com/videoplay?docid=-2007648776956021617&q=source%3A017692672747524784062&hl=es

What is the expected output? What do you see instead?

Exception in thread "main" java.lang.RuntimeException: Due to an IOException, the parser could not check http://video.google.com/videotranscript?frame=c&type=list&docid=-2007648776956021617
    at net.jmt4b04d4v.gvideo.sax.GoogleVideoSAXParser.main(GoogleVideoSAXParser.java:138)
Caused by: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence.
    at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:674)
    at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:463)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1742)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.scanLiteral(XMLEntityScanner.java:1064)
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.scanAttributeValue(XMLScanner.java:974)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanAttribute(XMLNSDocumentScannerImpl.java:460)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(XMLNSDocumentScannerImpl.java:277)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2747)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:807)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:107)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1132)
    at net.jmt4b04d4v.gvideo.sax.GoogleVideoSAXParser.main(GoogleVideoSAXParser.java:128)
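
A minimal Java sketch of one way this message can arise, not taken from the parser's source: a Latin-1 `ñ` is the single byte 0xF1, which UTF-8 treats as the lead byte of a 4-byte sequence, so the plain ASCII letter that follows is rejected as byte 2. The element and attribute names (`transcript_list`, `track`, `name`) and the class name are assumptions for illustration only; whether the real transcript list contains Latin-1 bytes has not been confirmed. Depending on the JDK, the failure may surface as an `IOException` (as in the trace above) or wrapped in a `SAXException`, so the sketch catches both.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class MalformedUtf8Repro {

    /** Returns true if a plain SAX parse of the given bytes fails. */
    static boolean failsToParse(byte[] xml) throws Exception {
        try {
            // No encoding is declared, so the parser assumes UTF-8.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml), new DefaultHandler());
            return false;
        } catch (IOException | SAXException e) {
            System.out.println(e.getClass().getName() + ": " + e.getMessage());
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical document shape; only the byte encoding matters here.
        String doc = "<transcript_list><track name=\"Espa\u00F1ol\"/></transcript_list>";

        // Latin-1 bytes: the ñ becomes the lone byte 0xF1, which is not valid UTF-8 here.
        System.out.println(failsToParse(doc.getBytes(StandardCharsets.ISO_8859_1)));

        // Proper UTF-8 bytes: the same document parses cleanly.
        System.out.println(failsToParse(doc.getBytes(StandardCharsets.UTF_8)));
    }
}
```

`StandardCharsets` and multi-catch need Java 7 or later; on the Java 1.5 runtime reported here the same idea would use charset names as strings and separate catch blocks.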

What version of the product are you using? On what operating system?

Version: M2

OS: Ubuntu 8.10, JRE: Java 1.5.0-16-3, Kernel: 2.6.24-19-generic

Please provide any additional information below.

Original issue reported on code.google.com by rauldi...@gmail.com on 21 Apr 2009 at 1:51

GoogleCodeExporter commented 8 years ago
[en]
I have seen similar problems with UTF-8 characters in some other captioned videos, such as:

 * http://video.google.com/videoplay?docid=-6733596013688235740
 * http://video.google.com/videoplay?docid=-7619379823675726232

This looks like the same case. Could you try the following URL:

>java -jar google-video-subtitles-parser.jar http://video.google.com/videoplay?docid=6673734199138235720&ei=InbuSb-dCoP0rgK45JHlBg&q=subtitle

and choose track id '0' (which has no information attached)?

I tried this URL today without problems, using M2 on Windows XP with JRE-1.6u13 (not on my home machine).

If that works for you, we can rule out other possibilities and confirm the issue.

The problem seems to be in the transcription list document, that is, the document fetched from the videotranscript URL shown in the stack trace.
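
If the transcription list really does contain bytes that are not valid UTF-8, one possible workaround is to decode the stream leniently before handing it to the SAX parser. The following is a sketch under assumptions, not the project's actual fix: the class and method names are made up, and the `transcript_list`, `track` and `name` names are hypothetical stand-ins for whatever the real document uses. It relies only on standard JDK calls (`CharsetDecoder`, `InputStreamReader`, `InputSource`).

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class LenientTranscriptListReader {

    /** Wraps a byte stream in a Reader that substitutes U+FFFD for malformed UTF-8 instead of failing. */
    static Reader lenientUtf8Reader(InputStream in) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        return new InputStreamReader(in, decoder);
    }

    /** Collects the "name" attribute of every track element (hypothetical names). */
    static List<String> trackNames(InputStream in) throws Exception {
        final List<String> names = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("track".equals(qName)) {
                    names.add(atts.getValue("name"));
                }
            }
        };
        // Giving the parser a character stream bypasses its own strict UTF-8 byte decoding.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(lenientUtf8Reader(in)), handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical document, encoded as Latin-1 so that it is not valid UTF-8.
        byte[] xml = "<transcript_list><track name=\"Espa\u00F1ol\"/></transcript_list>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(trackNames(new ByteArrayInputStream(xml)));
    }
}
```

The trade-off is that damaged characters come through as the replacement character U+FFFD rather than being recovered. If the server is actually sending Latin-1, decoding with that charset instead would preserve the text, but that is a guess until the raw response has been inspected.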

Original comment by jmt4b04...@gmail.com on 22 Apr 2009 at 2:07