eclipse-rdf4j / rdf4j

Eclipse RDF4J: scalable RDF for Java
https://rdf4j.org/
BSD 3-Clause "New" or "Revised" License

StringIndexOutOfBoundsException in ParsedIRI #1066

Closed VladimirAlexiev closed 6 years ago

VladimirAlexiev commented 6 years ago

I'm trying to load https://permid.org/sfiles/bulkDownload/OpenPermID-bulk-organization-20180805_070415.ttl.gz into GraphDB (free registration at permid.org is needed to obtain this file), and I get this exception:

[ERROR] 2018-08-08 12:02:17,230 [import-task-permid-1 | c.o.f.i.FileImportRunnableTask] Could not import file
java.lang.StringIndexOutOfBoundsException: String index out of range: 21
at java.lang.String.codePointAt(String.java:687)
at org.eclipse.rdf4j.common.net.ParsedIRI.error(ParsedIRI.java:1139)
at org.eclipse.rdf4j.common.net.ParsedIRI.parseHost(ParsedIRI.java:997)
at org.eclipse.rdf4j.common.net.ParsedIRI.parse(ParsedIRI.java:858)

OK, so I'll use Jena RIOT to validate it. Please create an RDF4J Rio command-line utility like RIOT, so I can validate with Rio.

barthanssens commented 6 years ago

Sounds like an issue for the https://github.com/eclipse/rdf4j-tools subproject.

There is a verify command in the console tool, but what is really missing (IMHO) is a way to run a command/script from the command line (e.g. Console -x verify) and exit with an exit code, so it could be used in a batch script instead of interactively.

abrokenjester commented 6 years ago

@barthanssens you can kinda do that though:

$ echo verify test.ttl | ./console.sh 

but I agree command line switches would be useful.

abrokenjester commented 6 years ago

We're kinda conflating issues here though. I'll keep this issue focused on the reported parser bug, and I'll add a separate feature ticket to the tools project for command-line switches that run actions like verification.

abrokenjester commented 6 years ago

I can reproduce this issue in the latest milestone.

VladimirAlexiev commented 6 years ago

I didn't know about console verify, thanks! Does it say the line number?

RIOT reports about 50 malformed web URLs (see the GraphDB Jira). I fixed them with a Perl script, but no cigar: the import still fails.

abrokenjester commented 6 years ago

The problem is specific to an IRI with an IP address but no path: http://178.62.246.130 . There is a bug in ParsedIRI, which expects either a port number or a path after the address. I'm working on a fix.
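
For readers following along, here is a minimal sketch of why the reported index is 21. It is a reading of the stack trace above, not the actual ParsedIRI source: the IRI http://178.62.246.130 is exactly 21 characters long, so a codePointAt call at the position just past the host is out of range. The class and method names below are hypothetical stand-ins.

```java
// Hypothetical illustration only; this is NOT RDF4J's ParsedIRI code.
class EndOfInputSketch {

    // Unguarded variant: looks up the "offending" character without
    // checking whether the position is still inside the string.
    static String describeUnguarded(String iri, int pos) {
        int cp = iri.codePointAt(pos); // throws if pos == iri.length()
        return "Unexpected character U+" + Integer.toHexString(cp) + " at " + pos;
    }

    // Guarded variant: treats a position at or past the end as end of input.
    static String describeGuarded(String iri, int pos) {
        if (pos >= iri.length()) {
            return "Unexpected end of input at " + pos;
        }
        return describeUnguarded(iri, pos);
    }

    public static void main(String[] args) {
        String iri = "http://178.62.246.130";
        int end = iri.length(); // 21: the index named in the exception

        System.out.println(describeGuarded(iri, end));
        try {
            describeUnguarded(iri, end);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("unguarded lookup failed: " + e.getMessage());
        }
    }
}
```

With a port or a path after the address there is still a character at that position, which would explain why only the bare IP form trips the error.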

abrokenjester commented 6 years ago

Btw the line on which this first occurs in this file is 6360967. Unfortunately the console also doesn't report this, due to a related error. To be clear, normally the console would report line numbers on errors.

abrokenjester commented 6 years ago

PR up for a fix. I have verified that with this fix in place, the file parses without issues (it doesn't even require disabling URI checking).

abrokenjester commented 6 years ago

I forgot to mention that if you want that URI to parse without the fix, all you need to do is add a trailing slash: http://178.62.246.130/
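
If the data has to be patched before the fix is released, the trailing-slash workaround could be applied programmatically. The following is a rough sketch using plain string handling, with no RDF4J calls; the class and method names are made up for illustration, and it assumes the input is an otherwise well-formed IRI of the form scheme://authority...

```java
// Hypothetical helper, not part of RDF4J: adds an empty path ("/") to
// IRIs that have an authority but no path.
class TrailingSlashSketch {

    static String ensurePath(String iri) {
        int sep = iri.indexOf("://");
        if (sep < 0) {
            return iri; // no authority component (e.g. urn:, mailto:): leave as-is
        }
        // Scan the part after "scheme://" for the first delimiter.
        for (int i = sep + 3; i < iri.length(); i++) {
            char c = iri.charAt(i);
            if (c == '/') {
                return iri; // a path is already present
            }
            if (c == '?' || c == '#') {
                // Query or fragment follows the authority directly:
                // insert the slash in front of it.
                return iri.substring(0, i) + "/" + iri.substring(i);
            }
        }
        return iri + "/"; // authority runs to the end of the string
    }

    public static void main(String[] args) {
        System.out.println(ensurePath("http://178.62.246.130"));
        System.out.println(ensurePath("http://178.62.246.130/"));
    }
}
```

Running this over the IRIs in the file before import should leave IRIs that already have a path untouched and only rewrite the bare-authority ones.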