djc / rnc2rng

RELAX NG Compact to regular syntax conversion library
MIT License

Support for non-local URLs #17

Closed: stefanseefeld closed this issue 4 years ago

stefanseefeld commented 4 years ago

I was trying to use rnc2rng on non-local schemas, such as https://cdn.docbook.org/schema/5.1/rng/docbook.rnc, but I get 'FileNotFound' errors. While I can work around this limitation for a single file using httplib or similar, I would expect rnc2rng to fetch any included schema files from the same source. In fact, the RNC grammar explicitly allows include statements to use URIs.

Is this supported at all?

djc commented 4 years ago

This is not currently supported, but I think it should be easy to add. Would you be interested in contributing support for it? I think what you'd have to do is change parser.component_include() so that it uses httplib if the URL starts with http or https, and then pass the resulting response (which acts as a file-like object, I think?) to parse(). We should probably also add a test case for it.
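The dispatch described above can be sketched as follows. This is only a minimal illustration, not the project's code: `open_schema` is a hypothetical helper name, and it uses Python 3's `urllib.request.urlopen()` where the comment mentions httplib. How the result would be wired into parser.component_include() and parse() is left open here.

```python
from urllib.parse import urlsplit
from urllib.request import urlopen


def open_schema(location):
    """Hypothetical helper: return a binary file-like object for a path or URL."""
    scheme = urlsplit(location).scheme
    if scheme in ("http", "https"):
        # urlopen() returns a response object that supports read()
        # and can be used as a context manager.
        return urlopen(location)
    # Anything without an http(s) scheme is treated as a local path.
    return open(location, "rb")
```

A caller could then read either kind of source the same way, for example `with open_schema(location) as f: source = f.read().decode("utf-8")`, assuming the schema is UTF-8 encoded.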

Failing that, it would be useful if you can contribute a small test case (ideally as a PR).

stefanseefeld commented 4 years ago

Thanks, I'll look into it.

stefanseefeld commented 4 years ago

I'm actively working on a fix that involves using urllib.urlopen() rather than open() (and urllib.parse.urljoin() rather than os.path.join()) whenever URLs are used. However, I'm encountering a strange error: I find that

urllib.parse.urljoin('file:./tests/foo.rnc', 'datatypes.rnc')

(i.e. where the base URL is a relative local filename) yields

'file:///tests/datatypes.rnc'

(i.e. an absolute path that doesn't exist), which seems wrong. Am I abusing urljoin here, or is this a bug?
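One possible way around this, offered as a sketch rather than as what the eventual fix does: urljoin() resolves a reference against its base as if the base were an absolute URL, so a base such as 'file:./tests/foo.rnc' gives it no working directory to resolve against. Converting a local path into an absolute file: URL before joining avoids the problem. The helper names `to_base_url` and `resolve_include` are hypothetical.

```python
from pathlib import Path
from urllib.parse import urljoin, urlsplit


def to_base_url(location):
    """Hypothetical helper: make a location usable as a urljoin() base."""
    if urlsplit(location).scheme in ("http", "https", "file"):
        # Already a URL; a file: URL is assumed to be absolute here.
        return location
    # Plain local path: resolve against the current directory, then
    # convert to an absolute file: URL.
    return Path(location).resolve().as_uri()


def resolve_include(base, href):
    """Resolve an include target relative to the including schema."""
    return urljoin(to_base_url(base), href)


if __name__ == "__main__":
    # Local base: the result stays under the current working directory.
    print(resolve_include("tests/foo.rnc", "datatypes.rnc"))
    # Remote base: the include is resolved next to the remote schema.
    print(resolve_include(
        "https://cdn.docbook.org/schema/5.1/rng/docbook.rnc", "datatypes.rnc"))
```

With this approach the same join works for local and remote schemas, at the cost of local includes being reported as absolute file: URLs rather than relative paths.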

stefanseefeld commented 4 years ago

Please see https://github.com/djc/rnc2rng/pull/18 for a fix.