cygri / prefix.cc

Source code to the prefix.cc website
http://prefix.cc/
The Unlicense

Support underscores in the prefixes #24

Open berezovskyi opened 6 years ago

berezovskyi commented 6 years ago

Underscore is a valid prefix char.

I traced the necessary changes to:

https://github.com/cygri/prefix.cc/blob/abe782e8223dc9959a5c089dfb135656dcc9cb3b/lib/namespaces.class.php#L157 and https://github.com/cygri/prefix.cc/blob/abe782e8223dc9959a5c089dfb135656dcc9cb3b/lib/site.class.php#L285

Would you be ready to merge a PR for that?

cygri commented 6 years ago

What is the use case for underscores?

The reason it's not allowed is that I don't want separate or different mappings for dcterms, DCterms, dcTerms, dc_terms, DC-terms, dc.terms, or whatever other variations you could think of. I'd rather allow only one variation and have people “fight” over it than allow all of them and have people accidentally end up with the wrong URI because they didn't realise that dcterms and dc_terms were mapped to different URIs.

That's why prefix.cc doesn't allow uppercase characters and punctuation.

The site is about the popular/canonical mappings, and not so much about the “long tail” of prefix mappings that are used only by a small group of people.

Better supporting the “long tail” would be a completely reasonable goal, and some kind of punctuation to allow grouping of prefixes would probably be part of that. But the site lacks various other features that would be required to do a decent job on that goal.
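The restriction described above (no uppercase characters, no punctuation) can be illustrated with a minimal Python sketch. It is not the site's actual PHP code: the pattern and the function name `is_allowed_prefix` are assumptions made for illustration, and the real rule may have further constraints, such as a length limit.

```python
import re

# Assumption: allowed prefixes consist only of lowercase ASCII letters and
# digits. This pattern is illustrative, not copied from prefix.cc.
STRICT_PREFIX = re.compile(r"[a-z0-9]+")


def is_allowed_prefix(prefix: str) -> bool:
    """Return True if the whole prefix stays within the allowed character set."""
    return STRICT_PREFIX.fullmatch(prefix) is not None


# The spelling variants mentioned in the comment above.
variants = ["dcterms", "DCterms", "dcTerms", "dc_terms", "DC-terms", "dc.terms"]

# Only one spelling survives, so there is only one mapping to agree on.
allowed = [v for v in variants if is_allowed_prefix(v)]
```

Under these assumptions only `dcterms` passes, which is the "one variation" effect the comment argues for.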

berezovskyi commented 6 years ago

Thanks @cygri for getting back to me.

The use case is that the OSLC standard (you can think of it as an LDP for the enterprise), developed under OASIS, uses the following prefixes in the spec (as RFC SHOULDs; most of them have also been in use in applications since 2009):

I totally agree with you that the proliferation of dc.terms and the like would be unacceptable. But in our case, the prefix with an underscore is the "canonical" one (as much as a prefix can be).

cygri commented 6 years ago

Good point.

The argument I made above for the limited [a-z0-9] range is pretty strong, in my opinion.

But it's good that vocabulary authors propose canonical prefixes for their vocabularies. And it is desirable to have those proposed canonical prefixes in prefix.cc. And it's inevitable that some authors will propose prefixes outside of the [a-z0-9] range currently allowed by prefix.cc.

I don't currently have a good idea on how to resolve this contradiction.

berezovskyi commented 6 years ago

I think the best way to resolve it would be to use the https://github.com/perma-id/w3id.org approach, with a pull-request model for adding prefixes. That would involve a lot of rework of the prefix.cc codebase; I'm not sure I would have the time to do it if you give the green light.

A practical way to resolve this may be to remove the restrictions but add some form of pre-moderation. I don't think it would be much moderation work, but then again, it involves significant code changes to add an admin panel.

The most practical way would be for me to ask you to add the prefixes via phpMyAdmin and forget about this issue until more people complain :) (Though that would still require minimal code changes to resolve http://prefix.cc/oslc_rm, for example.)

cygri commented 6 years ago

Good analysis. Can you make a separate PR just to make underscores resolve? I'll dig out the phpMyAdmin password…
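The "make underscores resolve" step agreed here could be sketched as two separate character rules: a strict one for user submissions and a looser one for lookups of prefixes already in the database. This is a hedged Python illustration only. The pattern names, `can_register`, `can_resolve`, and `prefix_from_path` are hypothetical and do not correspond to functions in the prefix.cc PHP code, and the sketch ignores any other URL forms the site supports.

```python
import re
from typing import Optional

# Hypothetical patterns: submissions stay limited to [a-z0-9], while lookups
# additionally accept underscores so that manually added prefixes resolve.
REGISTER_PATTERN = re.compile(r"[a-z0-9]+")
RESOLVE_PATTERN = re.compile(r"[a-z0-9_]+")


def can_register(prefix: str) -> bool:
    """Strict check applied when a user proposes a new prefix."""
    return REGISTER_PATTERN.fullmatch(prefix) is not None


def can_resolve(prefix: str) -> bool:
    """Looser check applied when looking up an existing prefix."""
    return RESOLVE_PATTERN.fullmatch(prefix) is not None


def prefix_from_path(path: str) -> Optional[str]:
    """Extract a resolvable prefix from a URL path such as '/oslc_rm'."""
    candidate = path.lstrip("/")
    return candidate if can_resolve(candidate) else None
```

With this split, `prefix_from_path("/oslc_rm")` yields `oslc_rm`, while `can_register("oslc_rm")` is still false, so ordinary submissions cannot create variants like `dc_terms`.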

hsolbrig commented 3 years ago

What is the status of this PR? We'd love to use and leverage prefix.cc, but many of the namespaces we're working with end with underscores (example: The Human Phenotype Ontology, "HP", uses "http://purl.obolibrary.org/obo/HP_" as a prefix).

cygri commented 3 years ago

@hsolbrig This PR is about underscores in the prefix. It looks like you want underscores as the last character of the namespace URI.
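The distinction drawn here, between an underscore in the prefix and an underscore at the end of the namespace URI, can be shown with a small Python sketch. The lowercase prefix `hp`, the local identifier `0000118`, and the `expand` helper are assumptions chosen for illustration; only the namespace URI comes from the thread.

```python
# Illustrative mapping: the prefix contains no underscore, but the
# namespace URI it maps to ends with one.
mappings = {"hp": "http://purl.obolibrary.org/obo/HP_"}


def expand(curie: str, table: dict) -> str:
    """Expand 'prefix:local' into a full URI using the mapping table."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in table:
        raise KeyError(f"no mapping for prefix in {curie!r}")
    return table[prefix] + local


full_uri = expand("hp:0000118", mappings)
```

In this sketch the underscore lives entirely on the namespace side, so the character rules for prefixes discussed in this issue do not come into play.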

hsolbrig commented 3 years ago

Ah - missed that. Has there been discussion on that topic?

cygri commented 3 years ago

@hsolbrig Yes - https://twitter.com/cygri/status/1344250746171252736