Closed jvanasco closed 7 years ago
I would suggest subclassing the existing ProviderRegistry
and overriding the register and unregister methods in your subclass. You can parse out the hostname using urlparse(url).netloc
. The bootstrap_*
methods all accept a ProviderRegistry
instance so you can provide an instance of your subclass and get everything registered easily.
ah, subclassing is a good idea. i'll redo the PR I made.
I was suggesting that you subclass it in your own application -- I don't plan to merge the optimization into the codebase in the interests of keeping things simple and extensible.
That's fair.
Would you be interested at all in this reorg of migrating the providers into a dict? You can strip out the 'domain' key if you want.
https://github.com/coleifer/micawber/compare/master...jvanasco:feature-library
fyi, urlparse(url).netloc
is the hostname + port (if present in a url). you can use urlparse(url).hostname
to ensure only a hostname. some of the endpoints serve subdomains though, so a library like tldextract
would be needed to parse out the actual "registered" domain.
I'm considering migrating to micawber from a custom oembed consumer, and wanted to suggest a performance improvement that I am willing to generate a PR for.
I'd like to extend the
ProviderRegistry
with a secondary internal register that nests providers under domain names.this would allow users to optionally avoid a regex match against every provider and only test the domain.
some light tests on a quick mockup showed the lookups to run in 30% the time -- including the overhead of parsing the domain name from a url, but about 5% of the time if you have the domain already.
we would be using this on a high volume indexer, so this performance is a need.