plugins accept URLs

jasonpriem commented 13 years ago

Currently, plugins just accept an array of ID strings, which could be URLs, DOIs, or whatever. However, a number of plugins are going to need URLs to work, regardless of whether the user entered a URL or a DOI. That means these plugins rely on the CrossRef plugin having done its resolving task, so it needs to be run first. This is unpleasant because it introduces cross-plugin dependencies, but I see no way around it other than making Delicious, Facebook, Twitter, and the other plugins that need URLs make their own calls to CrossRef. That seems like being a bad neighbor.

If we do decide to run the CrossRef call first, we'll still need to get that info to the plugins. So we'll need to feed the plugins an object that looks like {<id>: <url>, <id>: <url>, ... }
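A minimal sketch of that input shape, assuming Python. The helper name `build_plugin_input` and the sample DOI and URLs are made up for illustration; they are not part of the current plugin API:

```python
def build_plugin_input(ids, resolved_urls):
    """Map every artifact ID to its resolved URL, or None if unresolved.

    `resolved_urls` stands in for whatever the CrossRef step returned
    (hypothetical shape: {id: url}).
    """
    return {artifact_id: resolved_urls.get(artifact_id) for artifact_id in ids}


ids = ["10.1234/example.doi", "http://example.org/paper"]

# Pretend CrossRef resolved the DOI (made-up values).
resolved = {"10.1234/example.doi": "http://example.org/resolved"}

# An ID the user entered as a URL is already its own URL.
resolved.update({i: i for i in ids if i.startswith("http")})

plugin_input = build_plugin_input(ids, resolved)
```

Plugins that need URLs would then read the values and never have to call CrossRef themselves.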

Any thoughts?

hpiwowar commented 13 years ago

Some of the other artifacts also have DOIs or other IDs that could be turned into URLs. We'd ideally like those URLs to be included in searches across Delicious, Facebook, Twitter, etc. as well.

It makes me think we actually have two types of plugins: one type that gets ID synonyms (and metadata, if we go that route?) and is run first, and another type that takes all these synonyms and gets the usage metrics (these would be our current plugins, modified to take an array of synonym objects, as you suggest).

It also means that the Twitter plugin, which was maybe registered as an Article plugin in our current config, also needs to be registered as a Dataset plugin or something like that. Can we allow setting plugins to multiple artifact types in the config file?

Hmmm.

jasonpriem commented 13 years ago

I've modified the plugin return requirements to take this idea into account. It also now lets the plugin send a description of itself, as you suggested earlier. It's a bit more complex, but it fits the actual data model better now.

One note is that the only thing the controller knows about the plugins is their name and url (and to run CrossRef first). There's no need to register as a certain kind of plugin; plugins get sent all the IDs we have for a given collection. They can then do stuff with any artifact genre (article, slideshow, whatever) they want to, including multiple genres.

In reply to hpiwowar's comment of 05/28/2011 09:10 AM.

hpiwowar commented 13 years ago

I like your changes.

I'm not sure if I communicated this one part well. In addition to what you've suggested, I think the controller needs to run more things "first", in addition to and in parallel with Crossref, and for the same reasons that we want to run Crossref first. For example, a Dryad-lookup plugin that takes DOIs and returns URLs so these can then be fed to Twitter etc., or a GEO data plugin that takes accession numbers and returns URLs. Perhaps even a DOI->PMID lookup would be useful (or vice versa)?

So I'm suggesting we have a way to tell the controller what other plugins are like crossref, that they should be run first, to collect synonyms to be used by other plugins. Basically a config list for these. Incidentally, these same plugins may also be registered (defined in the config) as returning metric information (and thus should be run again in the collecting-metrics-phase) like Dryad, or they may not.

(FWIW, in the Dryad case, I'm imagining the Dryad plugin is actually run twice, I guess: once in parallel with Crossref to generate URLs, and once in parallel with Twitter, wherein Twitter uses the Dryad URLs and Dryad also re-returns its own metrics information.)
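One way such a config list could look, sketched in Python. The registry layout, the phase names "synonyms" and "metrics", and the helper `plugins_for` are assumptions for illustration, not an existing config format:

```python
# Hypothetical registry: plugin name -> the phases it takes part in.
PLUGIN_PHASES = {
    "crossref": {"synonyms"},
    "dryad": {"synonyms", "metrics"},  # run twice: URLs first, metrics later
    "geo": {"synonyms"},
    "twitter": {"metrics"},
}


def plugins_for(phase, registry=PLUGIN_PHASES):
    """Return the names of all plugins registered for the given phase."""
    return sorted(name for name, phases in registry.items() if phase in phases)


synonym_plugins = plugins_for("synonyms")  # run first, to collect URLs
metric_plugins = plugins_for("metrics")    # run afterwards, using those URLs
```

Listing Dryad under both phases is what would make it run once alongside Crossref and once alongside Twitter.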

jasonpriem commented 13 years ago

Hm, I see what you mean now. I agree, there are multiple plugin dependencies. I guess that the right way to handle this would be to store the plugin registry as a dependency graph.

The easy way would be to just list plugins in the order that we want them run.

I'm leaning toward just doing the latter for the alpha, but I could be convinced otherwise.
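To make the two options concrete, here is a rough Python sketch. The plugin names and their dependencies are illustrative assumptions; `graphlib` is in the Python standard library from 3.9 onward:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependencies: plugin -> plugins that must run before it.
DEPENDS_ON = {
    "crossref": set(),
    "dryad": set(),
    "twitter": {"crossref", "dryad"},
    "facebook": {"crossref", "dryad"},
}

# Option 1 (dependency graph): derive a valid run order automatically.
derived_order = list(TopologicalSorter(DEPENDS_ON).static_order())

# Option 2 (the easy way): just write the run order by hand.
MANUAL_ORDER = ["crossref", "dryad", "twitter", "facebook"]


def respects_dependencies(order, depends_on):
    """Check that every plugin appears after all the plugins it depends on."""
    position = {name: i for i, name in enumerate(order)}
    return all(
        position[dep] < position[name]
        for name, deps in depends_on.items()
        for dep in deps
    )
```

The hand-written list is less to build, but nothing stops someone from putting a plugin ahead of one it relies on; a check like `respects_dependencies` would catch that.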

In reply to hpiwowar's comment of 05/28/2011 12:19 PM.

hpiwowar commented 13 years ago

Yup, the latter works for me for alpha. With server code that takes synonyms (even just restricted to urls for now, if we want to keep it straightforward) returned as metric_values from previous plugins and passes them into subsequent plugins?

jasonpriem commented 13 years ago

'Zackly. The update controller is responsible for assembling the input object described on the Plugin Requirements page, using the output of previous plugins (if available).
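Roughly, and only as a sketch: the Python below assumes a URL-only input of the form {id: url} and plugin output of the form {id: {metric_name: value}}. The real input object is the one on the Plugin Requirements page; `run_in_order` and the two stand-in plugins are hypothetical.

```python
def fake_crossref(plugin_input):
    # Stand-in resolver: pretends every DOI-like ID resolves to a URL.
    return {
        aid: {"url": "http://example.org/" + aid}
        for aid in plugin_input
        if aid.startswith("10.")
    }


def fake_twitter(plugin_input):
    # Stand-in metrics plugin: can only report on artifacts that have a URL.
    return {aid: {"tweets": 0} for aid, url in plugin_input.items() if url}


def run_in_order(artifact_ids, plugins):
    """Run plugins in list order, feeding URLs found so far to later plugins."""
    urls = {aid: None for aid in artifact_ids}
    results = {}
    for name, plugin in plugins:
        output = plugin(dict(urls))  # assemble this plugin's input
        results[name] = output
        # Harvest any URL returned as a metric value for use by later plugins.
        for aid, metric_values in output.items():
            if aid in urls and urls[aid] is None and "url" in metric_values:
                urls[aid] = metric_values["url"]
    return results


results = run_in_order(
    ["10.1234/abc", "some-other-id"],
    [("crossref", fake_crossref), ("twitter", fake_twitter)],
)
```

Here the Twitter stand-in only sees a URL for the first artifact because the CrossRef stand-in ran before it.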

In reply to hpiwowar's comment of 05/28/2011 01:02 PM.

hpiwowar commented 13 years ago

super.

hpiwowar commented 13 years ago

I'm a little confused about the details. Is this what we mean? Assuming

"artifact_name": {
        "doi":false, // boolean false, not string "false"
        "url": false,
        "pmid": false
    },

The crossref plugin will just look at the artifact_name (and populate the doi, url, pmid values).
The dryad plugin will just look at the artifact_name (and populate the doi, url values).
The slideshare plugin will just look at the artifact_name (and populate the url values).
The facebook plugin will just look at the url values (or also the artifact_names, in case some are not claimed by other plugins but are still valid facebook lookups, maybe as an "other" artifact type???).
...

jasonpriem commented 13 years ago

Well, the plugins aren't technically "populating" any of these, since this is input, not output. The plugins send back whatever metrics they send back, which can include URL, DOI, and PMID among many others.

Right now, the database makes this plugin-input object by getting the artifact name from the list of artifacts associated with the collection, then looking at the CrossRef data to populate the doi, url, and pmid fields. So Dryad data about doi, url, and pmid for a given artifact_name is ignored. That can change, of course, but I feel like CrossRef is the most important one for now.
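In Python terms, the current behaviour is roughly the following. This is a sketch only: `make_plugin_input` and the sample data are hypothetical, and the real code lives in the database layer.

```python
ID_TYPES = ("doi", "url", "pmid")


def make_plugin_input(artifact_names, crossref_data):
    """Build the plugin-input object using CrossRef data only."""
    plugin_input = {}
    for name in artifact_names:
        known = crossref_data.get(name, {})
        # Boolean False (not the string "false") marks an ID we don't have.
        plugin_input[name] = {t: known.get(t, False) for t in ID_TYPES}
    return plugin_input


# Made-up sample data.
crossref_data = {
    "10.1234/abc": {"doi": "10.1234/abc", "url": "http://example.org/abc"}
}
dryad_data = {"10.1234/abc": {"pmid": "12345"}}  # deliberately unused for now

plugin_input = make_plugin_input(
    ["10.1234/abc", "http://example.org/slides"], crossref_data
)
```

Taking Dryad's identifiers into account later would mean merging `dryad_data` in as well, with some rule for which source wins on a conflict.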

jasonpriem commented 13 years ago

Plugins now must accept an object with these three identifier types.

figshare / Total-Impact

plugins accept URLs #16