Closed tcnichol closed 1 year ago
This clowder2 pull request is also reliant on this one:
I tested this successfully with ncsa.wordcount on Clowder 2. We still need to make sure it doesn't break backward compatibility with Clowder 1.
Why is sample-extractors/csv-precipitation/ part of this PR? Is that an error?
One thing we talked about is to have Clowder include its version in the message it sends when requesting work. If no version is provided, the extractor will assume v1. At that point we can pass the Clowder version to most functions. This way pyclowder/extractor does not need to be started with the version of Clowder.
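A minimal sketch of that fallback, assuming the job message is a JSON object. The field name `clowder_version` and the helper `resolve_clowder_version` are hypothetical, chosen for illustration; they are not taken from pyclowder.

```python
import json

# Per the discussion: a message without a version is treated as Clowder v1.
DEFAULT_CLOWDER_VERSION = 1


def resolve_clowder_version(message_body: str) -> int:
    """Return the Clowder major version named in a job message, or 1 if absent.

    The "clowder_version" field name is an assumption made for this sketch.
    """
    payload = json.loads(message_body)
    raw = payload.get("clowder_version")
    if raw is None:
        return DEFAULT_CLOWDER_VERSION
    # Accept 2, "2", or "2.0" and reduce to the major version.
    return int(float(raw))
```

With this shape, the version travels with each request, so one running extractor could in principle serve both v1 and v2 instances.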
Discussion with @max-zilla and @tcnichol :
A bigger change for future readiness: introduce
clowderclient(string host, string key, string version)
and use it in the v[12] implementations. In the future all functions will take this instead of host and key, and we can remove clowder_version as a global variable.
For now, let's leave clowder_version as a variable we set in the Docker env, but this will be removed in the future.
Fixed so that the ClowderClient is used in place of host and key.
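To illustrate the idea of passing one client object instead of separate host and key arguments, here is a hedged sketch. The class below is a hypothetical stand-in written for this example, not the ClowderClient in this branch, and the URL paths are assumptions used only to show version dispatch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClowderClient:
    """Hypothetical stand-in bundling host, key, and version in one object."""
    host: str
    key: str
    version: int = 1  # default to v1 for backward compatibility


def metadata_url_v1(client: ClowderClient, file_id: str) -> str:
    # Assumed v1-style path, for illustration only.
    return f"{client.host.rstrip('/')}/api/files/{file_id}/metadata.jsonld"


def metadata_url_v2(client: ClowderClient, file_id: str) -> str:
    # Assumed v2-style path, for illustration only.
    return f"{client.host.rstrip('/')}/api/v2/files/{file_id}/metadata"


def metadata_url(client: ClowderClient, file_id: str) -> str:
    """Dispatch on client.version rather than a global clowder_version."""
    if client.version == 2:
        return metadata_url_v2(client, file_id)
    return metadata_url_v1(client, file_id)
```

Because the version lives on the client, callers never consult a global, which is what would allow the environment variable to be dropped later.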
Right now I am getting a 422 Unprocessable Entity error when the extractor tries to post metadata. I'm thinking this is an error on the clowder2 end, will check and fix on the extractor registration fixes branch.
Ran this with Clowder v1 develop and wordcount worked!
For testing with Clowder v2, here is the entry for the 'wordcount' extractor that you can add to listeners. If you run clowder2 with wordcount running at the same time, it will submit the file and post metadata back, which should now be visible on main.
{
  "_id": { "$oid": "63b5cd4aeb1180d52266214e" },
  "author": "Rob Kooper <kooper@illinois.edu>",
  "name": "ncsa.wordcount",
  "version": "2.0",
  "description": "WordCount extractor. Counts the number of characters, words and lines in the text file that was uploaded.",
  "creator": null,
  "created": { "$date": { "$numberLong": "1672858954451" } },
  "modified": { "$date": { "$numberLong": "1672858954451" } },
  "properties": {
    "author": "Rob Kooper <kooper@illinois.edu>",
    "process": { "file": [ "text/*", "application/json" ] },
    "maturity": "Development",
    "name": "ncsa.wordcount",
    "contributors": [],
    "contexts": [
      {
        "lines": "http://clowder.ncsa.illinois.edu/metadata/ncsa.wordcount#lines",
        "words": "http://clowder.ncsa.illinois.edu/metadata/ncsa.wordcount#words",
        "characters": "http://clowder.ncsa.illinois.edu/metadata/ncsa.wordcount#characters"
      }
    ],
    "repository": [
      { "id": { "$oid": "63b5cd4aeb1180d52266214d" }, "repository_type": "git", "repository_url": "" }
    ],
    "external_services": [],
    "libraries": [],
    "bibtex": [],
    "default_labels": [],
    "categories": [],
    "parameters": {
      "schema": {
        "X_MIN_START": { "type": "integer", "title": "X_MIN_START" },
        "X_MIN_END": { "type": "integer", "title": "X_MIN_END" },
        "Y_MIN_START": { "type": "integer", "title": "Y_MIN_START" },
        "Y_MIN_END": { "type": "integer", "title": "Y_MIN_END" },
        "ZONE": { "type": "string", "title": "ZONE" }
      }
    },
    "version": "2.0"
  }
}
These changes allow a file to be submitted to an extractor and the resulting metadata to be posted. I have not yet handled new file uploads or tags. The easiest way to test is with the wordcount extractor.
With this branch, add a .env file to the pyclowder directory containing:
clowder_version=2.0
Right now I am sending in the Bearer Token from clowder2.0, and then using the Bearer Token in place of the extractor-key or secretKey. I am not sure that this will be a good strategy long term. If an extractor takes a long time to complete, the token may expire, but this seemed like a good enough approach for now.
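As a rough sketch of what using the Bearer token in place of the key could look like when building a request: the helper `build_auth` is hypothetical, and both the `key` query parameter for v1 and the `Authorization: Bearer` header for v2 are assumptions made for illustration, not confirmed details of this branch.

```python
from typing import Dict


def build_auth(clowder_version: int, credential: str) -> Dict[str, Dict[str, str]]:
    """Return headers and query params carrying the credential.

    For v2 the credential is treated as a bearer token; for v1 it is treated
    as an API key passed as a query parameter (both are assumptions).
    """
    if clowder_version == 2:
        return {"headers": {"Authorization": f"Bearer {credential}"}, "params": {}}
    return {"headers": {}, "params": {"key": credential}}
```

The returned dictionaries could be passed to an HTTP library as its headers and query parameters. This sketch does not address the expiry concern: a long-running extractor would still fail once the token lapses.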
The branch this works with for clowder2.0 is
https://github.com/clowder-framework/clowder2/tree/register-extractor-submit-file