50 clowder20 submit file to extractor

tcnichol commented 2 years ago

These changes allow a file to be submitted to an extractor and the resulting metadata to be posted back. I have not yet handled tags or cases where new files are uploaded. The easiest way to test is with the wordcount extractor.

With this branch, add a .env file to the pyclowder directory containing:

clowder_version=2.0

Right now I am sending in the Bearer Token from clowder2.0, and then using the Bearer Token in place of the extractor-key or secretKey. I am not sure that this will be a good strategy long term. If an extractor takes a long time to complete, the token may expire, but this seemed like a good enough approach for now.
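A minimal sketch of the credential handling described above, assuming v2 expects the token in an `Authorization: Bearer` header and v1 takes the key as a `key` query parameter. The helper name `build_auth` and the v1 parameter name are illustrative assumptions, not the code in this branch.

```python
from typing import Dict, Tuple


def build_auth(version: float, credential: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (headers, query_params) for a request to Clowder.

    Hypothetical helper: for v2 the credential is treated as a Bearer token,
    for v1 it is treated as the extractor key / secretKey.
    """
    if version >= 2.0:
        # Assumption: Clowder 2 reads the token from the Authorization header.
        return {"Authorization": f"Bearer {credential}"}, {}
    # Assumption: Clowder 1 reads the key from a `key` query parameter.
    return {}, {"key": credential}
```

Keeping the choice in one place means a later switch away from short-lived Bearer tokens (for example, if expiry during long extractions becomes a problem) would only touch this helper.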

The branch this works with for clowder2.0 is

https://github.com/clowder-framework/clowder2/tree/register-extractor-submit-file

tcnichol commented 2 years ago

This clowder2 pull request is also reliant on this one:

https://github.com/clowder-framework/clowder2/pull/128

max-zilla commented 1 year ago

I tested this successfully with ncsa.wordcount on Clowder 2. Need to make sure it doesn't break backward compatibility with Clowder 1.

robkooper commented 1 year ago

Why is sample-extractors/csv-precipitation/ part of this PR? Is that an error?

robkooper commented 1 year ago

One thing we talked about is to have clowder include its version in the message it sends as part of the request to do work. If no version is provided, v1 is assumed. At that point we can pass the clowder version to most functions. This way pyclowder/extractor does not need to be started with the version of clowder.
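The fallback to v1 could look roughly like this. It is only a sketch: the message field name `clowder_version` and the function name are assumptions for illustration, and the real message format may differ.

```python
import json

DEFAULT_CLOWDER_VERSION = 1.0  # used when the message carries no version


def clowder_version_from_message(body: bytes) -> float:
    """Read the Clowder version from a job message, defaulting to v1.

    Hypothetical field name: `clowder_version`. Accepts either a number
    (2) or a string ("2.0") so both styles of sender would work.
    """
    message = json.loads(body)
    return float(message.get("clowder_version", DEFAULT_CLOWDER_VERSION))
```

The returned value can then be passed down to the functions that need to branch on the API version, instead of reading a global set at startup.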

robkooper commented 1 year ago

Discussion with @max-zilla and @tcnichol :

remove exchange
- remove variable from dockerfile
remove registration, only leave heartbeat
remove extra simple extractor
Need to update changelog

bigger change for future readiness

introduce clowderclient(string host, string key, string version) and use in v[12] implementations.

in future all functions will take this instead of host, key and we can remove the clowder_version as a global variable
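One way the proposed client object could be shaped, as a sketch only. The class fields follow the signature above; the `upload_metadata` dispatcher and the `_v1`/`_v2` stubs are hypothetical names that return a description of the call instead of making a network request.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class ClowderClient:
    """Bundles connection details so functions take one argument instead of host, key."""
    host: str
    key: str
    version: str = "1"  # no version given means v1


def _upload_metadata_v1(client: ClowderClient, file_id: str, metadata: Dict[str, Any]) -> Dict[str, Any]:
    # Stub standing in for the v1 implementation (hypothetical).
    return {"api": "v1", "host": client.host, "file_id": file_id, "metadata": metadata}


def _upload_metadata_v2(client: ClowderClient, file_id: str, metadata: Dict[str, Any]) -> Dict[str, Any]:
    # Stub standing in for the v2 implementation (hypothetical).
    return {"api": "v2", "host": client.host, "file_id": file_id, "metadata": metadata}


def upload_metadata(client: ClowderClient, file_id: str, metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Dispatch on the client's version rather than on a global variable."""
    if client.version.startswith("2"):
        return _upload_metadata_v2(client, file_id, metadata)
    return _upload_metadata_v1(client, file_id, metadata)
```

With this shape, a caller would construct `ClowderClient(host, key, version)` once per job and hand it to every API function, which is what makes the global `clowder_version` removable later.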

robkooper commented 1 year ago

Right now let's leave clowder_version as a variable we set in the docker env, but this will be removed in the future.

tcnichol commented 1 year ago

Fixed so that the ClowderClient is used in place of host and key.

tcnichol commented 1 year ago

Right now I am getting a 422 Unprocessable Entity error when the extractor tries to post metadata. I'm thinking this is an error on the clowder2 end, will check and fix on the extractor registration fixes branch.

max-zilla commented 1 year ago

Ran this with Clowder v1 develop and wordcount worked!

tcnichol commented 1 year ago

If you are testing with clowder v2, here is the entry for the 'wordcount' extractor that you can add to listeners. If you run clowder2 with wordcount running at the same time, it will submit and post metadata back, which should now be visible on main.

{
  "_id": { "$oid": "63b5cd4aeb1180d52266214e" },
  "author": "Rob Kooper <kooper@illinois.edu>",
  "name": "ncsa.wordcount",
  "version": "2.0",
  "description": "WordCount extractor. Counts the number of characters, words and lines in the text file that was uploaded.",
  "creator": null,
  "created": { "$date": { "$numberLong": "1672858954451" } },
  "modified": { "$date": { "$numberLong": "1672858954451" } },
  "properties": {
    "author": "Rob Kooper <kooper@illinois.edu>",
    "process": { "file": [ "text/*", "application/json" ] },
    "maturity": "Development",
    "name": "ncsa.wordcount",
    "contributors": [],
    "contexts": [
      {
        "lines": "http://clowder.ncsa.illinois.edu/metadata/ncsa.wordcount#lines",
        "words": "http://clowder.ncsa.illinois.edu/metadata/ncsa.wordcount#words",
        "characters": "http://clowder.ncsa.illinois.edu/metadata/ncsa.wordcount#characters"
      }
    ],
    "repository": [
      { "id": { "$oid": "63b5cd4aeb1180d52266214d" }, "repository_type": "git", "repository_url": "" }
    ],
    "external_services": [],
    "libraries": [],
    "bibtex": [],
    "default_labels": [],
    "categories": [],
    "parameters": {
      "schema": {
        "X_MIN_START": { "type": "integer", "title": "X_MIN_START" },
        "X_MIN_END": { "type": "integer", "title": "X_MIN_END" },
        "Y_MIN_START": { "type": "integer", "title": "Y_MIN_START" },
        "Y_MIN_END": { "type": "integer", "title": "Y_MIN_END" },
        "ZONE": { "type": "string", "title": "ZONE" }
      }
    },
    "version": "2.0"
  }
}

clowder-framework / pyclowder

50 clowder20 submit file to extractor #51