clowder-framework / pyclowder

Library to assist in the development of extractors for clowder.
University of Illinois/NCSA Open Source License

50 clowder20 submit file to extractor #51

Closed tcnichol closed 1 year ago

tcnichol commented 2 years ago

These changes allow a file to be submitted to an extractor and the resulting metadata to be posted back. I have not yet handled cases where new files or tags are uploaded. The easiest way to test is with the wordcount extractor.

With this branch, add a .env file to the pyclowder directory containing:

clowder_version=2.0
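
A minimal sketch of how this setting could be read, assuming the .env contents have already been loaded into the process environment (for example by Docker or a dotenv loader). The helper name `read_clowder_version` and the fallback to 1 when the variable is unset are assumptions for illustration, not pyclowder's actual code.

```python
import os


def read_clowder_version(env=None):
    """Return the major Clowder version as an int.

    `env` is any mapping of environment variables; it defaults to os.environ.
    Falling back to 1 when the variable is missing is an assumption.
    """
    env = os.environ if env is None else env
    raw = env.get("clowder_version", "1")
    # "2.0" -> 2.0 -> 2, so both "2" and "2.0" are accepted.
    return int(float(raw))
```

Passing the mapping in explicitly keeps the helper easy to test without touching the real environment.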

Right now I am sending in the Bearer Token from clowder2.0 and using it in place of the extractor-key or secretKey. I am not sure this is a good long-term strategy: if an extractor takes a long time to complete, the token may expire. It seemed like a good enough approach for now.
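
The token-in-place-of-key idea can be sketched as follows. This is an illustration under assumptions, not the code in this branch: the function name `build_auth` is hypothetical, and it assumes Clowder v1 accepts the key as a `key` query parameter while Clowder v2 expects an `Authorization: Bearer` header.

```python
def build_auth(token_or_key, clowder_version):
    """Return (headers, params) to attach to an HTTP request.

    For v2 the same string that v1 treats as a secret key is sent
    as a Bearer token instead (assumed behaviour, see lead-in).
    """
    if clowder_version >= 2:
        return {"Authorization": f"Bearer {token_or_key}"}, {}
    # Assumed v1 convention: the key travels as a query parameter.
    return {}, {"key": token_or_key}
```

The sketch does nothing about token expiry, so a long-running extractor would still hit the problem described above.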

The clowder2.0 branch this works with is:

https://github.com/clowder-framework/clowder2/tree/register-extractor-submit-file

tcnichol commented 2 years ago

This clowder2 pull request is also reliant on this one:

https://github.com/clowder-framework/clowder2/pull/128

max-zilla commented 1 year ago

I tested this successfully with ncsa.wordcount on Clowder 2. We need to make sure it doesn't break backward compatibility with Clowder 1.

robkooper commented 1 year ago

Why is sample-extractors/csv-precipitation/ part of this PR? Is that an error?

robkooper commented 1 year ago

One thing we talked about is having Clowder include its version in the message it sends as part of the request to do work. If no version is provided, v1 is assumed. At that point we can pass the Clowder version to most functions. This way pyclowder/extractor does not need to be started with the version of Clowder.
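
A rough sketch of that idea, assuming the work request arrives as a JSON body. The field name `clowder_version` and both function names are hypothetical; the thread does not say how the message would carry the version.

```python
import json


def resolve_clowder_version(body):
    """Return the major Clowder version named in a message body, or 1."""
    message = json.loads(body)
    # Hypothetical field name; a missing field means Clowder v1.
    return int(float(message.get("clowder_version", 1)))


def handle_message(body):
    """Pick the code path per message instead of per process."""
    version = resolve_clowder_version(body)
    # The version is passed down explicitly rather than read from a global.
    return f"processing with Clowder v{version} code path"
```

Because the version is resolved per message, one running extractor could in principle serve both Clowder versions.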

robkooper commented 1 year ago

Discussion with @max-zilla and @tcnichol :

A bigger change for future readiness:

In the future, all functions will take this instead of host and key, and we can remove clowder_version as a global variable.

robkooper commented 1 year ago

Right now let's leave clowder_version as a variable we set in the Docker env, but this will be removed in the future.

tcnichol commented 1 year ago

Fixed so that the ClowderClient is used in place of host and key.
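
To illustrate the shape of this change, here is a sketch of a client object replacing separate host and key arguments. `ClientSketch` and `file_endpoint` are stand-ins invented for this example and are not pyclowder's real ClowderClient; the URL paths are likewise assumptions about the v1 and v2 APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientSketch:
    """Hypothetical bundle of the values functions used to take one by one."""
    host: str
    key: str
    clowder_version: int = 1


def file_endpoint(client, file_id):
    """Build the file URL from the client instead of from (host, key)."""
    host = client.host.rstrip("/")
    if client.clowder_version >= 2:
        # Assumed v2 path prefix.
        return f"{host}/api/v2/files/{file_id}"
    # Assumed v1 path prefix.
    return f"{host}/api/files/{file_id}"
```

With the version carried on the client, helper functions no longer need a global clowder_version to choose an endpoint.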

tcnichol commented 1 year ago

Right now I am getting a 422 Unprocessable Entity error when the extractor tries to post metadata. I think this is an error on the clowder2 end; I will check and fix it on the extractor registration fixes branch.

max-zilla commented 1 year ago

Ran this with Clowder v1 develop and wordcount worked!

tcnichol commented 1 year ago

For testing with Clowder v2, here is the entry for the 'wordcount' extractor that you can add to listeners. If you run clowder2 and have wordcount running at the same time, it will submit the file and post metadata back, which should now be visible on main.

{
  "_id": { "$oid": "63b5cd4aeb1180d52266214e" },
  "author": "Rob Kooper <kooper@illinois.edu>",
  "name": "ncsa.wordcount",
  "version": "2.0",
  "description": "WordCount extractor. Counts the number of characters, words and lines in the text file that was uploaded.",
  "creator": null,
  "created": { "$date": { "$numberLong": "1672858954451" } },
  "modified": { "$date": { "$numberLong": "1672858954451" } },
  "properties": {
    "author": "Rob Kooper <kooper@illinois.edu>",
    "process": {
      "file": [ "text/*", "application/json" ]
    },
    "maturity": "Development",
    "name": "ncsa.wordcount",
    "contributors": [],
    "contexts": [
      {
        "lines": "http://clowder.ncsa.illinois.edu/metadata/ncsa.wordcount#lines",
        "words": "http://clowder.ncsa.illinois.edu/metadata/ncsa.wordcount#words",
        "characters": "http://clowder.ncsa.illinois.edu/metadata/ncsa.wordcount#characters"
      }
    ],
    "repository": [
      {
        "id": { "$oid": "63b5cd4aeb1180d52266214d" },
        "repository_type": "git",
        "repository_url": ""
      }
    ],
    "external_services": [],
    "libraries": [],
    "bibtex": [],
    "default_labels": [],
    "categories": [],
    "parameters": {
      "schema": {
        "X_MIN_START": { "type": "integer", "title": "X_MIN_START" },
        "X_MIN_END": { "type": "integer", "title": "X_MIN_END" },
        "Y_MIN_START": { "type": "integer", "title": "Y_MIN_START" },
        "Y_MIN_END": { "type": "integer", "title": "Y_MIN_END" },
        "ZONE": { "type": "string", "title": "ZONE" }
      }
    },
    "version": "2.0"
  }
}