Open andreleblanc11 opened 5 months ago
thinking:
Until today, filtering is on the upstream URL alone. It doesn't look at any downstream values or results, so when formulating accept/reject you can just look at where the files are on the upstream server to know what to filter on. The "rename" headers are not visible in the upstream... the user has to download a few files, look at the message, notice the rename, and then start filtering? Is that the expectation?
If you have plugins (which have the name... you may have noticed "after_accept"), all the renamers take effect after the accept/reject filtering has happened... so it's confusing that when using the rename header in the message, the final name is used, but the after_accept stuff is totally ignored... should we then have a "before_accept" entry point, and make a class of renamers that work for that case?
The rename header is kind of a left-over of early development... not really meant to be used going forward... I know DMS uses it. The more future-oriented way of accomplishing that is to use relPath and retrievePath, and then the normal filtering will work.
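To make the relPath/retrievePath idea concrete, here is a minimal sketch. The field names come from this discussion; the host, paths, the accept pattern, and the helper `url_to_match` are made up for illustration, and it assumes filtering is applied to baseUrl joined with relPath.

```python
import re

# Hypothetical message: field names follow the discussion, values are invented.
msg = {
    "baseUrl": "https://upstream.example.com",
    "retrievePath": "incoming/raw_0001.tmp",    # where the file actually sits upstream
    "relPath": "products/2024/obs_0001.grib2",  # the name it should end up with
}

def url_to_match(m):
    """Build the string the accept/reject patterns are tested against (assumed form)."""
    return m["baseUrl"].rstrip("/") + "/" + m["relPath"].lstrip("/")

# An accept pattern written against the final name, not the upstream location.
accept = re.compile(r".*/products/.*\.grib2$")
matched = accept.match(url_to_match(msg)) is not None
```

With the final name carried in relPath, the pattern can be written against the name the consumer cares about, while retrievePath still records where to fetch from.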
Maybe we can implement a filter or internal logic to do that all the time... have not thought about it enough to say for sure. This could certainly be implemented using a before_accept() plugin, if we made those...
Currently filtering happens on one path. If we filter on two paths, it brings up the possibility of conflict.
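A small sketch of how such a conflict could arise, assuming ordered accept/reject rules where the first matching pattern decides. The rules, the two URLs, and the helper `first_match` are hypothetical.

```python
import re

def first_match(rules, path):
    """Return the verdict of the first rule whose pattern matches, else None."""
    for verdict, pattern in rules:
        if re.match(pattern, path):
            return verdict
    return None

# Hypothetical rule set, evaluated top to bottom.
rules = [
    ("reject", r".*\.tmp$"),
    ("accept", r".*\.grib2$"),
]

upstream = "https://upstream.example.com/incoming/raw_0001.tmp"   # path on the server
renamed = "https://upstream.example.com/products/obs_0001.grib2"  # path after rename

verdicts = (first_match(rules, upstream), first_match(rules, renamed))
conflict = verdicts[0] != verdicts[1]
```

Here the same message is rejected on its upstream path and accepted on its renamed path, so filtering on both would need some rule for which verdict wins.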
Given that you are getting the messages from the DMS winnows, you could implement an after_accept on the winnows themselves that swaps the rename header for relPath and retrievePath instead... they might have to publish as v3 though... not sure if retrievePath is a thing in v2... would have to try it out. Anyway, if that were done, the current default processing would work on the downstream consumers with no issue, so there would be no core code change, just a new plugin. Not saying we should do that... just trying to tease out what options there are.
Another interpretation... (maybe this is what you meant?)
I think it would be good to try adding an after_accept plugin to the winnows to map the relPath to retrievePath, and have the relPath refer to the renamed version (as described in the previous post)... and see how it goes. It's a good approach because it will only affect the use case being raised, and does not change things for anyone who depends on the current behaviour.
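The core of such a plugin could look roughly like the sketch below. It treats a message as a plain dict and assumes the rename header holds the full relative path of the final name; `rename_to_paths` is a hypothetical helper, not an existing sarracenia function.

```python
import copy

def rename_to_paths(msg):
    """Return a copy of msg with the rename header expressed as relPath/retrievePath.

    Assumption: msg["rename"], when present, is the full relative path the file
    should have downstream. Messages without a rename header are returned as copies.
    """
    new = copy.deepcopy(msg)
    if "rename" not in new:
        return new
    new["retrievePath"] = new["relPath"]             # keep the upstream location for fetching
    new["relPath"] = new.pop("rename").lstrip("/")   # filter and store under the final name
    return new

# Hypothetical message as it might leave a winnow.
before = {
    "baseUrl": "https://upstream.example.com",
    "relPath": "incoming/raw_0001.tmp",
    "rename": "products/obs_0001.grib2",
}
after = rename_to_paths(before)
```

In a real plugin this would presumably be called for each message in worklist.incoming from the after_accept(self, worklist) method of a FlowCB subclass; that wiring is left out so the sketch runs without sarracenia installed.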
As explained above, I don't think filtering on rename is a good approach. I added a wontfix tag to express that. Still, there is an approach outlined above that will solve the problem without doing what the issue title asks.
Use case
rename field in the message. The logs from the winnow show that the rename field is set when the winnow posts.
elif after the sundew extension (see snippet below), where we could change the urlToMatch based on the renamed file.
https://github.com/MetPX/sarracenia/blob/73fdf9f8bac1844534312c5c625312178e1ea273/sarracenia/flow/__init__.py#L1032-L1035
rename: new-filename inside of the winnow log did confuse me. I spent a bunch of time futzing around with the accept statement of the sarra component before realizing that the filename doesn't change when arriving to the sarra, so another improvement could be improving the logging to avoid this confusion.