Closed LuisSanchez25 closed 1 year ago
Hey Yossi, I made the code with the preprocessor decorator (altough I dont think this was actually necessary since there was already inbuilt code to the URLConfig class to handle this). I tried following he code on how context configs are set and at no point that I could find was the class URLConfig ever called. I dealt with the issue in the apply_xedocs_config by including a function that reorders the URLs before calling the set_config function, at which point I believe the hash is made.
@LuisSanchez25 short question how does then the URL config in the lineage look like?
st.key_for(run_id, data_type).lineage
Will it be alphabetically sported or as provided as config to the plugin?
@WenzDaniel The preprocessors are run by URLConfig at the plugin initialization stage before setting plugin.config
, so the URL in the lineage will always be sorted.
I am afraid I have not followed the issue enough, but I am worried a bit that this could lead to confusion. What would happen if I pass the sorted config string again as a config in straxen? Would this work? Will we confuse analysts with having a different config in the lineage with respect to the plugin code?
Right, but on the other hand this is nothing new right? The context already performs modifications on the config of the plugin prior to setting it, such as using the default value if it's not set or taking the value from a dict based on the run I'd if the defaults by run I'd is enabled. This just adds to that logic... The change to the url is rather minor since it just changes the order of the parameters at the end of the url. That being said I was not a strong advocate of making all equivalent urls resolve to the same hash, I just mentioned that is easy to implement using the preprocessor framework. But it did seem to be the preferred option by most in the team a meeting we had about it
Right, but on the other hand this is nothing new right? The context already performs modifications on the config of the plugin prior to setting it, such as using the default value if it's not set or taking the value from a dict based on the run I'd if the defaults by run I'd is enabled. This just adds to that logic...
I see, my impression is just that this is a bit more transparent, but also might be just personal bias.
That being said I was not a strong advocate of making all equivalent urls resolve to the same hash, I just mentioned that is easy to implement using the preprocessor framework. But it did seem to be the preferred option by most in the team a meeting we had about it
I also think that it makes a lot of sense and that the hash should depend on the data and not the order of the config. I am just wondering if this needs more communication with the users. E.g. force all default settings to be sorted, and warn the user in case a non-sorted setting is used, that it is going to be sorted in the hash. But might also be over engineering.
If people are happy as it is I am also fine, I have no strong opinions either way.
A warning would be nice but it would probably have to be a single global warning otherwise we will have tens of warnings every time someone processes any data. This makes the implementation of the warning a bit more complex.
I think we can put a warning for maybe this version informing people about this change and to use older version of straxen if the want to access data they already processed with their URLs. We can then remove the warning in a version or two from now when people become familiar with this. Can it be a warning that just shows up when you first import straxen?
@jmosbacher @WenzDaniel I added a warning to the init file which should only be displayed when you first import the straxen file. Let me know if I should change the message in any way or if there are any concerns with the way I implemented it.
Thanks @LuisSanchez25. Two little things I doubt that the next release will be 2.0.8, but more like 2.1.0. I think there were quite some changes to reason a major bump, but maybe @dachengx can comment. Maybe add a final sentence like, that nothing changes in the usage of URL configs or how one overwrites a default setting.
Thanks @LuisSanchez25. Two little things I doubt that the next release will be 2.0.8, but more like 2.1.0. I think there were quite some changes to reason a major bump, but maybe @dachengx can comment. Maybe add a final sentence like, that nothing changes in the usage of URL configs or how one overwrites a default setting.
Thanks, @LuisSanchez25 . I agree that the next release should be at least 2.1.0
Can you only raise a warning if a URL was affected by the preprocessor? So can we move the warning logic to the preprocessor function and raise the error if both the URL was changed and warning has not yet been raised
From my understanding the warning function will only appear once when it is called, I have checked this by trying to invoke it multiple times so this setup should work. I believe this is ready to merge.
Can we change this to:
From straxen version 2.1.0 onward, URLConfig parameters will be sorted alphabetically before being passed to the plugins, this will change the lineage hash for non-sorted URLs. To load data processed with non-sorted URLs, you will need to use an older version.
Ah yes you are correct, I will change it to using the testing context
Hey, @jmosbacher. Is this good enough? It looks good to me, especially the URLConfig.preprocessor
and warning.
@dachengx there are still two open changes i requested:
@LuisSanchez25 For the 2nd request, I guess you can try https://github.com/XENONnT/straxen/blob/8da4b6fe28c3bcbbee6a68b11fd411cb98e95ace/tests/plugins/_core.py#L68.
@LuisSanchez25 please try to run the tests locally before pushing to github. We only have 3000 cpu-minutes every month of github actions until it starts costing money and these tests can take a lot of resources
@LuisSanchez25 @jmosbacher Thanks!
Before you submit this PR: make sure to put all operations-related information in a wiki-note, a PR should be about code and is publicly accessible
What does the code in this PR do / what does it improve?
It aims to ensure URLConfigs with the same strings but in different order gives you the same hash by alphabetizing the arguments as a preprocess for the URLConfig class.
Can you briefly describe how it works?
Before the URL is hashed, a preprocessing function for the URLConfig class is run to alphabetize the order of the queries in the URL
Can you give a minimal working example (or illustrate with a figure)?
This is also included on the tests: st1 = xenonnt_online() st2 = xenonnt_online()
Same URLs but the queries are in a different order
st1.apply_xedocs_configs(version = 'v_test1', db='development_db') st2.apply_xedocs_configs(version = 'v_test2', db='development_db') assert st1.key_for(25000, 'corrected_areas').lineage_hash == st2.key_for(25000, 'corrected_areas').lineage_hash
Please include the following if applicable:
Notes on testing
All italic comments can be removed from this template.