dereneaton / ipyrad

Interactive assembly and analysis of RAD-seq data sets
http://ipyrad.readthedocs.io
GNU General Public License v3.0
70 stars 39 forks source link

Branching and removing samples from an assembly with pops file causes a race condition #430

Closed isaacovercast closed 3 years ago

isaacovercast commented 3 years ago

This is an annoying bug and i can't believe it hasn't come up yet. Steps to reproduce:

Took me a while to figure out wtf is going on here. Essentially upon copying the branch the new assembly receives the pop_assign_file parameter from the old one. Now, when you try to do anything -r or -s 7 you get the error message because the new assembly .json file is loaded, and during the load procedure all the "current" assembly parameters are read in and set on the new assembly, before the new assembly parameters are evaluated from the params file. So here the old pops file is set, during which _link_populations() is called which fails because it is reading in the list of samples from the old assembly, some of which are not in the new assembly.

How to fix? Difficult to specifically catch and handle this error inside load_json. Could add a flag for the pop_assign_file setter to tell whether or not to _link_populations, but that would mess up the Params() setter architecture. I'm working on this rn because it's bugging me, but if you have ideas let me know

isaacovercast commented 3 years ago

Here's a hackish idea. Inside load_json if we assume that the .json file written out by ipyrad previously was well formatted and all the parameters checked out from the previous run, then we can wrap the calls to set param in a try/catch and silently ignore:

        # set param in new null assembly with value from old assembly.
        null.set_params(param, value)

I do NOT like this idea, but it's the best I've got so far.

isaacovercast commented 3 years ago

Here's my proposed fix. IFF you are using the cli load_json() will get called and then all the params will be read in from the params-file to update the json. In this case inside load_json we can bypass the set_params call for pop_assign_file, which will skip the troublesome call to _link_populations(), which is guaranteed to be called after load_json returns.

-        null.set_params(param, value)           
+        if cli and param == "pop_assign_file":
+            null.params._pop_assign_file = value
+        else:
+            null.set_params(param, value)

I'm happy with this.