MasaakiU / MultiplexNanopore

Problem encountered in Step 3.5_Set threshold for assignment #2

Closed ShiroYark closed 1 year ago

ShiroYark commented 1 year ago

Hi Masaaki,

I am trying to use SAVEMONEY to separate four samples from the sequencing reads. Step 3.4 (execute the alignment) ran fine, but the next step raised an error and I could not proceed any further. The error details are below:

normalizing scores...
normalization: DONE
drawing figures...

ValueError                                Traceback (most recent call last)
in <cell line: 1170>()
   1168     return alignment_result
   1169
-> 1170 alignment_result = set_threshold_for_assignment(result_dict, my_aligner, param_dict)

12 frames

in set_threshold_for_assignment(result_dict, my_aligner, param_dict)
   1164     print("drawing figures...")
   1165     draw_distributions(score_summary_df, my_aligner.combined_fastq)
-> 1166     draw_alignment_score_scatter(score_summary_df, alignment_result.score_threshold)
   1167
   1168     return alignment_result

in draw_alignment_score_scatter(score_summary_df, score_threshold)
   1074     hist_params = dict(
   1075         x=[
-> 1076             score_summary_df.query("(assigned_refseq_idx == @refseq_idx1)&(assigned == 1)")[refseq_name1],
   1077             score_summary_df.query("(assigned_refseq_idx != @refseq_idx1)&(assigned == 1)")[refseq_name1],
   1078             score_summary_df.query("(assigned == 0)")[refseq_name1]

/usr/local/lib/python3.10/dist-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    329                     stacklevel=find_stack_level(),
    330                 )
--> 331             return func(*args, **kwargs)
    332
    333         # error: "Callable[[VarArg(Any), KwArg(Any)], Any]" has no

/usr/local/lib/python3.10/dist-packages/pandas/core/frame.py in query(self, expr, inplace, **kwargs)
   4472         kwargs["level"] = kwargs.pop("level", 0) + 2
   4473         kwargs["target"] = None
-> 4474         res = self.eval(expr, **kwargs)
   4475
   4476         try:

/usr/local/lib/python3.10/dist-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    329                     stacklevel=find_stack_level(),
    330                 )
--> 331             return func(*args, **kwargs)
    332
    333         # error: "Callable[[VarArg(Any), KwArg(Any)], Any]" has no

/usr/local/lib/python3.10/dist-packages/pandas/core/frame.py in eval(self, expr, inplace, **kwargs)
   4610         kwargs["resolvers"] = tuple(kwargs.get("resolvers", ())) + resolvers
   4611
-> 4612         return _eval(expr, inplace=inplace, **kwargs)
   4613
   4614     def select_dtypes(self, include=None, exclude=None) -> DataFrame:

/usr/local/lib/python3.10/dist-packages/pandas/core/computation/eval.py in eval(expr, parser, engine, truediv, local_dict, global_dict, resolvers, level, target, inplace)
    356         eng = ENGINES[engine]
    357         eng_inst = eng(parsed_expr)
--> 358         ret = eng_inst.evaluate()
    359
    360         if parsed_expr.assigner is None:

/usr/local/lib/python3.10/dist-packages/pandas/core/computation/engines.py in evaluate(self)
     79
     80         # make sure no names in resolvers and locals/globals clash
---> 81         res = self._evaluate()
     82         return reconstruct_object(
     83             self.result_type, res, self.aligned_axes, self.expr.terms.return_type

/usr/local/lib/python3.10/dist-packages/pandas/core/computation/engines.py in _evaluate(self)
    120         scope = env.full_scope
    121         _check_ne_builtin_clash(self.expr)
--> 122         return ne.evaluate(s, local_dict=scope)
    123
    124

/usr/local/lib/python3.10/dist-packages/numexpr/necompiler.py in evaluate(ex, local_dict, global_dict, out, order, casting, _frame_depth, **kwargs)
    941             return re_evaluate(local_dict=local_dict, _frame_depth=_frame_depth)
    942     else:
--> 943         raise e
    944
    945 def re_evaluate(local_dict: Optional[Dict] = None,

/usr/local/lib/python3.10/dist-packages/numexpr/necompiler.py in validate(ex, local_dict, global_dict, out, order, casting, _frame_depth, **kwargs)
    849         expr_key = (ex, tuple(sorted(context.items())))
    850         if expr_key not in _names_cache:
--> 851             _names_cache[expr_key] = getExprNames(ex, context)
    852         names, ex_uses_vml = _names_cache[expr_key]
    853         arguments = getArguments(names, local_dict, global_dict, _frame_depth=_frame_depth)

/usr/local/lib/python3.10/dist-packages/numexpr/necompiler.py in getExprNames(text, context)
    712
    713 def getExprNames(text, context):
--> 714     ex = stringToExpression(text, {}, context)
    715     ast = expressionToAST(ex)
    716     input_order = getInputOrder(ast, None)

/usr/local/lib/python3.10/dist-packages/numexpr/necompiler.py in stringToExpression(s, types, context)
    272     no_whitespace = re.sub(r'\s+', '', s)
    273     if _forbidden_re.search(no_whitespace) is not None:
--> 274         raise ValueError(f'Expression {s} has forbidden control characters.')
    275
    276     old_ctx = expressions._context.get_current_context()

ValueError: Expression ((assigned_refseq_idx) == (__pd_eval_local_refseq_idx1)) & ((assigned) == (1)) has forbidden control characters.

By the way, I ran this program on Google Colab. I have also attached the plasmid maps (four *.dna files) and the raw sequencing data here: data.zip

I appreciate your efforts in building this cool program and hope you can help me get past this problem.

MasaakiU commented 1 year ago

Hello @ShiroYark,

Thank you for the bug report. The bug has now been fixed, so please try again. I cannot attach the results because of the file size, but if you upload the following intermediate results file to Google Colab together with your plasmid maps and raw sequencing data, the run will finish much faster: Ding_p7q_4_pBAD_Aga1_Ago_Cas5fused.intermediate_results.txt

The error appears to have been caused by a new release of numexpr (#54449), one of the modules the script depends on, and is unrelated to the SAVEMONEY algorithm itself.

Let me know if you have any other questions or run into further trouble.
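
For readers who hit the same ValueError in their own notebooks, here is a minimal sketch of two ways to keep the failing expression away from numexpr. It is not necessarily the fix that was applied in the repository. The traceback shows `DataFrame.query` rewriting `@refseq_idx1` to `__pd_eval_local_refseq_idx1` and handing the string to numexpr, which rejects it. The small data frame below and the column name `refseq_A` are hypothetical stand-ins for the real `score_summary_df`.

```python
import pandas as pd

# Hypothetical stand-in for score_summary_df: the real frame is produced by
# SAVEMONEY and has one score column per reference sequence.
score_summary_df = pd.DataFrame(
    {
        "assigned_refseq_idx": [0, 0, 1, 1, 0],
        "assigned": [1, 1, 1, 0, 0],
        "refseq_A": [0.91, 0.88, 0.35, 0.40, 0.52],  # assumed column name
    }
)
refseq_idx1 = 0
refseq_name1 = "refseq_A"

# Option 1: keep the query string, but ask pandas to evaluate it with the
# pure-Python engine so the expression is never passed to numexpr.
via_query = score_summary_df.query(
    "(assigned_refseq_idx == @refseq_idx1) & (assigned == 1)",
    engine="python",
)[refseq_name1]

# Option 2: build the same filter as an ordinary boolean mask, which does not
# go through the expression evaluator at all.
mask = (score_summary_df["assigned_refseq_idx"] == refseq_idx1) & (
    score_summary_df["assigned"] == 1
)
via_mask = score_summary_df.loc[mask, refseq_name1]
```

Both options select the same rows. The boolean mask is more verbose, but it does not depend on which expression engine is installed in the Colab runtime.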