cdqa-suite / cdQA

⛔ [NOT MAINTAINED] An End-To-End Closed Domain Question Answering System.
https://cdqa-suite.github.io/cdQA-website/
Apache License 2.0
616 stars, 191 forks

numpy core fromnumeric.py error in QAPipeline.fit_retriever #356

Open riemann85 opened 4 years ago

riemann85 commented 4 years ago

**Describe the bug** Replicating the QAPipeline example from the tutorial raises a ValueError in fit_retriever(), originating in numpy/core/fromnumeric.py.

**To Reproduce** Steps to reproduce the behavior, using tutorial-use-pdf-converter.ipynb:

  1. Open tutorial-use-pdf-converter.ipynb.
  2. Run the following cell:

    cdqa_pipeline = QAPipeline(reader='./models/bert_qa.joblib', max_df=1.0)
    # Fit Retriever to documents
    cdqa_pipeline.fit_retriever(df=df)

**Screenshots**

    ValueError                                Traceback (most recent call last)
     in
          1 cdqa_pipeline = QAPipeline(reader='./models/bert_qa.joblib')
    ----> 2 cdqa_pipeline.fit_retriever(df=df)

    /mnt/batch/tasks/shared/LS_root/mounts/clusters/ds-gen-gpu-v100/code/Users/malosett/pillar_clauses/cdQA/cdqa/pipeline/cdqa_sklearn.py in fit_retriever(self, df)
        109             )
        110         else:
    --> 111             self.metadata = self._expand_paragraphs(df)
        112
        113         self.retriever.fit(self.metadata)

    /mnt/batch/tasks/shared/LS_root/mounts/clusters/ds-gen-gpu-v100/code/Users/malosett/pillar_clauses/cdQA/cdqa/pipeline/cdqa_sklearn.py in _expand_paragraphs(df)
        230             {
        231                 col: np.repeat(df[col].values, df[lst_col].str.len())
    --> 232                 for col in df.columns.drop(lst_col)
        233             }
        234         ).assign(**{lst_col: np.concatenate(df[lst_col].values)})[df.columns]

    /mnt/batch/tasks/shared/LS_root/mounts/clusters/ds-gen-gpu-v100/code/Users/malosett/pillar_clauses/cdQA/cdqa/pipeline/cdqa_sklearn.py in (.0)
        230             {
        231                 col: np.repeat(df[col].values, df[lst_col].str.len())
    --> 232                 for col in df.columns.drop(lst_col)
        233             }
        234         ).assign(**{lst_col: np.concatenate(df[lst_col].values)})[df.columns]

    <__array_function__ internals> in repeat(*args, **kwargs)

    /anaconda/envs/azureml_py36/lib/python3.6/site-packages/numpy/core/fromnumeric.py in repeat(a, repeats, axis)
        479            [3, 4]])
        480
    --> 481     """
        482     return _wrapfunc(a, 'repeat', repeats, axis=axis)
        483

    /anaconda/envs/azureml_py36/lib/python3.6/site-packages/numpy/core/fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
         59
         60     try:
    ---> 61         return bound(*args, **kwds)
         62     except TypeError:
         63         # A TypeError occurs if the object does have such a method in its

    ValueError: repeats may not contain negative values.

**Desktop (please complete the following information):** Notebook examples executed on Azure ML with a V100 GPU.

**Additional context** Which numpy version is required? I have numpy 1.18.2 installed. All other requirements are met as listed in requirements.txt.

riemann85 commented 4 years ago

Hi, I analyzed the issue: the problem lies in the format of the dataframe passed to the fit_retriever() method. QAPipeline.fit_retriever() works fine for a df in a format like the BNP one. May I ask what format the df dataframe should have (a dataframe with title and paragraphs columns)?
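
For anyone hitting the same error, here is a minimal sketch of the input shape that the `_expand_paragraphs` code in the traceback appears to rely on: one row per document, with `paragraphs` holding a list of strings. This is an assumption drawn from the traceback and is not confirmed by the maintainers in this thread. The helper names `drop_invalid_rows` and `expand_paragraphs` are hypothetical and are not part of cdQA. One plausible (unverified) cause of the error is rows whose `paragraphs` cell is missing or not a list, so the sketch filters those out before expanding.

```python
import numpy as np
import pandas as pd


def drop_invalid_rows(df, lst_col="paragraphs"):
    """Keep only rows whose `lst_col` cell is a non-empty list (hypothetical helper)."""
    is_valid = df[lst_col].apply(lambda v: isinstance(v, list) and len(v) > 0)
    return df[is_valid.astype(bool)].reset_index(drop=True)


def expand_paragraphs(df, lst_col="paragraphs"):
    """Standalone re-creation of the expansion step shown in the traceback.

    Each row is repeated once per paragraph, so the result has one paragraph per row.
    Assumes every cell in `lst_col` is a non-empty list.
    """
    # Number of paragraphs per document; for list cells this matches
    # the `df[lst_col].str.len()` call in the traceback.
    lengths = df[lst_col].map(len).to_numpy()
    return pd.DataFrame(
        {
            col: np.repeat(df[col].to_numpy(), lengths)
            for col in df.columns.drop(lst_col)
        }
    ).assign(**{lst_col: np.concatenate(df[lst_col].to_numpy())})[df.columns]


# Illustrative data only: the second row has no paragraphs, e.g. a PDF
# from which no text could be extracted.
df = pd.DataFrame(
    {
        "title": ["Doc A", "Doc B", "Doc C"],
        "paragraphs": [["first paragraph", "second paragraph"], None, ["only paragraph"]],
    }
)

clean_df = drop_invalid_rows(df)
expanded = expand_paragraphs(clean_df)
print(expanded)
```

If the filtered dataframe differs from the original, inspecting the dropped rows should show which documents produced no usable paragraphs.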