MemoryError workaround - Githubissues

cdqa-suite / cdQA

⛔ [NOT MAINTAINED] An End-To-End Closed Domain Question Answering System.

Apache License 2.0


Please consider changing the _expand_paragraphs function in cdqa_sklearn.py to accommodate larger datasets. Modifying the DataFrame directly needs a lot of memory on bigger data, so it would be better to collect the rows as a list of dicts first and only convert that list to a DataFrame at the end.

Below is the modification I made to avoid the MemoryError:

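@staticmethod
def _expand_paragraphs(df):
    # Assumes pandas is already imported as pd in cdqa_sklearn.py.
    # Collect one dict per paragraph, then build the DataFrame once at the end.
    data = []
    for n in range(len(df)):
        title = df.iloc[n, 0]       # first column: document title
        paragraphs = df.iloc[n, 1]  # second column: list of paragraph strings
        for paragraph in paragraphs:
            data.append({'title': title, 'content': paragraph})
    return pd.DataFrame(data)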
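For anyone who wants to try the idea outside the library, here is a minimal standalone sketch of the same list-of-dicts approach. The function name expand_paragraphs, the toy column names, and the sample values are made up for illustration and are not taken from cdQA. It assumes a two-column frame whose second column holds lists of strings.

```python
import pandas as pd


def expand_paragraphs(df):
    """Return one row per paragraph, built from a plain list of dicts."""
    rows = []
    # itertuples yields plain tuples here, so no Series is created per row.
    for title, paragraphs in df.itertuples(index=False, name=None):
        for paragraph in paragraphs:
            rows.append({"title": title, "content": paragraph})
    # Build the DataFrame once, after all rows have been collected.
    return pd.DataFrame(rows, columns=["title", "content"])


# Toy input: hypothetical titles and paragraphs, for illustration only.
toy = pd.DataFrame(
    {
        "title": ["doc A", "doc B"],
        "paragraphs": [["a1", "a2"], ["b1"]],
    }
)
expanded = expand_paragraphs(toy)
print(expanded)
```

Whether this is enough to avoid a MemoryError depends on the dataset and the machine. The point is only that the expanded frame is created once, instead of being built through intermediate DataFrame operations.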


MemoryError workaround #357