Wikidata / soweego

Link Wikidata items to large catalogs
https://meta.wikimedia.org/wiki/Grants:Project/Hjfocs/soweego_2
GNU General Public License v3.0

Prevent out of memory error #317

Closed tupini07 closed 5 years ago

tupini07 commented 5 years ago

This PR removes the arrays of pandas DataFrames that were built up after processing each chunk. They are now replaced with a single DataFrame to which each chunk's results are added. The change applies to both train and classify.

The reason is that keeping many small DataFrame instances alive at once carries a lot of per-object memory overhead. Using a single DataFrame drastically reduces memory usage.
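
For illustration only, here is a minimal sketch of the accumulation pattern described above. It is not the actual soweego code: `process_chunk` and `accumulate` are hypothetical names, and the doubling step is a stand-in for the real per-chunk work.

```python
import pandas as pd


def process_chunk(chunk: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical stand-in for the real per-chunk processing.
    return chunk.assign(score=chunk["value"] * 2)


def accumulate(chunks) -> pd.DataFrame:
    # Keep one running DataFrame instead of a list of per-chunk DataFrames.
    result = None
    for chunk in chunks:
        processed = process_chunk(chunk)
        if result is None:
            result = processed.reset_index(drop=True)
        else:
            # Merge the new results into the running DataFrame right away,
            # so the per-chunk object can be garbage collected.
            result = pd.concat([result, processed], ignore_index=True)
    return result if result is not None else pd.DataFrame()


# Three toy chunks of three rows each, produced lazily by a generator.
chunks = (
    pd.DataFrame({"value": range(start, start + 3)})
    for start in range(0, 9, 3)
)
combined = accumulate(chunks)
```

Note that each `pd.concat` call copies the accumulated rows, so this trades some CPU time for fewer live objects. Whether that trade pays off depends on chunk size and count.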

closes #274

tupini07 commented 5 years ago

Thanks for this crucial PR! Just a couple of code suggestions.

Thanks @marfox. You're right, better to be safe 😄 Suggestions have been applied!