zelima closed this issue 5 years ago
Are you using the speedup version (the one using leveldb)?
On Tue, Jan 29, 2019, 07:35 Irakli Mchedlishvili notifications@github.com wrote:
I'm trying to solve this exercise https://github.com/ViderumGlobal/programming-exercise but join takes so long to process that I thought it had hung and could not finish the task. I don't see any while loops in join.py, so I doubt I'm stuck in an infinite loop, which makes me think it's just slow.
I simplified the code:

    from dataflows import Flow, load, join, printer, filter_rows

    def filter_over_10(rows):
        # Drop rows whose 'order' is greater than 10; rows without 'order' pass through
        for row in rows:
            if row.get('order') is not None and row.get('order') > 10:
                continue
            yield row

    res = Flow(
        load('data/movies/datapackage.json'),
        load('data/credits/datapackage.json'),
        filter_over_10,
        filter_rows(not_equals=[{'revenue': 0}], resources=['tmdb_5000_movies']),
        filter_rows(not_equals=[{'gender': 0}], resources=['tmdb_5000_credits']),
        join('tmdb_5000_movies', ['id'], 'tmdb_5000_credits', ['id'],
             fields={'revenue': {}}, full=False),
        printer(),
    ).results()

- movies is ~4000 rows
- credits is ~40000 rows after the filter
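
For readers following the thread, the join step above can be pictured as a keyed lookup: index the source rows by the join key once, then stream the target rows and copy the requested fields across. The sketch below is plain Python with made-up sample rows and a hypothetical helper, `keyed_join`. It is not the dataflows implementation, only an illustration of the semantics assumed here for `full=False`, where target rows without a matching source row are dropped.

```python
def keyed_join(source_rows, target_rows, key, fields):
    """Hypothetical helper: copy `fields` from source rows onto target rows sharing `key`."""
    # Build the index once: a single pass over the source table.
    index = {}
    for row in source_rows:
        index.setdefault(row[key], row)  # keep the first source row seen per key

    # Stream the target table: one dictionary lookup per row.
    for row in target_rows:
        match = index.get(row[key])
        if match is None:
            continue  # assumed full=False behaviour: drop unmatched target rows
        yield {**row, **{name: match[name] for name in fields}}


# Made-up sample data standing in for the movies and credits resources.
movies = [{'id': 1, 'revenue': 100}, {'id': 2, 'revenue': 250}]
credits = [
    {'id': 1, 'name': 'A', 'gender': 1},
    {'id': 2, 'name': 'B', 'gender': 2},
    {'id': 3, 'name': 'C', 'gender': 1},  # no matching movie, so it is dropped
]

joined = list(keyed_join(movies, credits, 'id', ['revenue']))
```

Done this way, the work is one pass over each table, so the running time depends mostly on how fast each lookup in the index is.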
speedup version?
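
A note for readers: the "speedup version" mentioned above is taken here to mean dataflows running with its optional LevelDB backend. That this backend comes from the `plyvel` package is an assumption, not something confirmed in this thread. Under that assumption, a quick way to see whether the package is importable in the current environment (the helper name `has_leveldb_backend` is made up for this sketch):

```python
import importlib.util


def has_leveldb_backend(module_name='plyvel'):
    """Return True if the (assumed) LevelDB binding can be imported."""
    # find_spec returns None when a top-level module is not installed.
    return importlib.util.find_spec(module_name) is not None


print('plyvel importable:', has_leveldb_backend())
```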
That's a lot faster
:D