rcpeene closed this 3 months ago
@rcpeene sorry I missed this.
I just removed the LazyFrame code; it's not necessary. That said, a lazy scan of a CSV from S3 appears to be working now, which is nice.
For completeness, it failed because you can't get a column from a LazyFrame without first collecting the results. Using
df.collect()['last_movement_dt']
would have fixed it.
In OpenScope's metadata/upload code, loading newscale coordinates fails when lazy loading is used, perhaps because of a version mismatch; I'm not quite sure. For my purposes, simply removing the lazy loading sufficed as a temporary solution.
It occurs around here (line 126 in newscale.py):
I'm not sure what a robust solution is, or which operations are and are not allowed on a lazy dataframe under these conditions. Perhaps wrapping the subsequent access of
df['last_movement_dt']
in the try...except?
The traceback: