ResidentMario / checkpoints

Partial result caching for pandas in Python.

Pandas Exclusive? #1

Open hydrosquall opened 7 years ago

hydrosquall commented 7 years ago

Thanks for writing this library! When writing scrapers, a URL in the middle sometimes fails because the website needs a captcha or something similar answered before I can continue scraping. The state machine works beautifully, and it saved me the trouble of cluttering up my code with defensive measures, as I have had to do in the past.

I was wondering about the decision to write the safe methods specifically for Series and DataFrame objects. It's relatively straightforward to wrap some other iterable (e.g. list, set, etc.) in a pd.Series().

Would it be possible to extend the standard Python map function to have this safe feature, in case pandas isn't available on the computer the script is being run on?

ResidentMario commented 7 years ago

Yes, it should be possible; after all a pandas Series is just a fancy iterable.

Would I implement that? Well...I use this little shim every so often in my own work; you're the first other person I'm aware of using it. For my own purposes I've never needed to checkpoint a pure iterable, so I never implemented it.

A possibly better direction would be to integrate this tool into a more popular one (see engarde#33). But we'll see. I'll keep this open in the meantime.

ResidentMario commented 7 years ago

Also, this is a small design consideration, but you should almost never overwrite a built-in Python name like map, and certainly not here. This should probably be an importable safe_map function or something like that instead.