Wittline / csv-schema-inference

A tool to automatically infer columns data types in .csv files
https://wittline.github.io/csv-schema-inference/
MIT License

Allowing different multiprocessing engines #37

Open orellabac opened 1 year ago

orellabac commented 1 year ago

**Is your feature request related to a problem? Please describe.** I have environments where I cannot use this library because I cannot leverage multiprocessing, only threading.

**Describe the solution you'd like** Using a library that allows multiple backends would be great.

**Describe alternatives you've considered** Joblib

sfc-gh-mrojas commented 1 year ago

Hi @Wittline, do you think it would be possible to consider this change?

Wittline commented 1 year ago

Hi @sfc-gh-mrojas, could you please provide more technical details? I did not see your request earlier; I am not receiving notifications about new issues.

sfc-gh-mrojas commented 1 year ago

Sure. Currently the code depends on the `multiprocessing` library. The problem is that in some environments I cannot spawn new processes. I think there is a PR that uses joblib; that way the backend is configurable, which allows several scenarios (for example, threads where processes are not available). We would like to allow that. What are your thoughts?
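
To make the idea concrete, here is a minimal sketch of a configurable backend using joblib's `Parallel` and `delayed`. It is not the code from the PR or from this library: `infer_column_type` and `infer_schema` are hypothetical names invented for illustration, and the type rules are deliberately simplistic. The only point is that the `backend` argument can be switched between `"threading"`, `"loky"`, and `"multiprocessing"` without changing the rest of the code.

```python
from joblib import Parallel, delayed


def infer_column_type(values):
    """Hypothetical stand-in for the library's per-column inference."""
    def fits(cast):
        # True if every value in the column can be converted with `cast`.
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if fits(int):
        return "INTEGER"
    if fits(float):
        return "FLOAT"
    return "STRING"


def infer_schema(columns, backend="threading", n_jobs=2):
    """Infer one type per column, running columns on the chosen joblib backend."""
    names = list(columns)
    types = Parallel(n_jobs=n_jobs, backend=backend)(
        delayed(infer_column_type)(columns[name]) for name in names
    )
    # Parallel returns results in the same order as the submitted tasks.
    return dict(zip(names, types))


columns = {
    "id": ["1", "2", "3"],
    "price": ["1.5", "2", "3.25"],
    "name": ["a", "b", "c"],
}

# "threading" never spawns a process, so it works in restricted environments.
schema = infer_schema(columns, backend="threading")
print(schema)
```

With this shape, an environment that can spawn processes could pass `backend="loky"` for CPU-bound work, while a restricted one keeps `backend="threading"`.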

Wittline commented 1 year ago

Hi @sfc-gh-mrojas @orellabac, have you had a chance to test the performance of the code? If so, could you please share some details about the performance with different sizes of CSV files?

I would be interested in knowing the processing time and memory consumption for files of varying sizes. It would also be helpful to understand if there were any particular bottlenecks or challenges that you encountered during your testing.

Any additional insights you can provide on the performance of the code would be greatly appreciated. Thank you!
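
For anyone gathering these numbers, the following is one possible way to collect processing time and peak memory for CSV inputs of different sizes, using only the standard library. It is a sketch under assumptions: `infer_types` is a hypothetical placeholder for whichever inference call is being measured, `make_csv` generates synthetic data, and the row counts are arbitrary.

```python
import csv
import io
import time
import tracemalloc


def make_csv(n_rows):
    """Build a synthetic three-column CSV in memory."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "price", "name"])
    for i in range(n_rows):
        writer.writerow([i, i * 0.5, f"item{i}"])
    return buf.getvalue()


def classify(value):
    """Return the narrowest of INTEGER, FLOAT, STRING that fits one value."""
    for cast, label in ((int, "INTEGER"), (float, "FLOAT")):
        try:
            cast(value)
            return label
        except ValueError:
            continue
    return "STRING"


def infer_types(text):
    """Hypothetical placeholder for the inference call being measured."""
    rank = {"INTEGER": 0, "FLOAT": 1, "STRING": 2}
    rows = csv.reader(io.StringIO(text))
    header = next(rows)
    types = {name: "INTEGER" for name in header}
    for row in rows:
        for name, value in zip(header, row):
            guess = classify(value)
            if rank[guess] > rank[types[name]]:
                types[name] = guess  # widen the column type when needed
    return types


def measure(func, *args):
    """Return (result, elapsed seconds, peak traced bytes) for one call."""
    tracemalloc.start()
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


report = []
for n_rows in (1_000, 10_000):
    text = make_csv(n_rows)
    _, elapsed, peak = measure(infer_types, text)
    report.append((n_rows, elapsed, peak))
    print(f"{n_rows} rows: {elapsed:.4f} s, peak {peak / 1024:.1f} KiB")
```

Note that `tracemalloc` only tracks Python allocations in the current process and slows execution, so it would not capture memory used by worker processes; timings are best taken in a separate run without tracing.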