dmmiller612 / sparktorch

Train and run PyTorch models on Apache Spark.
MIT License

spark.rdd API #34

Open Safary1094 opened 1 year ago

Safary1094 commented 1 year ago

Hi! Thank you for this nice package! I have a question: why does the library use a custom mapPartitionsWithIndex at https://github.com/dmmiller612/sparktorch/blob/7e30743ed76abf10f7c3d3db29a1ca83441fa400/sparktorch/distributed.py#L53 instead of Spark's built-in one, https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.RDD.mapPartitionsWithIndex.html ?

dmmiller612 commented 1 year ago

It's been a long time, but I don't think that function was available when I first wrote this PyTorch library a few years ago. I think it had only recently been implemented and wasn't in PySpark 2.0 at the time, so I added my own version for backwards compatibility.
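
For readers comparing the two, here is a minimal sketch of how the built-in `RDD.mapPartitionsWithIndex` from the linked Spark docs is typically called. It is not taken from sparktorch: the names `tag_with_partition` and `run_example`, the local master setting, and the sample data are all assumptions made for illustration.

```python
def tag_with_partition(index, iterator):
    """Yield (partition_index, element) for each element of one partition.

    Hypothetical helper: it has the (index, iterator) signature that
    RDD.mapPartitionsWithIndex expects from its function argument.
    """
    for item in iterator:
        yield (index, item)


def run_example():
    """Run the helper through the built-in API on a small local RDD."""
    # Imported lazily so the helper above stays usable without PySpark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[2]")        # assumption: a local two-core session
        .appName("mpwi-sketch")    # hypothetical application name
        .getOrCreate()
    )
    try:
        rdd = spark.sparkContext.parallelize(range(6), numSlices=2)
        # Built-in API: the function receives the partition index and an
        # iterator over that partition's elements, and returns an iterable.
        return rdd.mapPartitionsWithIndex(tag_with_partition).collect()
    finally:
        spark.stop()
```

With PySpark installed, calling `run_example()` should return each element paired with the index of the partition that held it. Whether this can replace the custom function in `distributed.py` depends on what that function does beyond the plain mapping, which this sketch does not cover.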