ivandiaz-tomtom opened this issue 2 years ago
For reference, it works fine without using the personalized model.
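```python
import mlflow
import mlflow.spark
from geoscan import Geoscan

# df is a Spark DataFrame with 'lat' and 'lon' columns, created earlier
with mlflow.start_run(run_name='GEOSCAN') as run:
    # Configure the (non-personalized) Geoscan estimator
    geoscan = Geoscan() \
        .setLatitudeCol('lat') \
        .setLongitudeCol('lon') \
        .setPredictionCol('cluster') \
        .setEpsilon(20) \
        .setMinPts(3)

    # Record the clustering parameters with the run
    mlflow.log_param('epsilon', 20)
    mlflow.log_param('minPts', 3)

    # Fit the model and log it to MLflow as a Spark ML model
    model = geoscan.fit(df)
    mlflow.spark.log_model(model, "geoscan")
    run_id = run.info.run_id
```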
I also have this problem.
The personalized model runs the clustering for each group in memory.
I suspect some of your groups (e.g. a single user) may have too much data to fit in memory. You could run the same grouping and compute simple statistics to see whether specific groups are over-represented, and possibly handle those groups in a separate process.
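The check suggested above can be sketched as follows, assuming the per-group row counts have already been collected from Spark (for example with `df.groupBy('user').count()`, where `user` is a hypothetical group column). The helper `split_groups` and its "more than `factor` times the median" rule are illustrative assumptions, not part of geoscan; they simply separate unusually large groups so they can be inspected or processed on their own.

```python
from statistics import median


def split_groups(counts, factor=10.0):
    """Split group keys into regular and oversized groups.

    counts: mapping of group key -> number of rows. In PySpark this could be
        built with (hypothetical 'user' column):
        {r['user']: r['count'] for r in df.groupBy('user').count().collect()}
    factor: a group is flagged as oversized when its row count exceeds
        factor * median row count. This threshold is an assumed heuristic,
        not a geoscan setting.
    """
    if not counts:
        return [], []
    threshold = factor * median(counts.values())
    regular = sorted(k for k, n in counts.items() if n <= threshold)
    oversized = sorted(k for k, n in counts.items() if n > threshold)
    return regular, oversized


if __name__ == "__main__":
    # Illustrative counts only: one group dwarfs the others
    regular, oversized = split_groups({'u1': 120, 'u2': 95, 'u3': 110, 'u4': 250000})
    print("regular:", regular)
    print("oversized:", oversized)
    # The regular groups could then be selected in Spark with, e.g.,
    # df.filter(F.col('user').isin(regular)) and clustered as before.
```

Using the median rather than the mean keeps a single huge group from inflating the threshold and hiding itself.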
Hi,
I was wondering if you could help with a problem I am getting when running geoscan on a DBR 10.4 LTS cluster. After creating the dataframe with latitude and longitude columns and trying to run a personalized geoscan, the job gets stuck in a pending stage (in my case 62 tasks; see the description below). Are there any dependencies that could cause this? Unfortunately, there is no logging in the cluster that can help me track down the root cause. Thanks, Ivan
Code for creating the model
Task pending execution