mferrari0 closed this issue 5 months ago
Hey, I can't reproduce this. Can you try adding `.filter('text is not null')`:
(spark.table("raw_documentation")
    .filter('text is not null')
    .withColumn('content', F.explode(parse_and_split('text')))
    .withColumn('embedding', get_embedding('content'))
    .drop("text")
    .write.mode('overwrite').saveAsTable("databricks_documentation"))
display(spark.table("databricks_documentation"))
Thanks @QuentinAmbard. However, it didn't solve the bug because there was a different cause: since model serving is not available in my region, I had to create my own serving endpoint. The one I created ran on CPU with "Small" selected for the "Compute Scaleout" setting. This turned out not to be enough, causing the timeout error I showed above. I switched from CPU to GPU and it ran without problems.
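For anyone who hits the same timeout but can't switch the endpoint to GPU, a generic mitigation is to send smaller batches and retry on timeouts. The snippet below is only a minimal sketch of that idea, not what the dbdemos notebook does: `call_with_retry` and `embed_in_batches` are hypothetical helper names, `fn` stands in for whatever function calls your own serving endpoint, and I'm assuming that function raises `TimeoutError` when the request times out (your client may raise something else).

```python
import time


def call_with_retry(fn, batch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(batch), retrying on TimeoutError with exponential backoff.

    Assumption: fn raises TimeoutError when the endpoint times out.
    """
    for attempt in range(max_attempts):
        try:
            return fn(batch)
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the original error
            sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ...


def embed_in_batches(texts, fn, batch_size=16, **retry_kwargs):
    """Split texts into small batches so each request stays short."""
    embeddings = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        embeddings.extend(call_with_retry(fn, batch, **retry_kwargs))
    return embeddings
```

This would not help if the endpoint is simply too slow for a single small request, in which case more compute (as above) is probably the only fix.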
Running the following cell in the notebook mentioned above:
results in an error: ![Screenshot 2023-12-08 163213](https://github.com/databricks-demos/dbdemos/assets/33093184/5f74f56d-11df-43ff-8fff-c8d9e2e7f484)
Has anybody experienced this?