Dear Google,
I have been a GCP user for the past six months and I would like to take this opportunity to report my agony. PLEASE DO NOT FOOL DEVELOPERS WITH FALSE EXAMPLES!
Google doesn't provide a supported Spark driver for either Pub/Sub or Datastore. It's a shame. Even worse are the following lines of code:
def saveRDDtoDataStore(tags: Array[Popularity], windowLength: Int): Unit
Please read the function name, "saveRDD": it accepts an Array, not an RDD. This is called cheating.
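For the record, here is what an honest signature would look like, a rough sketch only: Popularity's fields and the writeOne callback are stand-ins for whatever per-record Datastore write you actually use.

```scala
import org.apache.spark.rdd.RDD

case class Popularity(tag: String, amount: Int)

object SaveSketch {
  // Hypothetical signature: take the RDD itself and write it out
  // partition by partition, so nothing is collected to the driver first.
  def saveRDDtoDataStore(tags: RDD[Popularity], writeOne: Popularity => Unit): Unit =
    tags.foreachPartition(_.foreach(writeOne))
}
```

A function with "RDD" in its name should distribute the write like this, not receive a pre-collected array.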
Even worse:
sortedHashtags.foreachRDD(rdd => {
handler(rdd.take(n)) //take top N hashtags and save to external source
})
Do you know the consequences of using take? Are you a Spark developer?
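In case it isn't obvious what take does, a minimal sketch (the data and names are made up): take(n) runs a job and ships the first n elements back to the driver as a plain local Array, so every downstream write then happens from that one JVM.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object TakeSketch {
  // rdd.take(n) materializes n elements on the driver. Tolerable for a
  // handful of top-N rows; a bottleneck (or an OOM) for anything larger.
  def topN(sc: SparkContext, counts: Seq[(String, Long)], n: Int): Array[(String, Long)] =
    sc.parallelize(counts).sortBy(-_._2).take(n)
}
```

The returned Array lives entirely on the driver, which is exactly why pairing it with a "save" path hides the fact that the save is not distributed at all.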
I had to go to great lengths to ensure I don't ack (Pub/Sub) before I process my records. I had to resort to a sub-optimal plan B (broadcast variables) when the Datastore driver didn't support a stream join.
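For anyone curious, my plan B looks roughly like this (a sketch with invented names, not my production code): ship the small side of the join to every executor as a broadcast variable and do a map-side lookup, instead of the stream join the driver should have supported.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastPlanB {
  // Plan B when a true stream join isn't available: broadcast the small
  // lookup side to every executor and enrich each record map-side.
  def enrich(sc: SparkContext, events: Seq[(String, Int)],
             lookup: Map[String, String]): Array[(String, String, Int)] = {
    val side = sc.broadcast(lookup)
    sc.parallelize(events)
      .map { case (k, v) => (k, side.value.getOrElse(k, "unknown"), v) }
      .collect()
  }
}
```

This only works while the lookup side fits in executor memory, which is exactly why I call it sub-optimal.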
It's a fact that you want to capture your big clients by forcing them to use proprietary software like gRPC and Cloud Dataflow, by not providing proper drivers for Spark. Why beat around the bush?
What a shame! Remember "DON'T BE EVIL"? This is evil.