Azure / azure-cosmosdb-spark

Apache Spark Connector for Azure Cosmos DB

Fix streaming checkpoint issues #392

Closed. revinjchalil closed this pull request 4 years ago.

revinjchalil commented 4 years ago

Issues:

  1. Streaming currently does not make progress if there is no existing checkpoint file for the given partition(s). This typically happens at the very beginning of the streaming job when "ChangeFeedStartFromTheBeginning" is set to false.

  2. A Read checkpoint path given as an ADLS (abfss://) or Blob (wasb://) path is not recognized, and the checkpoint files are written to the default FS instead. For example, on Databricks the Read checkpoint files are written to "dbfs:///" when an ADLS or Blob path is provided, because that is the "fs.defaultFS" Hadoop config. A configuration that runs into both issues is sketched after this list.
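For reference, this is roughly the kind of change feed streaming setup that hits both issues. The endpoint, key, database, collection, and checkpoint path below are placeholders, and the option names follow the connector's documented change feed configuration map.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.getOrCreate()

// Placeholder values; option names follow the connector's change feed configuration.
val changeFeedConfig = Map(
  "Endpoint" -> "https://<account>.documents.azure.com:443/",
  "Masterkey" -> "<key>",
  "Database" -> "<database>",
  "Collection" -> "<collection>",
  "ReadChangeFeed" -> "true",
  // Issue 1: with no pre-existing checkpoint and this set to false,
  // the stream makes no progress at startup.
  "ChangeFeedStartFromTheBeginning" -> "false",
  // Issue 2: the abfss:// (or wasb://) scheme is ignored and checkpoint
  // files end up on fs.defaultFS (e.g. dbfs:/// on Databricks).
  "ChangeFeedCheckpointLocation" -> "abfss://container@account.dfs.core.windows.net/checkpoints"
)

val changeFeedStream = spark.readStream
  .format("com.microsoft.azure.cosmosdb.spark.streaming.CosmosDBSourceProvider")
  .options(changeFeedConfig)
  .load()
```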

Changes:

  1. Set the next continuation token using getResponseContinuation from the FeedResponse when the given partition does not have an existing token (see the first sketch after this list).

  2. Use the new URI form for Hadoop FileSystem creation by passing the checkpoint location, so the abfss:// or wasb:// scheme is honored (see the second sketch after this list).
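A minimal sketch of the continuation-token fallback in change 1, assuming the DocumentDB SDK's FeedResponse type that the connector wraps; the helper name and signature are illustrative, not the connector's actual internals.

```scala
import com.microsoft.azure.documentdb.{Document, FeedResponse}

// Hypothetical helper: prefer the checkpointed token for the partition, and only
// when none exists fall back to the continuation returned by the change feed
// response itself, so the stream can still advance on its first micro-batch.
def resolveContinuationToken(existingToken: Option[String],
                             feedResponse: FeedResponse[Document]): String =
  existingToken.filter(_.nonEmpty)
    .getOrElse(feedResponse.getResponseContinuation)
```

And a sketch of the URI-based Hadoop FileSystem resolution from change 2; the path is a placeholder, and in the connector the Hadoop configuration would come from the Spark context rather than being created fresh.

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val checkpointLocation = "abfss://container@account.dfs.core.windows.net/checkpoints"
val hadoopConf = new Configuration() // placeholder; normally sparkContext.hadoopConfiguration

// FileSystem.get(hadoopConf) resolves against fs.defaultFS (e.g. dbfs:/// on
// Databricks) and silently drops the abfss:// or wasb:// scheme. Passing the
// checkpoint location as a URI lets its scheme decide which FileSystem is used.
val checkpointFs: FileSystem = FileSystem.get(new URI(checkpointLocation), hadoopConf)
checkpointFs.mkdirs(new Path(checkpointLocation))
```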