apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0

[Task]: Update python code and doc strings that uses GCP dependencies #25625

Open AnandInguva opened 1 year ago

AnandInguva commented 1 year ago

What needs to happen?

Update the Python code and docstrings for the modules that use GCP dependencies. They appear to follow some old conventions and may need refactoring to stay up to date.

For example,

  1. https://github.com/apache/beam/tree/master/sdks/python/apache_beam/io/gcp/datastore/v1new
  2. https://github.com/apache/beam/tree/master/sdks/python/apache_beam/ml/gcp/

Issue Priority

Priority: 3 (nice-to-have improvement)

tvalentyn commented 1 year ago

Can you give an example of what exactly needs to be updated?

AnandInguva commented 1 year ago

> Can you give an example of what exactly needs to be updated?

https://github.com/apache/beam/blob/01cb7e9618db63c79cf1887d43d5b2ec50dc1736/sdks/python/apache_beam/ml/gcp/naturallanguageml.py#L43

https://github.com/apache/beam/blob/01cb7e9618db63c79cf1887d43d5b2ec50dc1736/sdks/python/apache_beam/ml/gcp/naturallanguageml.py#L84

We can update the docstrings and replace the old type comments with inline type annotations, i.e. write (pcoll: beam.pvalue.PCollection) instead of (pcoll,  # type: beam.pvalue.PCollection).
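
To make the proposed change concrete, here is a minimal before/after sketch. The function names annotate_old and annotate_new and the timeout parameter are hypothetical and are not copied from naturallanguageml.py. The annotations are quoted and the Beam import is guarded by TYPE_CHECKING only so that the sketch runs without apache_beam installed; in the real modules, where Beam is already imported, the quotes would presumably not be needed.

```python
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    # Only needed by static type checkers; never executed at runtime.
    import apache_beam as beam


# Old convention: PEP 484 type comments (hypothetical function).
def annotate_old(
    pcoll,  # type: beam.pvalue.PCollection
    timeout=None,  # type: Optional[float]
):
    # type: (...) -> beam.pvalue.PCollection
    """Pass the collection through unchanged (placeholder body)."""
    return pcoll


# New convention: inline annotations (hypothetical function).
def annotate_new(
    pcoll: "beam.pvalue.PCollection",
    timeout: Optional[float] = None,
) -> "beam.pvalue.PCollection":
    """Pass the collection through unchanged (placeholder body)."""
    return pcoll
```

One practical difference: type comments are invisible at runtime, so annotate_old.__annotations__ is empty, while annotate_new exposes its annotations to tools that inspect the signature.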