GoogleCloudPlatform / python-docs-samples

Code samples used on cloud.google.com
Apache License 2.0

Dataflow job gets stuck with ReadFromText() #4939

Closed sneko closed 3 years ago

sneko commented 3 years ago

Hi,

Since my previous issue https://github.com/GoogleCloudPlatform/python-docs-samples/issues/4891 I hadn't taken the time to test my pipeline, but today I was able to, and it gets stuck at a ReadFromText():

    lines = p | 'ReadFile' >> beam.io.ReadFromText(calculation_options.input_file.get(), skip_header_lines=0)

There are absolutely no meaningful logs in the console, so it's really hard to understand what's going on. From what I understand, reading a file produces a bounded collection, so there is no reason for the reader to get stuck, right?

[image] (the log you see about "Read transactions" is a debug log in my own code, emitted outside the pipeline at startup)

Note that locally it works perfectly (with the exact same file into its Google Storage bucket)!

I thought it might be a permission issue, so I also tried specifying a custom service account with full permissions over Google Storage, but nothing changed.

In addition to solving this, I'd like to understand why Dataflow is not more verbose... I have always seen differences between local testing and Dataflow testing, and it's really hard to ramp up on this product 😢

Thank you,

Environment:

EDIT: note that the Dataflow interface shows no blocks at all for the job graph. It says "The graph is unavailable". Maybe my pipeline has an issue? I will give it another try tomorrow.

sneko commented 3 years ago

I tried to build the simplest pipeline possible but it's still not working. I would appreciate it if someone from Google could provide a Flex template that works as a batch job.

[EDIT: after searching again and again, I'm starting to wonder whether Flex templates only target streaming pipelines?]

I understand that ".wait_until_finish()" must be avoided in Flex templates. Beyond that, I don't see much difference between my example below and all of your examples:

My test:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
import argparse

def run(argv=None):
  """Build and run the pipeline."""
  parser = argparse.ArgumentParser()
  known_args, pipeline_args = parser.parse_known_args()
  options = PipelineOptions(pipeline_args, save_main_session=True, streaming=False)

  with beam.Pipeline(options=options) as pipeline:
    samples = [
      'aaaaaaa',
      'bbbbbbb',
    ]

    messages = (
        pipeline
        | 'ReadTransactionsFile' >> beam.Create(samples)
      )

if __name__ == '__main__':
  print('starting')
  run()

And the logs I get: [image]

All my tests, whatever variation I try, run for 12 minutes and then fail (due to the timeout while polling for the result). It's really hard. I've spent days on this and I'm just lost... I really hope the problem is on my side haha

Thank you,

cc @tmatsuo @davidcavazos

EDIT2: I tried to downgrade to Beam v2.23 just in case, but it does not work either.

EDIT3: should we really use the latest tag for the base image of the Flex template? I saw a stable tag from May 2019... https://console.cloud.google.com/gcr/images/dataflow-templates-base/GLOBAL/python3-template-launcher-base

davidcavazos commented 3 years ago

I wonder if this issue is somehow related to #4894

sneko commented 3 years ago

@davidcavazos I saw this one but I have no error about opening the template file. According to my logs this part seems to work fine since the Python script has started, no?

Do any of you at Google have a Flex template for batch as a sample? Or could someone at least confirm that the basics work with this technology? It would be great... I've been going around in circles for too long, and I'm starting to think about going back to classic templates and managing multiple CI/CD pipelines to get all the variants of the graph I want depending on input parameters.

Thank you in advance,

davidcavazos commented 3 years ago

I'm not able to reproduce the error. I tried running the same job and it's working for me.

Are you installing libffi-dev like in the sample Dockerfile?

sneko commented 3 years ago

Yes, that's what I was doing for Beam v2.24, but since I was stuck I tried Beam v2.23 without it and it's the same issue...

Just to be sure:

That way, tomorrow I will be able to really mimic what you've done and tell you whether it works on my side. Because there are so many things that can go wrong 😄 ...

Thanks,

EDIT: also, if such a batch example works, you should definitely add it to your Python Dataflow examples. For now only streaming Flex examples are provided, which can make people doubt whether batch is possible at all.

davidcavazos commented 3 years ago

beam_pipeline.py

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
import argparse
import logging

def run(argv=None):
    """Build and run the pipeline."""
    parser = argparse.ArgumentParser()
    args, pipeline_args = parser.parse_known_args()
    options = PipelineOptions(
        pipeline_args, save_main_session=True, streaming=False)

    with beam.Pipeline(options=options) as pipeline:
        samples = [
            'aaaaaaa',
            'bbbbbbb',
        ]

        messages = (
            pipeline
            | 'ReadTransactionsFile' >> beam.Create(samples)
            | beam.Map(logging.warning)
        )

if __name__ == '__main__':
    logging.warning('starting')
    run()

Dockerfile

FROM gcr.io/dataflow-templates-base/python3-template-launcher-base

ARG WORKDIR=/dataflow/template
RUN mkdir -p ${WORKDIR}
WORKDIR ${WORKDIR}

COPY beam_pipeline.py .
COPY requirements.txt .

RUN apt-get update && apt-get install -y libffi-dev

ENV FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE="${WORKDIR}/requirements.txt"
ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${WORKDIR}/beam_pipeline.py"

RUN pip install -U -r ./requirements.txt

metadata.json

{
    "name": "Batch template",
    "description": "Flex template experiment.",
    "parameters": []
}
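
The metadata above declares no parameters. The pipeline in the first post reads its input path from an option, so a batch template of that kind would probably need to declare one. Below is a minimal sketch of generating such a metadata file. The parameter name `input_file` is borrowed from `calculation_options.input_file` in the first post; the label, help text, and regex are illustrative assumptions, and the field names follow the Flex template metadata format as I understand it.

```python
import json

# Same name and description as the metadata.json above, plus one
# hypothetical parameter for the input file.
metadata = {
    "name": "Batch template",
    "description": "Flex template experiment.",
    "parameters": [
        {
            "name": "input_file",  # assumed option name, from the first post
            "label": "Input file",
            "helpText": "Cloud Storage path of the text file to read.",
            "isOptional": False,
            # Illustrative validation only: accept any gs:// path.
            "regexes": ["^gs://.+$"],
        }
    ],
}

metadata_json = json.dumps(metadata, indent=4)
print(metadata_json)
```

Writing `metadata_json` to `metadata.json` would let the same build command pick it up through `--metadata-file`.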

requirements.txt

apache-beam[gcp]

Commands

export IMAGE=gcr.io/$PROJECT/samples/dataflow/batch-job:latest
export TEMPLATE_PATH="gs://$BUCKET/samples/dataflow/templates/batch-job.json"
export REGION="us-central1"

# Build image
gcloud --project $PROJECT builds submit --tag $IMAGE .

# Build template
gcloud --project $PROJECT dataflow flex-template build $TEMPLATE_PATH \
    --image "$IMAGE" \
    --sdk-language "PYTHON" \
    --metadata-file "metadata.json"

# Run template
gcloud --project $PROJECT dataflow flex-template run "batch-job-`date +%Y%m%d-%H%M%S`" \
    --template-file-gcs-location "$TEMPLATE_PATH" \
    --region "$REGION"

davidcavazos commented 3 years ago

However, I have noticed that the job took 13 minutes to complete. That seems like a rather long time, especially considering that launching it directly without templates took 4 minutes.

davidcavazos commented 3 years ago

@sneko, can you look into the worker logs? A timeout could mean the worker stopped unexpectedly, and the worker logs might tell us what happened.

@MelodyShen, do you know what's happening during the template initialization that makes the job take that long? Do you think that might have caused a 25m delay to trigger the timeout?

sneko commented 3 years ago

especially considering that launching it directly without templates took 4 minutes.

@davidcavazos I'm not sure I understand "launching without templates". How do you do that?
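
One common reading of "launching without templates" is running the pipeline script itself with the Dataflow runner options, which skips the template build and launcher steps. The thread does not confirm that this is what was done, so treat the following as a sketch under that assumption. The project, bucket, and region values are placeholders.

```python
import shlex
import sys


def direct_launch_command(script, project, region, temp_location):
    """Return the argv list for running a Beam script directly on Dataflow."""
    return [
        sys.executable, script,
        "--runner=DataflowRunner",  # submit to Dataflow instead of running locally
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",  # Cloud Storage path for temp files
    ]


# Placeholder values, not taken from the thread.
command = direct_launch_command(
    "beam_pipeline.py", "my-project", "us-central1", "gs://my-bucket/temp")
print(shlex.join(command))
```

Because the script calls `parse_known_args()` and forwards the leftover flags to `PipelineOptions`, these flags reach Beam unchanged. Passing `command` to `subprocess.run(command, check=True)` would start the job.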

(I will give it another try this afternoon and come back to you with the worker logs. Just to be sure, are you talking about the ones I can see in the Dataflow log section? Or do you mean I need to go somewhere else to really see what's going on on the underlying dedicated server (GCE)?)
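
Besides the log panel in the Dataflow job page, the same entries can usually be queried from Cloud Logging by job ID. The sketch below builds such a filter and the matching `gcloud logging read` command. It assumes the job's logs are recorded under the `dataflow_step` resource type with a `job_id` label; the job ID and project shown are made-up placeholders.

```python
import shlex


def dataflow_log_filter(job_id, min_severity="WARNING"):
    """Build a Cloud Logging filter for the logs of one Dataflow job."""
    return " AND ".join([
        'resource.type="dataflow_step"',
        f'resource.labels.job_id="{job_id}"',
        f"severity>={min_severity}",
    ])


def gcloud_read_command(project, log_filter, limit=50):
    """Return the argv list for reading matching entries with gcloud."""
    return [
        "gcloud", "logging", "read", log_filter,
        f"--project={project}",
        f"--limit={limit}",
        "--format=json",
    ]


# Placeholder job ID and project, for illustration only.
log_filter = dataflow_log_filter("2020-01-01_00_00_00-1234567890")
print(shlex.join(gcloud_read_command("my-project", log_filter)))
```

Lowering `min_severity` to `"DEBUG"` widens the query when nothing shows up at warning level.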

sneko commented 3 years ago

@davidcavazos

--> As soon as I get my own pipeline working, I will post an update explaining what was happening.

Thank you,

sneko commented 3 years ago

Starting from your sample, I just replaced the requirements.txt with mine and it doesn't work.

I use Poetry as my package manager, and in its pyproject.toml I have:

[tool.poetry.dependencies]
apache-beam = {version = "2.23",extras = ["gcp"]}

(I removed others to simulate yours)

but when exporting to requirements.txt it generates a detailed one:

apache-beam==2.23.0 \
    --hash=sha256:c06645a3326fa33b774d5de38d924f10ac0cbbcdab9b67651936f6906f3444b3 \
    --hash=sha256:86816308c2673c89f89fb0fb08da8b3a166bbcc8c767b5f1965cf8bc5c26ca75 \
    --hash=sha256:6d56b4b4a6f2c4c3539b29a3432c15f96c4529b5d2f152d39874750e686d60b5 \
    --hash=sha256:f577750712e223c90d7d6f0794539786f39cfb9ffdfd62fdec38ea7e92cab810 \
    --hash=sha256:ab790f1998ad175dd99b9f69b51dc13bd47ba53a21e8f1706f7f9ac477bbb68e \
    --hash=sha256:086462eed818616845493c33c44484600e066de138692bdfa94dbbf20018d2b1 \
    --hash=sha256:5d87491aa98821403cd7006fe8f6cfc39b7a54ef7701ed050fc1f83f9ccf7770 \
    --hash=sha256:f88f503b94eccf0a1a919153880711405fcadb845052fb27cb6c56ffaeac348a \
    --hash=sha256:61de573ee0b13528f2f775ec9864c2934882f3a04bf893f5d187f0df97e45b2e \
    --hash=sha256:d604d553b9f06a16b536716dfb56bbea3fdd0be4a29569a3c520b74aa3a09b98 \
    --hash=sha256:99d4d8549fdcae7f321f0a7da218307597b33810c332e1cdad6bab38d753e76a \
    --hash=sha256:5d2ebcedffce980438fbb169de6f2edfa1edb3b5d35d60644e1381a98649c3b1 \
    --hash=sha256:ede318ddc0a41500b91cb64b4aa3d14549ce4a94abe83bdb4239e92a99944a12 \
    --hash=sha256:95d59bdd1ebef1b85741835f259f89a3eb40333cfc3b72551053d516d8b386bf \
    --hash=sha256:0cb12b0f3768cdbe2c5b1114d79e0707af9186e5a1dde8915b1b7623d532b7f9 \
    --hash=sha256:419eb02b8e58f319815b21e07dad69ba63a725687e8051f96f8ae5f24b29be30 \
    --hash=sha256:3083aa4d66829804f8238400a7a8e6af6b8fe03d38d178a818e2acb4d39d57df \
    --hash=sha256:6d71829af5742cd1c16b812a3168cf60969c9d6281965f5aa9ca0c269320f505 \
    --hash=sha256:9bdcbde1f0939054acb25bca7e03383047cea7b0b0f467bca22a132cb9054dec \
    --hash=sha256:6d02f6be6c8176b979008f92ab88647100ca38eb67737a31a7a432a907f08972 \
    --hash=sha256:2ce4e48cbfd7db945e9bdf1f363a26cf7352a484d2513c20a059f5b054355f0b \
    --hash=sha256:55fbc31ac5abaa01dc5382f3392ddad5381d715560da62bd3ca73b1ae0edf08d \
    --hash=sha256:a9bb010909f1271d402ee0e77ee30e539adcfbad4bee1a8470cfa0e2afdbdc8d \
    --hash=sha256:d6e93d6f0281317b7c753943ba49558f386a089259267ccabe0a40e786f6587a \
    --hash=sha256:e8684f030a6d98bd20af9f54f20bb0f15074ba2b3389b158b55613959d4b79df \
    --hash=sha256:1bc4824a5e36042788ddac390ae369a42de215ecce1db2a5963d95ef3afb184e \
    --hash=sha256:04a6ad7648872806b3c694afe4acb7ce16255908656a77fa7e6d4c3ed5bea956 \
    --hash=sha256:1879462f0ee1e2dcae54bafcb7914296fbee8569aac5cbfc008aefa588eed45d \
    --hash=sha256:1963744967c5ca4c79ae57046e753bc3594076222dcd97e14852219292bc2dcb \
    --hash=sha256:52a998690db184142a5c0b4002987e754d73ea826ddd986c5e7296d524cc12e0
avro-python3==1.9.2.1; python_version >= "3.0" \
    --hash=sha256:ca1e77a3da5ac98e8833588f71fb2e170b38e34787ee0e04920de0e9470b7d32
cachetools==3.1.1 \
    --hash=sha256:428266a1c0d36dc5aca63a2d7c5942e88c2c898d72139fca0e97fdd2380517ae \
    --hash=sha256:8ea2d3ce97850f31e4a08b0e2b5e6c34997d7216a9d2c98e0f3978630d4da69a
certifi==2020.6.20 \
    --hash=sha256:8fc0819f1f30ba15bdb34cceffb9ef04d99f420f68eb75d901e9560b8749fc41 \
    --hash=sha256:5930595817496dd21bb8dc35dad090f1c2cd0adfaf21204bf6732ca5d8ee34d3
chardet==3.0.4 \
    --hash=sha256:fc323ffcaeaed0e0a02bf4d117757b98aed530d9ed4531e3e15460124c106691 \
    --hash=sha256:84ab92ed1c4d4f16916e05906b6b75a6c0fb5db821cc65e70cbd64a3e2a5eaae
crcmod==1.7 \
    --hash=sha256:dc7051a0db5f2bd48665a990d3ec1cc305a466a77358ca4492826f41f283601e \
    --hash=sha256:69a2e5c6c36d0f096a7beb4cd34e5f882ec5fd232efb710cdb85d4ff196bd52e \
    --hash=sha256:737fb308fa2ce9aed2e29075f0d5980d4a89bfbec48a368c607c5c63b3efb90e \
    --hash=sha256:50586ab48981f11e5b117523d97bb70864a2a1af246cf6e4f5c4a21ef4611cd1
dill==0.3.1.1 \
    --hash=sha256:42d8ef819367516592a825746a18073ced42ca169ab1f5f4044134703e7a049c
docopt==0.6.2 \
    --hash=sha256:49b3a825280bd66b3aa83585ef59c4a8c82f2c8a522dbe754a8bc8d08c85c491
fastavro==0.23.6 \
    --hash=sha256:ff2cc25dd994802c158ade202cb0bdc7d463d4d776975f74e35a6c227d2e9b79 \
    --hash=sha256:bd59bdb832fc1d090dcafc12b436e4313ca2ffaa012d1b269e77c229f4b999d6 \
    --hash=sha256:5138e720143d070a693f187d565a7232c1102f893153e8f229573b9bf248d761 \
    --hash=sha256:32533b2fc97e40de03fd31003131ccde191c156e6cf37ecd1f9a92a86d4a3381 \
    --hash=sha256:eb6fa4ac5698448290738de571b290027ec683be65052fc49108883942803daa \
    --hash=sha256:b18982154039147c5738c0986e0cf6183c8fbe43c31504fb4d92b5c77349befc \
    --hash=sha256:78d805b785470828f3a0175ce120113eca0cfe48238d055e24fead6fe45da443 \
    --hash=sha256:bc586311ba0857ffe832ae43f71df3d5581dbd84ddb9f18d5fed4d250f253849 \
    --hash=sha256:0f1417bdc13531451aaef991290ca348cc9320915c4ed152314d94b353049490 \
    --hash=sha256:7e3b032a339d45bbabc073662ae12a857cf481fe42c65381278d956bf9010b09 \
    --hash=sha256:62141efacafce00d40a5106e298e9d93b07e86385833f9d2f684b8cd6325ee41 \
    --hash=sha256:fc5c3048d38d862075703ec484133289388e6f2b0251e91ce770d3cfbc37ea21 \
    --hash=sha256:bed3642543363d00ddbadd88098678806dd6be7aa02445728250045f774dc15b \
    --hash=sha256:f11a3559e1174fc6a449bb1a42a6c6bbabbe42dddc73a78b0ab711ca40f034a0 \
    --hash=sha256:ec917b529cbb7a1d59b55f5ab471f9c975ef802a3fc7204a3cca0aec695c1ac9 \
    --hash=sha256:33b6b581d7fd0719b62b6a4ecf436f3a9b11f427753d28dd621841e5223e07e5 \
    --hash=sha256:20e83360493cc73abf143f9322217fa9fc9b31971afec5aab9a3d59ee0fd0314 \
    --hash=sha256:c5861eec6813dd630e0bdf1ee8c4a9b65eff71039c65c51de4a3b6c63bc8ca76 \
    --hash=sha256:16783b2e8091eb2de0ce579cc8865720e67b9b701dab7f8ea86e9fbb0839f399 \
    --hash=sha256:f4effd7663b7f9e733eb42ded30e718904fe889405853dfcaa64819a590b9c26 \
    --hash=sha256:b740d507006f9f4c798222c417055fe45dc74db73ec606c3247236cac582e885 \
    --hash=sha256:47e1180022823cd03cc979a3f8a47b0721e73e98eebebc9015aa89c1019ac889
fasteners==0.15 \
    --hash=sha256:007e4d2b2d4a10093f67e932e5166722d2eab83b77724156e92ad013c6226574 \
    --hash=sha256:3a176da6b70df9bb88498e1a18a9e4a8579ed5b9141207762368a1017bf8f5ef
future==0.18.2 \
    --hash=sha256:b1bead90b70cf6ec3f0710ae53a525360fa360d306a86583adc6bf83a4db537d
google-api-core==1.23.0 \
    --hash=sha256:1bb3c485c38eacded8d685b1759968f6cf47dd9432922d34edb90359eaa391e2 \
    --hash=sha256:94d8c707d358d8d9e8b0045c42be20efb58433d308bd92cf748511c7825569c8
google-apitools==0.5.31 \
    --hash=sha256:4af0dd6dd4582810690251f0b57a97c1873dadfda54c5bc195844c8907624170 \
    --hash=sha256:6be92c1c3e93485450420bb0e365d47eb4d8a835d03ebe1963dc6da4d39a7b0e
google-auth==1.23.0 \
    --hash=sha256:5176db85f1e7e837a646cd9cede72c3c404ccf2e3373d9ee14b2db88febad440 \
    --hash=sha256:b728625ff5dfce8f9e56a499c8a4eb51443a67f20f6d28b67d5774c310ec4b6b
google-cloud-bigquery==1.24.0 \
    --hash=sha256:7ffcceed8becea20cb4ce4bdf9b924822780416ff1a9d497f9a1238a3f1442b1 \
    --hash=sha256:23c9180e87f6093eb6f2ae880d7f7697fdab991a4616439ad0f95cd37014f0dd
google-cloud-bigtable==1.0.0 \
    --hash=sha256:1c40f09593b5d16cf2a5ee91644676a0b01acc8082304c3100e4c0fb7c821d8b \
    --hash=sha256:4323362b836ddf9e7324b0be1a34e3f80d09729356233c05e969b577244c49a3
google-cloud-core==1.4.3 \
    --hash=sha256:21afb70c1b0bce8eeb8abb5dca63c5fd37fc8aea18f4b6d60e803bd3d27e6b80 \
    --hash=sha256:75abff9056977809937127418323faa3917f32df68490704d39a4f0d492ebc2b
google-cloud-datastore==1.7.4 \
    --hash=sha256:7a44d9b0263cbbe05963522f61ba177e64282043f30999e0bc3368fd79a3af12 \
    --hash=sha256:ffb075abf606ebd248c3ad76ac0e6d3e93858d8c61a063139938a162a58b28d0
google-cloud-dlp==0.13.0 \
    --hash=sha256:844f5e63597c2a15561eec68397ee5f425e9be7728d2d7072f50f983fab31b9a \
    --hash=sha256:fbcb489bf9df2b4137b0a571d7de6055454c320a88f0de945950dba8cfb8c9d0
google-cloud-language==1.3.0 \
    --hash=sha256:2772badf8fe8ac57cd7e7840a60764603b3e19e6dbd843460a5ae8915798b32f \
    --hash=sha256:76e349fcc223c66b9aa138e05853f4bf550f0d06e37641c2c206dc2661b83f30
google-cloud-pubsub==1.0.2 \
    --hash=sha256:afb08eb558f3e4d836e6f77443f81555d6921ffc888c7c3085acd1205fba6e8c \
    --hash=sha256:12ff565ef00e4ca19d2ae26ae4515070094ba857d7c7024370dbed81fc7d58ab
google-cloud-spanner==1.13.0 \
    --hash=sha256:eafa09cc344339a23702ee74eac5713974fefafdfd56afb589bd25548c79c80d \
    --hash=sha256:9f36c2e9d6a379ca692b63a39d92c4064d094209468ce155ad8e8d57249b410f
google-cloud-videointelligence==1.13.0 \
    --hash=sha256:f11f9431b4d120733a90f0e584ead3db0a061218289d902ebe249986d4de829f \
    --hash=sha256:933c05ccdcb422155154b5a5d752ccef0efd004105fa3940a230e2070f7a1fef
google-cloud-vision==0.42.0 \
    --hash=sha256:a162a546638cbc9db3db8a97a87a1e1930c3e316b01b2a1b78c8176630852ccd \
    --hash=sha256:afe6f6dc23741818cdf63078372ce832c5984ce14f055be2a86fd2bafe07cc65
google-resumable-media==0.5.1 \
    --hash=sha256:97155236971970382b738921f978a6f86a7b5a0b0311703d991e065d3cb55773 \
    --hash=sha256:cdc64378dc9a7a7bf963a8d0c944c99b549dc0c195a9acbf1fcd465f380b9002
googleapis-common-protos==1.52.0 \
    --hash=sha256:560716c807117394da12cecb0a54da5a451b5cf9866f1d37e9a5e2329a665351 \
    --hash=sha256:c8961760f5aad9a711d37b675be103e0cc4e9a39327e0d6d857872f698403e24
grpc-google-iam-v1==0.12.3 \
    --hash=sha256:0bfb5b56f648f457021a91c0df0db4934b6e0c300bd0f2de2333383fe958aa72
grpcio-gcp==0.2.2 \
    --hash=sha256:e292605effc7da39b7a8734c719afb12ec4b5362add3528d8afad3aa3aa9057c \
    --hash=sha256:1ef8e8531eab11356a3eb4c5b84e79e0d923d6782d19e1b1a45e1cabe4e783d7
grpcio==1.33.2 \
    --hash=sha256:c5030be8a60fb18de1fc8d93d130d57e4296c02f229200df814f6578da00429e \
    --hash=sha256:5b21d3de520a699cb631cfd3a773a57debeb36b131be366bf832153405cc5404 \
    --hash=sha256:b412f43c99ca72769306293ba83811b241d41b62ca8f358e47e0fdaf7b6fbbd7 \
    --hash=sha256:703da25278ee7318acb766be1c6d3b67d392920d002b2d0304e7f3431b74f6c1 \
    --hash=sha256:2f2eabfd514af8945ee415083a0f849eea6cb3af444999453bb6666fadc10f54 \
    --hash=sha256:d51ddfb3d481a6a3439db09d4b08447fb9f6b60d862ab301238f37bea8f60a6d \
    --hash=sha256:407b4d869ce5c6a20af5b96bb885e3ecaf383e3fb008375919eb26cf8f10d9cd \
    --hash=sha256:abaf30d18874310d4439a23a0afb6e4b5709c4266966401de7c4ae345cc810ee \
    --hash=sha256:f2673c51e8535401c68806d331faba614bcff3ee16373481158a2e74f510b7f6 \
    --hash=sha256:65b06fa2db2edd1b779f9b256e270f7a58d60e40121660d8b5fd6e8b88f122ed \
    --hash=sha256:514b4a6790d6597fc95608f49f2f13fe38329b2058538095f0502b734b98ffd2 \
    --hash=sha256:4cef3eb2df338abd9b6164427ede961d351c6bf39b4a01448a65f9e795f56575 \
    --hash=sha256:3ac453387add933b6cfbc67cc8635f91ff9895299130fc612c3c4b904e91d82a \
    --hash=sha256:7d292dabf7ded9c062357f8207e20e94095a397d487ffd25aa213a2c3dff0ab4 \
    --hash=sha256:0aeed3558a0eec0b31700af6072f1c90e8fd5701427849e76bc469554a14b4f5 \
    --hash=sha256:88f2a102cbc67e91f42b4323cec13348bf6255b25f80426088079872bd4f3c5c \
    --hash=sha256:affbb739fde390710190e3540acc9f3e65df25bd192cc0aa554f368288ee0ea2 \
    --hash=sha256:ffec0b854d2ed6ee98776c7168c778cdd18503642a68d36c00ba0f96d4ccff7c \
    --hash=sha256:7744468ee48be3265db798f27e66e118c324d7831a34fd39d5775bcd5a70a2c4 \
    --hash=sha256:6a1b5b7e47600edcaeaa42983b1c19e7a5892c6b98bcde32ae2aa509a99e0436 \
    --hash=sha256:289671cfe441069f617bf23c41b1fa07053a31ff64de918d1016ac73adda2f73 \
    --hash=sha256:a8c84db387907e8d800c383e4c92f39996343adedf635ae5206a684f94df8311 \
    --hash=sha256:4bb771c4c2411196b778871b519c7e12e87f3fa72b0517b22f952c64ead07958 \
    --hash=sha256:b581ddb8df619402c377c81f186ad7f5e2726ad9f8d57047144b352f83f37522 \
    --hash=sha256:02a4a637a774382d6ac8e65c0a7af4f7f4b9704c980a0a9f4f7bbc1e97c5b733 \
    --hash=sha256:592656b10528aa327058d2007f7ab175dc9eb3754b289e24cac36e09129a2f6b \
    --hash=sha256:c89510381cbf8c8317e14e747a8b53988ad226f0ed240824064a9297b65f921d \
    --hash=sha256:7fda62846ef8d86caf06bd1ecfddcae2c7e59479a4ee28808120e170064d36cc \
    --hash=sha256:d386630af995fd4de225d550b6806507ca09f5a650f227fddb29299335cda55e \
    --hash=sha256:bf7de9e847d2d14a0efcd48b290ee181fdbffb2ae54dfa2ec2a935a093730bac \
    --hash=sha256:7c1ea6ea6daa82031af6eb5b7d1ab56b1193840389ea7cf46d80e98636f8aff5 \
    --hash=sha256:85e56ab125b35b1373205b3802f58119e70ffedfe0d7e2821999126058f7c44f \
    --hash=sha256:0cebba3907441d5c620f7b491a780ed155140fbd590da0886ecfb1df6ad947b9 \
    --hash=sha256:52143467237bfa77331ed1979dc3e203a1c12511ee37b3ddd9ff41b05804fb10 \
    --hash=sha256:8cf67b8493bff50fa12b4bc30ab40ce1f1f216eb54145962b525852959b0ab3d \
    --hash=sha256:fa78bd55ec652d4a88ba254c8dae623c9992e2ce647bd17ba1a37ca2b7b42222 \
    --hash=sha256:143b4fe72c01000fc0667bf62ace402a6518939b3511b3c2bec04d44b1d7591c \
    --hash=sha256:08b6a58c8a83e71af5650f8f879fe14b7b84dce0c4969f3817b42c72989dacf0 \
    --hash=sha256:56e2a985efdba8e2282e856470b684e83a3cadd920f04fcd360b4b826ced0dd3 \
    --hash=sha256:62ce7e86f11e8c4ff772e63c282fb5a7904274258be0034adf37aa679cf96ba0 \
    --hash=sha256:7f727b8b6d9f92fcab19dbc62ec956d8352c6767b97b8ab18754b2dfa84d784f \
    --hash=sha256:2d5124284f9d29e4f06f674a12ebeb23fc16ce0f96f78a80a6036930642ae5ab \
    --hash=sha256:eff55d318a114742ed2a06972f5daacfe3d5ad0c0c0d9146bcaf10acb427e6be \
    --hash=sha256:21265511880056d19ce4f809ce3fbe2a3fa98ec1fc7167dbdf30a80d3276202e
hdfs==2.5.8 \
    --hash=sha256:1be117549fc1285571bc51aedc15df5a203138dba02f9adfa26761b69a949370
httplib2==0.17.4 \
    --hash=sha256:743cff16beadd128511e786474740264aa805fba106d6fc90e3586829ad0298b \
    --hash=sha256:1e9340ecf0187a621bdcfb407c32e04e8e09fc6ab28b050efa38f20eae0e975f
idna==2.10 \
    --hash=sha256:b97d804b1e9b523befed77c48dacec60e6dcb0b5391d57af6a65a312a90648c0 \
    --hash=sha256:b307872f855b18632ce0c21c5e45be78c0ea7ae4c15c828c20788b26921eb3f6
mock==2.0.0 \
    --hash=sha256:5ce3c71c5545b472da17b72268978914d0252980348636840bd34a00b5cc96c1 \
    --hash=sha256:b158b6df76edd239b8208d481dc46b6afd45a846b7812ff0ce58971cf5bc8bba
monotonic==1.5 \
    --hash=sha256:552a91f381532e33cbd07c6a2655a21908088962bb8fa7239ecbcc6ad1140cc7 \
    --hash=sha256:23953d55076df038541e648a53676fb24980f7a1be290cdda21300b3bc21dfb0
numpy==1.19.4 \
    --hash=sha256:e9b30d4bd69498fc0c3fe9db5f62fffbb06b8eb9321f92cc970f2969be5e3949 \
    --hash=sha256:fedbd128668ead37f33917820b704784aff695e0019309ad446a6d0b065b57e4 \
    --hash=sha256:8ece138c3a16db8c1ad38f52eb32be6086cc72f403150a79336eb2045723a1ad \
    --hash=sha256:64324f64f90a9e4ef732be0928be853eee378fd6a01be21a0a8469c4f2682c83 \
    --hash=sha256:ad6f2ff5b1989a4899bf89800a671d71b1612e5ff40866d1f4d8bcf48d4e5764 \
    --hash=sha256:d6c7bb82883680e168b55b49c70af29b84b84abb161cbac2800e8fcb6f2109b6 \
    --hash=sha256:13d166f77d6dc02c0a73c1101dd87fdf01339febec1030bd810dcd53fff3b0f1 \
    --hash=sha256:448ebb1b3bf64c0267d6b09a7cba26b5ae61b6d2dbabff7c91b660c7eccf2bdb \
    --hash=sha256:27d3f3b9e3406579a8af3a9f262f5339005dd25e0ecf3cf1559ff8a49ed5cbf2 \
    --hash=sha256:16c1b388cc31a9baa06d91a19366fb99ddbe1c7b205293ed072211ee5bac1ed2 \
    --hash=sha256:e5b6ed0f0b42317050c88022349d994fe72bfe35f5908617512cd8c8ef9da2a9 \
    --hash=sha256:18bed2bcb39e3f758296584337966e68d2d5ba6aab7e038688ad53c8f889f757 \
    --hash=sha256:fe45becb4c2f72a0907c1d0246ea6449fe7a9e2293bb0e11c4e9a32bb0930a15 \
    --hash=sha256:6d7593a705d662be5bfe24111af14763016765f43cb6923ed86223f965f52387 \
    --hash=sha256:6ae6c680f3ebf1cf7ad1d7748868b39d9f900836df774c453c11c5440bc15b36 \
    --hash=sha256:9eeb7d1d04b117ac0d38719915ae169aa6b61fca227b0b7d198d43728f0c879c \
    --hash=sha256:cb1017eec5257e9ac6209ac172058c430e834d5d2bc21961dceeb79d111e5909 \
    --hash=sha256:edb01671b3caae1ca00881686003d16c2209e07b7ef8b7639f1867852b948f7c \
    --hash=sha256:f29454410db6ef8126c83bd3c968d143304633d45dc57b51252afbd79d700893 \
    --hash=sha256:ec149b90019852266fec2341ce1db513b843e496d5a8e8cdb5ced1923a92faab \
    --hash=sha256:1aeef46a13e51931c0b1cf8ae1168b4a55ecd282e6688fdb0a948cc5a1d5afb9 \
    --hash=sha256:08308c38e44cc926bdfce99498b21eec1f848d24c302519e64203a8da99a97db \
    --hash=sha256:5734bdc0342aba9dfc6f04920988140fb41234db42381cf7ccba64169f9fe7ac \
    --hash=sha256:09c12096d843b90eafd01ea1b3307e78ddd47a55855ad402b157b6c4862197ce \
    --hash=sha256:e452dc66e08a4ce642a961f134814258a082832c78c90351b75c41ad16f79f63 \
    --hash=sha256:a5d897c14513590a85774180be713f692df6fa8ecf6483e561a6d47309566f37 \
    --hash=sha256:a09f98011236a419ee3f49cedc9ef27d7a1651df07810ae430a6b06576e0b414 \
    --hash=sha256:50e86c076611212ca62e5a59f518edafe0c0730f7d9195fec718da1a5c2bb1fc \
    --hash=sha256:f0d3929fe88ee1c155129ecd82f981b8856c5d97bcb0d5f23e9b4242e79d1de3 \
    --hash=sha256:c42c4b73121caf0ed6cd795512c9c09c52a7287b04d105d112068c1736d7c753 \
    --hash=sha256:8cac8790a6b1ddf88640a9267ee67b1aee7a57dfa2d2dd33999d080bc8ee3a0f \
    --hash=sha256:4377e10b874e653fe96985c05feed2225c912e328c8a26541f7fc600fb9c637b \
    --hash=sha256:2a2740aa9733d2e5b2dfb33639d98a64c3b0f24765fed86b0fd2aec07f6a0a08 \
    --hash=sha256:141ec3a3300ab89c7f2b0775289954d193cc8edb621ea05f99db9cb181530512
oauth2client==3.0.0 \
    --hash=sha256:5b5b056ec6f2304e7920b632885bd157fa71d1a7f3ddd00a43b1541a8d1a2460
pbr==5.5.1 \
    --hash=sha256:b236cde0ac9a6aedd5e3c34517b423cd4fd97ef723849da6b0d2231142d89c00 \
    --hash=sha256:5fad80b613c402d5b7df7bd84812548b2a61e9977387a80a5fc5c396492b13c9
protobuf==3.13.0 \
    --hash=sha256:9c2e63c1743cba12737169c447374fab3dfeb18111a460a8c1a000e35836b18c \
    --hash=sha256:1e834076dfef9e585815757a2c7e4560c7ccc5962b9d09f831214c693a91b463 \
    --hash=sha256:df3932e1834a64b46ebc262e951cd82c3cf0fa936a154f0a42231140d8237060 \
    --hash=sha256:8c35bcbed1c0d29b127c886790e9d37e845ffc2725cc1db4bd06d70f4e8359f4 \
    --hash=sha256:339c3a003e3c797bc84499fa32e0aac83c768e67b3de4a5d7a5a9aa3b0da634c \
    --hash=sha256:361acd76f0ad38c6e38f14d08775514fbd241316cce08deb2ce914c7dfa1184a \
    --hash=sha256:9edfdc679a3669988ec55a989ff62449f670dfa7018df6ad7f04e8dbacb10630 \
    --hash=sha256:5db9d3e12b6ede5e601b8d8684a7f9d90581882925c96acf8495957b4f1b204b \
    --hash=sha256:c8abd7605185836f6f11f97b21200f8a864f9cb078a193fe3c9e235711d3ff1e \
    --hash=sha256:4d1174c9ed303070ad59553f435846a2f877598f59f9afc1b89757bdf846f2a7 \
    --hash=sha256:0bba42f439bf45c0f600c3c5993666fcb88e8441d011fad80a11df6f324eef33 \
    --hash=sha256:c0c5ab9c4b1eac0a9b838f1e46038c3175a95b0f2d944385884af72876bd6bc7 \
    --hash=sha256:f68eb9d03c7d84bd01c790948320b768de8559761897763731294e3bc316decb \
    --hash=sha256:91c2d897da84c62816e2f473ece60ebfeab024a16c1751aaf31100127ccd93ec \
    --hash=sha256:3dee442884a18c16d023e52e32dd34a8930a889e511af493f6dc7d4d9bf12e4f \
    --hash=sha256:e7662437ca1e0c51b93cadb988f9b353fa6b8013c0385d63a70c8a77d84da5f9 \
    --hash=sha256:d69697acac76d9f250ab745b46c725edf3e98ac24763990b24d58c16c642947a \
    --hash=sha256:6a82e0c8bb2bf58f606040cc5814e07715b2094caeba281e2e7d0b0e2e397db5
pyarrow==0.17.1; python_version >= "3.0" or platform_system != "Windows" \
    --hash=sha256:ea2dd2b55edd9b893e9b6ac2dc8a84fd66598636b933aece04768960a9dd1667 \
    --hash=sha256:b142cc9b42e9b87a2f0624b2bd176a84ec7f47d170de1c46eeb155eab1d08dbd \
    --hash=sha256:5a0f5279bee86310f8c02706e1c706ccc30d030b1febd844f2a269f3fc7cafae \
    --hash=sha256:d6b352da205d58aa1a5705075a5e547ff7fb610b182e38d211a17dccad88d72d \
    --hash=sha256:99b0fc309660fe1ff122d14c6b42f79f8e6cc5324223f85f1190c108e40c6e4a \
    --hash=sha256:837a22f34b9c941ca7bdb6ff7ca7dd9381d590ea60de64c3829cdd2b90fafebb \
    --hash=sha256:b46c693dd766fc7cab41a803653e80930ec1b71ac51c7f42b5d62b7cae1c2efa \
    --hash=sha256:a1e19a532d4d8a46c2484d914670034f7ea3ef4884c1cd9600ecb1ac8aecd28d \
    --hash=sha256:2af53a80076ab802cbfcd97063645b45d81d1e5ca206c7edcf122fa4d36026d9 \
    --hash=sha256:9508a0514b94068a9811608c2362393fb2de8308f4152fbc8572fa275759fbf7 \
    --hash=sha256:3562ac22b0647c212aa9c0b21a2caeeb21d02aa7ba2cb696a355893f50bc18b0 \
    --hash=sha256:38d1ef84c66123dc9eb8514f32fa866652df204c9ce1e5930461ea8f2ba9bffb \
    --hash=sha256:ee45471f7929d8951b42b1b875dee2be56952f026057c920af6c213d1ae54ace \
    --hash=sha256:cc3fb951347993ad9d5aa38c3aabd9be8341994b35c2fcc307f507a298187196 \
    --hash=sha256:59b200dd3344413f7f68a5745a30964b690c41c23d5e95475be865fd264550ff \
    --hash=sha256:e6f736df6c88836ce3eeb0fee1de939af56981f82aa9b3bdef2ab6f3201de05e \
    --hash=sha256:841b3780aee3cb307fecdfaaae94ca5f3e49b28634335da63d0e383053187149 \
    --hash=sha256:375641f817382c5562c204f7d355f134400de0a778642e419d69fe4d55d38917 \
    --hash=sha256:21b4d31a2813e81ed6664c37decb548618fd93838f983c3d634e3eae1d91a597 \
    --hash=sha256:18f65739d1d8ed8ad0d88228fd9ab76558a9c808c01dca2f24be2c72b875f43b \
    --hash=sha256:278d11800c2e0f9bea6314ef718b2368b4046ba24b6c631c14edad5a1d351e49
pyasn1-modules==0.2.8 \
    --hash=sha256:905f84c712230b2c592c19470d3ca8d552de726050d1d1716282a1f6146be65e \
    --hash=sha256:0fe1b68d1e486a1ed5473f1302bd991c1611d319bba158e98b106ff86e1d7199 \
    --hash=sha256:fe0644d9ab041506b62782e92b06b8c68cca799e1a9636ec398675459e031405 \
    --hash=sha256:a99324196732f53093a84c4369c996713eb8c89d360a496b599fb1a9c47fc3eb \
    --hash=sha256:0845a5582f6a02bb3e1bde9ecfc4bfcae6ec3210dd270522fee602365430c3f8 \
    --hash=sha256:a50b808ffeb97cb3601dd25981f6b016cbb3d31fbf57a8b8a87428e6158d0c74 \
    --hash=sha256:f39edd8c4ecaa4556e989147ebf219227e2cd2e8a43c7e7fcb1f1c18c5fd6a3d \
    --hash=sha256:b80486a6c77252ea3a3e9b1e360bc9cf28eaac41263d173c032581ad2f20fe45 \
    --hash=sha256:65cebbaffc913f4fe9e4808735c95ea22d7a7775646ab690518c056784bc21b4 \
    --hash=sha256:15b7c67fabc7fc240d87fb9aabf999cf82311a6d6fb2c70d00d3d0604878c811 \
    --hash=sha256:426edb7a5e8879f1ec54a1864f16b882c2837bfd06eee62f2c982315ee2473ed \
    --hash=sha256:cbac4bc38d117f2a49aeedec4407d23e8866ea4ac27ff2cf7fb3e5b570df19e0 \
    --hash=sha256:c29a5e5cc7a3f05926aff34e097e84f8589cd790ce0ed41b67aed6857b26aafd
pyasn1==0.4.8 \
    --hash=sha256:fec3e9d8e36808a28efb59b489e4528c10ad0f480e57dcc32b4de5c9d8c9fdf3 \
    --hash=sha256:0458773cfe65b153891ac249bcf1b5f8f320b7c2ce462151f8fa74de8934becf \
    --hash=sha256:5c9414dcfede6e441f7e8f81b43b34e834731003427e5b09e4e00e3172a10f00 \
    --hash=sha256:6e7545f1a61025a4e58bb336952c5061697da694db1cae97b116e9c46abcf7c8 \
    --hash=sha256:39c7e2ec30515947ff4e87fb6f456dfc6e84857d34be479c9d4a4ba4bf46aa5d \
    --hash=sha256:78fa6da68ed2727915c4767bb386ab32cdba863caa7dbe473eaae45f9959da86 \
    --hash=sha256:08c3c53b75eaa48d71cf8c710312316392ed40899cb34710d092e96745a358b7 \
    --hash=sha256:03840c999ba71680a131cfaee6fab142e1ed9bbd9c693e285cc6aca0d555e576 \
    --hash=sha256:7ab8a544af125fb704feadb008c99a88805126fb525280b2270bb25cc1d78a12 \
    --hash=sha256:e89bf84b5437b532b0803ba5c9a5e054d21fec423a89952a74f87fa2c9b7bce2 \
    --hash=sha256:014c0e9976956a08139dc0712ae195324a75e142284d5f87f1a87ee1b068a359 \
    --hash=sha256:99fcc3c8d804d1bc6d9a099921e39d827026409a58f2a720dcdb89374ea0c776 \
    --hash=sha256:aef77c9fb94a3ac588e87841208bdec464471d9871bd5050a287cc9a475cd0ba
pydot==1.4.1 \
    --hash=sha256:67be714300c78fda5fd52f79ec994039e3f76f074948c67b5ff539b433ad354f \
    --hash=sha256:d49c9d4dd1913beec2a997f831543c8cbd53e535b1a739e921642fe416235f01
pymongo==3.11.0 \
    --hash=sha256:7a4a6f5b818988a3917ec4baa91d1143242bdfece8d38305020463955961266a \
    --hash=sha256:c4869141e20769b65d2d72686e7a7eb141ce9f3168106bed3e7dcced54eb2422 \
    --hash=sha256:ef76535776c0708a85258f6dc51d36a2df12633c735f6d197ed7dfcaa7449b99 \
    --hash=sha256:d226e0d4b9192d95079a9a29c04dd81816b1ce8903b8c174a39224fe978547cb \
    --hash=sha256:68220b81850de8e966d4667d5c325a96c6ac0d6adb3d18935d6e3d325d441f48 \
    --hash=sha256:f6efca006a81e1197b925a7d7b16b8f61980697bb6746587aad8842865233218 \
    --hash=sha256:7307024b18266b302f4265da84bb1effb5d18999ef35b30d17592959568d5c0a \
    --hash=sha256:8ea13d0348b4c96b437d944d7068d59ed4a6c98aaa6c40d8537a2981313f1c66 \
    --hash=sha256:6a15e2bee5c4188369a87ed6f02de804651152634a46cca91966a11c8abd2550 \
    --hash=sha256:d64c98277ea80e4484f1332ab107e8dfd173a7dcf1bdbf10a9cccc97aaab145f \
    --hash=sha256:83c5a3ecd96a9f3f11cfe6dfcbcec7323265340eb24cc996acaecea129865a3a \
    --hash=sha256:890b0f1e18dbd898aeb0ab9eae1ab159c6bcbe87f0abb065b0044581d8614062 \
    --hash=sha256:9fc17fdac8f1973850d42e51e8ba6149d93b1993ed6768a24f352f926dd3d587 \
    --hash=sha256:421aa1b92c291c429668bd8d8d8ec2bd00f183483a756928e3afbf2b6f941f00 \
    --hash=sha256:a2787319dc69854acdfd6452e6a8ba8f929aeb20843c7f090e04159fc18e6245 \
    --hash=sha256:455f4deb00158d5ec8b1d3092df6abb681b225774ab8a59b3510293b4c8530e3 \
    --hash=sha256:25e617daf47d8dfd4e152c880cd0741cbdb48e51f54b8de9ddbfe74ecd87dd16 \
    --hash=sha256:7122ffe597b531fb065d3314e704a6fe152b81820ca5f38543e70ffcc95ecfd4 \
    --hash=sha256:d0565481dc196986c484a7fb13214fc6402201f7fb55c65fd215b3324962fe6c \
    --hash=sha256:4437300eb3a5e9cc1a73b07d22c77302f872f339caca97e9bf8cf45eca8fa0d2 \
    --hash=sha256:d38b35f6eef4237b1d0d8e845fc1546dad85c55eba447e28c211da8c7ef9697c \
    --hash=sha256:137e6fa718c7eff270dbd2fc4b90d94b1a69c9e9eb3f3de9e850a7fd33c822dc \
    --hash=sha256:c0d660a186e36c526366edf8a64391874fe53cf8b7039224137aee0163c046df \
    --hash=sha256:d1b3366329c45a474b3bbc9b9c95d4c686e03f35da7fd12bc144626d1f2a7c04 \
    --hash=sha256:b7c522292407fa04d8195032493aac937e253ad9ae524aab43b9d9d242571f03 \
    --hash=sha256:9755c726aa6788f076114dfdc03b92b03ff8860316cca00902cce88bcdb5fedd \
    --hash=sha256:50531caa7b4be1c4ed5e2d5793a4e51cc9bd62a919a6fd3299ef7c902e206eab \
    --hash=sha256:cc4057f692ac35bbe82a0a908d42ce3a281c9e913290fac37d7fa3bd01307dfb \
    --hash=sha256:213c445fe7e654621c6309e874627c35354b46ef3ee807f5a1927dc4b30e1a67 \
    --hash=sha256:4ae23fbbe9eadf61279a26eba866bbf161a6f7e2ffad14a42cf20e9cb8e94166 \
    --hash=sha256:8deda1f7b4c03242f2a8037706d9584e703f3d8c74d6d9cac5833db36fe16c42 \
    --hash=sha256:e8c446882cbb3774cd78c738c9f58220606b702b7c1655f1423357dc51674054 \
    --hash=sha256:d9de8427a5601799784eb0e7fa1b031aa64086ce04de29df775a8ca37eedac41 \
    --hash=sha256:3d9bb1ba935a90ec4809a8031efd988bdb13cdba05d9e9a3e9bf151bf759ecde \
    --hash=sha256:96782ebb3c9e91e174c333208b272ea144ed2a684413afb1038e3b3342230d72 \
    --hash=sha256:50127b13b38e8e586d5e97d342689405edbd74ad0bd891d97ee126a8c7b6e45f \
    --hash=sha256:bd312794f51e37dcf77f013d40650fe4fbb211dd55ef2863839c37480bd44369 \
    --hash=sha256:4797c0080f41eba90404335e5ded3aa66731d303293a675ff097ce4ea3025bb9 \
    --hash=sha256:1f865b1d1c191d785106f54df9abdc7d2f45a946b45fd1ea0a641b4f982a2a77 \
    --hash=sha256:cccf1e7806f12300e3a3b48f219e111000c2538483e85c869c35c1ae591e6ce9 \
    --hash=sha256:05fcc6f9c60e6efe5219fbb5a30258adb3d3e5cbd317068f3d73c09727f2abb6 \
    --hash=sha256:9dbab90c348c512e03f146e93a5e2610acec76df391043ecd46b6b775d5397e6 \
    --hash=sha256:689142dc0c150e9cb7c012d84cac2c346d40beb891323afb6caf18ec4caafae0 \
    --hash=sha256:4b32744901ee9990aa8cd488ec85634f443526def1e5190a407dc107148249d7 \
    --hash=sha256:e6a15cf8f887d9f578dd49c6fb3a99d53e1d922fdd67a245a67488d77bf56eb2 \
    --hash=sha256:e8d188ee39bd0ffe76603da887706e4e7b471f613625899ddf1e27867dc6a0d3 \
    --hash=sha256:9ee0eef254e340cc11c379f797af3977992a7f2c176f1a658740c94bf677e13c \
    --hash=sha256:91e96bf85b7c07c827d339a386e8a3cf2e90ef098c42595227f729922d0851df \
    --hash=sha256:ce208f80f398522e49d9db789065c8ad2cd37b21bd6b23d30053474b7416af11 \
    --hash=sha256:475a34a0745c456ceffaec4ce86b7e0983478f1b6140890dff7b161e7bcd895b \
    --hash=sha256:40696a9a53faa7d85aaa6fd7bef1cae08f7882640bad08c350fb59dee7ad069b \
    --hash=sha256:03dc64a9aa7a5d405aea5c56db95835f6a2fa31b3502c5af1760e0e99210be30 \
    --hash=sha256:63a5387e496a98170ffe638b435c0832c0f2011a6f4ff7a2880f17669fff8c03 \
    --hash=sha256:076a7f2f7c251635cf6116ac8e45eefac77758ee5a77ab7bd2f63999e957613b
pyparsing==2.4.7 \
    --hash=sha256:ef9d7589ef3c200abe66653d3f1ab1033c3c419ae9b9bdb1240a85b024efc88b \
    --hash=sha256:c203ec8783bf771a155b207279b9bccb8dea02d8f0c9e5f8ead507bc3246ecc1
python-dateutil==2.8.1 \
    --hash=sha256:73ebfe9dbf22e832286dafa60473e4cd239f8592f699aa5adaf10050e6e1823c \
    --hash=sha256:75bb3f31ea686f1197762692a9ee6a7550b59fc6ca3a1f4b5d7e32fb98e2da2a
pytz==2020.4 \
    --hash=sha256:5c55e189b682d420be27c6995ba6edce0c0a77dd67bfbe2ae6607134d5851ffd \
    --hash=sha256:3e6b7dd2d1e0a59084bcee14a17af60c5c562cdc16d828e8eba2e683d3a7e268
requests==2.24.0 \
    --hash=sha256:fe75cc94a9443b9246fc7049224f75604b113c36acb93f87b80ed42c44cbb898 \
    --hash=sha256:b3559a131db72c33ee969480840fff4bb6dd111de7dd27c8ee1f820f4f00231b
rsa==4.6 \
    --hash=sha256:6166864e23d6b5195a5cfed6cd9fed0fe774e226d8f854fcb23b7bbef0350233 \
    --hash=sha256:109ea5a66744dd859bf16fe904b8d8b627adafb9408753161e766a92e7d681fa
rsa==4.6; python_version >= "3.5" \
    --hash=sha256:6166864e23d6b5195a5cfed6cd9fed0fe774e226d8f854fcb23b7bbef0350233 \
    --hash=sha256:109ea5a66744dd859bf16fe904b8d8b627adafb9408753161e766a92e7d681fa
six==1.15.0 \
    --hash=sha256:8b74bedcbbbaca38ff6d7491d76f2b06b3592611af620f8426e82dddb04a5ced \
    --hash=sha256:30639c035cdb23534cd4aa2dd52c3bf48f06e5f4a941509c8bafd8ce11080259
typing-extensions==3.7.4.3 \
    --hash=sha256:dafc7639cde7f1b6e1acc0f457842a83e722ccca8eef5270af2d74792619a89f \
    --hash=sha256:7cb407020f00f7bfc3cb3e7881628838e69d8f3fcab2f64742a5e76b2f841918 \
    --hash=sha256:99d4073b617d30288f569d3f13d2bd7548c3a7e4c8de87db09a9d29bb3a4a60c
urllib3==1.25.11 \
    --hash=sha256:f5321fbe4bf3fefa0efd0bfe7fb14e90909eb62a48ccda331726b4319897dd5e \
    --hash=sha256:8d7eaa5a82a1cac232164990f04874c594c9453ec55eef02eab885aa02fc17a2

with the exact version for each subdependency... maybe when installing requirements.txt they do not act the same? That's really weird.

The error I get in the job: image

I don't get the error "py options not set in env" / "setup file not set in env"... I searched on Google for the exact same sentence but found nothing.

Do you use a specific package manager over pip like poetry or another one?

MelodyShen commented 3 years ago

Some info may help.

About the job duration, my testing job took around 10 mins to complete, and most of the time was spent executing the pipeline (the print("starting") one). Starting a GCE instance may sometimes take up to 2 mins.

"py options not set in env" and "setup file not set in env" are just statements (not errors) that those environment variables are not set. Are there more details in the log entry of cloudservice.service?

sneko commented 3 years ago

@MelodyShen print("starting") is the moment the job builds the graph. On my side, with @davidcavazos's working example, there are 10 minutes between building the graph and the first worker logs. That's really huge, far from the 2 minutes you mentioned to start a GCE instance... What's soooo looong haha?

"py options not set in env" / "setup file not set in env" is not an error, but that's the message I see compared to @davidcavazos's job. The only difference is that I'm using the full requirements.txt exported by Poetry (and that mine fails).

I'm not sure what you mean by details of the log entry. I can expand the JSON object, but there is nothing interesting inside: image

On your side, is there any Beam/Dataflow pipeline you manage with an external dependency manager (Poetry or another one)? I'm curious to see the requirements.txt you get at the end.

EDIT: requirements.txt is standard, so I don't get why it fails when it has the full details of subdependencies. Maybe that's due to specific links between deps, since gcp is the extra of apache-beam and it's not formatted inside the requirements.txt output?

sneko commented 3 years ago

I just needed to adjust the last RUN in the Dockerfile with the command mentioned above.

Thanks for taking time to answer. (I think this should be mentioned somewhere... lost days on this haha...)

sneko commented 3 years ago

@davidcavazos could you please expand on the usage of both:

FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE
FLEX_TEMPLATE_PYTHON_SETUP_FILE

?

When are those variables used? Only by workers?

Because with the .tar.gz I'm now able to have my graph parsed and the job tries to set up workers, but they fail to find my custom dependencies like loguru. image

For me I had to set:

  pipeline_options = PipelineOptions(save_main_session=True, streaming=False)

since one of my DoFn is using a custom module. Am I wrong?

EDIT: it works by also making available the setup.py to workers. It would be great to clarify the distinction between "parsing the graph", and then "executing it on the workers". Because in the Dockerfile there are 2 things: environment variable for the workers, but a custom RUN pip install for the graph parser from what I understand.
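To tell which of the two phases is failing, one option is to check inside the container image whether the modules that main.py imports can be found there. The following is only a minimal sketch: `missing_modules` is a hypothetical helper, not part of Beam or Dataflow, and the module names are examples. It only inspects the environment it runs in (the image that parses the graph); as far as this thread suggests, workers install their packages separately from the setup or requirements file.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be found in the current environment.

    Hypothetical helper for illustration only.
    """
    missing = []
    for name in names:
        try:
            # find_spec returns None when a top-level module is not installed.
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised for dotted names whose parent package is missing,
            # or for modules with an unusable __spec__.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Example module names; replace with whatever main.py imports.
    print(missing_modules(["apache_beam", "loguru"]))
```

Running this as a build step in the Dockerfile would surface a missing package at image build time instead of as a stuck job.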

daisy1754 commented 3 years ago

Hi @sneko what do you mean by making setup.py available to workers? @davidcavazos I'm also curious about usage of SETUP_FILE.

I'm struggling with including another file with setup.py and py_modules. Other people also reported that FLEX_TEMPLATE_PYTHON_SETUP_FILE doesn't work: https://stackoverflow.com/questions/64895504/including-another-file-in-dataflow-python-flex-template-importerror

sneko commented 3 years ago

COPY dist/*.tar.gz dist/

RUN tar -xf dist/*.tar.gz --strip-components=1 -C . && \
  rm -R dist/

ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${WORKDIR}/main.py"
ENV FLEX_TEMPLATE_PYTHON_SETUP_FILE="${WORKDIR}/setup.py"

RUN python setup.py install

That's a part of my Dockerfile. As you can see, I made my own program a Python package (.tar.gz, not a wheel, I didn't succeed with it). It's then copied into the image, unarchived, and from what I understand as mentioned previously:

I'm not 100% sure of the behavior, but that's how I think it works after fighting a long time.

I guess they both use the same image, but it seems the pip packages are not available to workers if not specified inside the environment variable so they install packages on their own.

davidcavazos commented 3 years ago

Sorry for the delay. From my understanding, you just need to set up the environment as you would do if you were launching the pipeline from your local machine. So you need to install the requirements so that the Python file runs. A VM starts using the image and then basically calls something like:

python $FLEX_TEMPLATE_PYTHON_PY_FILE \
  --runner DataflowRunner \
  --requirements_file $FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE \
  --setup_file $FLEX_TEMPLATE_PYTHON_SETUP_FILE \
  ...

So essentially they are the way to specify the workers requirements and what to install.
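The command above can be sketched in Python to make the role of each variable explicit. This is an illustration of the "something like" command from this comment, not the actual launcher code: `build_launch_command` is a hypothetical name, and treating the two optional variables as "skip the flag when unset" is an assumption.

```python
import os
from typing import List, Mapping, Optional


def build_launch_command(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Assemble a launcher-style command from the FLEX_TEMPLATE_* variables.

    Hypothetical helper; the real launcher may behave differently.
    """
    env = os.environ if env is None else env
    # The main pipeline file is required, so a missing variable raises KeyError.
    command = [
        "python",
        env["FLEX_TEMPLATE_PYTHON_PY_FILE"],
        "--runner",
        "DataflowRunner",
    ]
    # Assumption: optional variables only add their flag when set and non-empty.
    optional_flags = {
        "FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE": "--requirements_file",
        "FLEX_TEMPLATE_PYTHON_SETUP_FILE": "--setup_file",
    }
    for variable, flag in optional_flags.items():
        value = env.get(variable)
        if value:
            command += [flag, value]
    return command


if __name__ == "__main__":
    # Example paths only; they are not taken from a real template.
    example_env = {
        "FLEX_TEMPLATE_PYTHON_PY_FILE": "/template/main.py",
        "FLEX_TEMPLATE_PYTHON_SETUP_FILE": "/template/setup.py",
    }
    print(" ".join(build_launch_command(example_env)))
```

Read this way, main.py has to be importable in the image itself (hence the pip install in the Dockerfile), while the requirements and setup files are passed along as pipeline options for the workers.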

jskelcy commented 3 years ago

@sneko I think I am running into similar issues, and I am new to beam/python. Could you talk about what is happening in your setup.py file?

jamiekt commented 3 years ago

~Like @jskelcy I would appreciate someone posting their setup.py file so I can see what is going on in there. TIA~ I think I've found the relevant docs: https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/#multiple-file-dependencies

jamiekt commented 3 years ago

@melodyshen @sneko The 10-minute startup time is explained (and solved) by https://stackoverflow.com/a/65790324/201657 and the post that it links off to https://stackoverflow.com/questions/64813705/dataflow-with-python-flex-template-launcher-timeout.

ben5311 commented 3 years ago

@MelodyShen @sneko The 10-minute startup time is explained (and solved) by https://stackoverflow.com/a/65790324/201657 and the post that it links off to https://stackoverflow.com/questions/64813705/dataflow-with-python-flex-template-launcher-timeout.

Thank you!

coryroyce commented 2 years ago

@sneko I was having a similar issue because I was using Poetry as a package manager. After a lot of troubleshooting, the pyproject.toml seems to cause the Dataflow job to fail in the build stage and hang until the whole thing times out after about an hour.

I removed the pyproject.toml and poetry.lock files and was then able to run the Dataflow job directly in about 4 minutes. I was also able to package it as a flex template and run it that way, which took about 7 minutes with the extra flex template steps.

As a reference, I found this article and linked code to be extremely helpful for the python batch flex template that was mentioned earlier.

davidcavazos commented 2 years ago

That's good to know, I haven't tried poetry yet, but I believe Dataflow is pretty hard-coded to use pip. Even conda is not well supported at all.

I guess you could create a custom container using poetry though. That's how I've gotten to use conda and other non-pip stuff.