Open nmetts opened 3 years ago
Sorry, there isn't enough information here for me to have an opinion - what call were you attempting, and what traceback did you get?
We weren't calling any functions from gcsfs. Just having gcsfs in the requirements.txt file caused the Cloud Function to fail to deploy. As soon as we specified gcsfs==0.6.2 in requirements.txt the Cloud Function was able to deploy. I added the MCVE to the issue. As you can see, the code snippet, which was simplified as much as possible, did not even import gcsfs, let alone call any functions from the package. Something in the Cloud Functions environment does not agree with gcsfs version 0.7.0.
You must have some sort of logging about what went wrong during environment preparation. For example, the following works fine for a new blank environment (I create with conda, but the install is with pip, just like yours):
$ conda create -n testme python=3.7.6 pip
$ conda activate testme
$ pip install gcsfs
$ python
>>> import gcsfs
>>> gcsfs.__version__
'0.7.0'
Deploying from the Cloud Functions GUI results in the following:
Error: function terminated. Recommended action: inspect logs for termination reason. Function cannot be initialized.
Deploying from the Terminal in my local machine results in the following:
$gcloud functions deploy test-fn --verbosity debug
DEBUG: Running [gcloud.functions.deploy] with arguments: [--verbosity: “debug”, NAME: “stevea-test-fn”]
Deploying function (may take a while - up to 2 minutes)...failed.
DEBUG: (gcloud.functions.deploy) OperationError: code=13, message=Function deployment failed due to a health check failure. This usually indicates that your code was built successfully but failed during a test execution. Examine the logs to determine the cause. Try deploying again in a few minutes if it appears to be transient.
Traceback (most recent call last):
File “/Users/test_user/google-cloud-sdk/lib/googlecloudsdk/calliope/cli.py”, line 848, in Execute
resources = calliope_command.Run(cli=self, args=args)
File “/Users/test_user/google-cloud-sdk/lib/googlecloudsdk/calliope/backend.py”, line 770, in Run
resources = command_instance.Run(args)
File “/Users/test_user/google-cloud-sdk/lib/surface/functions/deploy.py”, line 180, in Run
return _Run(args, track=self.ReleaseTrack())
File “/Users/test_user/google-cloud-sdk/lib/surface/functions/deploy.py”, line 150, in _Run
return api_util.PatchFunction(function, updated_fields)
File “/Users/test_user/google-cloud-sdk/lib/googlecloudsdk/api_lib/functions/util.py”, line 308, in CatchHTTPErrorRaiseHTTPExceptionFn
return func(*args, *kwargs)
File “/Users/test_user/google-cloud-sdk/lib/googlecloudsdk/api_lib/functions/util.py”, line 364, in PatchFunction
operations.Wait(op, messages, client, _DEPLOY_WAIT_NOTICE)
File “/Users/test_user/google-cloud-sdk/lib/googlecloudsdk/api_lib/functions/operations.py”, line 126, in Wait
_WaitForOperation(client, request, notice)
File “/Users/test_user/google-cloud-sdk/lib/googlecloudsdk/api_lib/functions/operations.py”, line 101, in _WaitForOperation
sleep_ms=SLEEP_MS)
File “/Users/test_user/google-cloud-sdk/lib/googlecloudsdk/core/util/retry.py”, line 219, in RetryOnResult
result = func(args, **kwargs)
File “/Users/test_user/google-cloud-sdk/lib/googlecloudsdk/api_lib/functions/operations.py”, line 65, in _GetOperationStatus
raise exceptions.FunctionsError(OperationErrorToString(op.error))
FunctionsError: OperationError: code=13, message=Function deployment failed due to a health check failure. This usually indicates that your code was built successfully but failed during a test execution. Examine the logs to determine the cause. Try deploying again in a few minutes if it appears to be transient.
ERROR: (gcloud.functions.deploy) OperationError: code=13, message=Function deployment failed due to a health check failure. This usually indicates that your code was built successfully but failed during a test execution. Examine the logs to determine the cause. Try deploying again in a few minutes if it appears to be transient. (edited)`
This is the traceback from your CLI. Can you get the traceback/log from the actual deployment?
It may be obvious by now, but I have no idea how "Cloud Functions" work.
Essentially Google Cloud Functions creates a container for code and deploys it on Google Cloud Platform to be invoked by some event (HTTP Call, message to Pub/Sub topic, etc.). You can read more about it here: https://cloud.google.com/functions/docs/writing The traceback above is from attempting to deploy a Cloud Function not locally, but to GCP. The difference is in the example above I am simply executing the deploy command from my CLI. Logging for Google's Cloud Functions is abysmal and generally appears to exclude tracebacks or really any useful information at all, which is why I used the CLI example because it actually gives a debug option. This is all that is shown in the logs for the Cloud Function deployment on the GCP GUI:
Error: function terminated. Recommended action: inspect logs for termination reason. Function cannot be initialized.
Hm, I'm a bit stuck, because the slightly more detailed logs from the CLI also says "Examine the logs to determine the cause." ! It suggests that there is more detail somewhere.
I'll see what I can find. If there are additional details somewhere it it not obvious where to find them. I'll open an issue with GCP as well.
@martindurant just as a fyi, we faced an issue with 0.7.0 as well, not in GCF but in local (and k8s) container. Symptom: a Airflow DAG using gcsfs timed out. A colleague mentioned a hash problem in gcsfs lib but not sure, will troubleshoot later. Till then we reverted back to 0.6.2. Since we use gcsfs in GCF too, we will version the dependency to be safe.
Do you have more details on the 0.7.0 changes (or in fsspec dependency) ? aiohttp cannot be the only change.
0.7.0 was a big rewrite to allow async - and many code paths changed to make this happen. In this case, the changes are too many to list, really. I am not sure what "hash" problem there could be, though, and any help you can give in pinpointing it would be appreciated.
Here what happens: 0.7.x has a dependency on aiohttp 3.7.3
which requires chardet <4.0
Google has no documentation on the libs included with the Python 3.8 runtime image but it must have chardet 4.0.0 as that's the error I get when deploying.
So solution is either to set in requirements.txt
a lower version e.g. chardet==3.0.4
or, what I have chosen, skip the use of chardet
as per https://docs.aiohttp.org/en/stable/ by adding in requirements, aiohttp[speedups]
@nmetts after that you should be able to deploy @martindurant may want to add that in gcfs dependencies instead
Update: although function can be deployed, 0.7.x
still "hangs" on any method of GCSFileSystem()
It seems the issue exists when running on any GCP resource as we experienced the same on GKE.
To reproduce, create a gcs background function through console and use the following code:
import gcsfs
gs_filesys = gcsfs.GCSFileSystem()
def hello_gcs(event, context):
file = event
print(f"Processing file: {file['name']}.")
print( event, context)
file_name = event['name']
input_bucket = event['bucket']
file_path = f'{input_bucket}/{file_name}'
if gs_filesys.exists(file_path):
print('Processing ' + file_path)
The exists()
never returns and function ends up timing out.
Downgrading to 0.6.2
works
Could you please try with gcsfs.GCSFileSystem(token="cloud")
?
create a gcs background function
What does this mean?
Can you please configure the logger "gcsfs" or set the environment variable GCSFS_DEBUG , and you should (hopefully) get more information about what gcsfs is trying to do.
Yep - I had tried all that:
Had found the debug env switch in the source code but it only produced: gcsfs - DEBUG - Connected with method google_default
token='cloud'
did not change (but interestingly did not produce the above log msg)
I wish I could debug exists()
but locally using Win10, Ubuntu on WSL or a 3.8 docker image, all work fine.
So something on GCP is holding up. Unfortunately there is no timeout error so don't know what gcsfs is trying. The only timeout is from GCF when function times out and the instance gets killed.
Are there any other flags to produce more logs ?
As to how to deploy a GCF function: https://cloud.google.com/functions/docs/deploying/console
Do you explicitly supply a google credentials file for the function?
Never. gcsfs.GCSFileSystem()
just works every time on < 0.7.0, either locally or on GCP.
It works locally because env GOOGLE_APPLICATION_CREDENTIALS
points to the json file. Not sure on GCP.
Note that 0.7.x on GCP does not error out on GCSFileSystem()
instantiation - it never does but any subsequent method runs forever.
We have the same issue in GCP Cloud Composer DAG - call to gcsfs
to list GCS bucket just hangs forever. Although interesting thing here is that issue happens in Composer version >composer-1.13.0-airflow-1.10.12
. Same code on composer-1.13.0-airflow-1.10.12
runs without issues, so I guess that has something to do with Google (although they list no visible changes in Composer changelog).
Here's an issue I've reported in GCP bugtracker, will update it soon with the latest findings.
@martindurant would it make sense to fork the issue / create a new one as deploying on GCF is different from running on GCP (any component).
Sure, feel free
Puzzling:
Cannot be deployed w/o versioning chardet
or e.g. installing aiohttp[speedups]
:
import gcsfs
gcs = gcsfs.GCSFileSystem()
def hello_gcs(event, context):
print(gcsfs._version.get_versions())
file_name = event['name']
input_bucket = event['bucket']
file_path = f'{input_bucket}/{file_name}'
if gcs.exists(file_path):
print('Confirmed exists: ' + file_path)
Error: Build failed: found incompatible dependencies: "aiohttp 3.7.3 has requirement chardet<4.0,>=2.0, but you have chardet 4.0.0."; Error ID: 5bbe2179
Can be deployed as-is - just installing gcsfs
:
import gcsfs
gcs = gcsfs.GCSFileSystem()
def hello_gcs(event, context):
gcs.maybe_refresh()
gcs._connect_cloud()
print(gcsfs._version.get_versions())
file_name = event['name']
input_bucket = event['bucket']
file_path = f'{input_bucket}/{file_name}'
if gcs.exists(file_path):
print('Confirmed exists: ' + file_path)
Well, ignore previous comment. I managed to deploy as-is by just retrying. Issue might have been intermittent on GCP-side.
To be safe though, I would add cchardet
to requirements.txt
Would you mind making that a PR? So chardet should be v4 or any version, and does its order versus aiohttp matter?
I have been testing a dozen variations of a function and can confirm that the issue is with GCP, when first deployed. More specifically,
import gcsfs
gcs = gcsfs.GCSFileSystem()
def hello_world(request):
return 'test'
As such, I have logged a support ticket with GCP as there is nothing wrong with gcsfs
. It looks like GCF installs chardet 4.0
after requirements.txt
gets first processed
Note: OP may have experienced another issue as when first posted aiohttp 3.7.3
was not out yet - if relevant in any way. The point is that today I cannot reproduce a deployment issue when selecting the 3.7 runtime
Thanks @yiga2 . Is there a public link to the issue? If not, please report back when you hear anything.
Aside from the build issue, did you try instantiating GCSFileSystem within your function?
Private issue to get more traction - will report back indeed and request G to create a public issue if cannot be remediated quickly.
Responding in the other thread for the second question as unrelated to GCF deploying.
What happened: We are using Cloud Functions in Google Cloud Platform with a Python 3.7 runtime, and we use gcsfs to access files stored in Google Storage buckets. After adding some new features to an existing, working function, an attempt to redeploy the function failed with the following error message in the Logs Viewer: Error: function terminated. Recommended action: inspect logs for termination reason. Function cannot be initialized.
The Cloud Function was then unable to be tested since deployment failed.
We simplified our code until we were able to determine that the only issue was gcsfs in the requirements.txt file. After checking the release history and seeing that 0.7.0 was released on August 21, 2020, which coincided with the deploy failure of our Cloud Function, we decided to specify version 0.6.2 on the requirements.txt file. With this change made, the Cloud Function successfully deployed.
What you expected to happen: We expected our Cloud Function to continue working as it had been before.
Minimal Complete Verifiable Example:
The issue appears to be with attempting to install gcsfs 0.7.0 on pip, so no code is required to trigger this bug.
Anything else we need to know?:
Environment: Google Cloud Platform Cloud Functions