Open Shreyanand opened 3 years ago
IIUC the goal here is to read logs and artifacts (for example, the ones over here) directly from GCS, right? I'm able to download the build logs using the following snippet without any errors:
from google.cloud import storage


def download_public_file(bucket_name, source_blob_name, destination_file_name):
    """Downloads a public blob from the bucket."""
    # bucket_name = "your-bucket-name"
    # source_blob_name = "storage-object-name"
    # destination_file_name = "local/path/to/file"

    # An anonymous client sends no credentials, which is enough for public objects.
    storage_client = storage.Client.create_anonymous_client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(source_blob_name)
    blob.download_to_filename(destination_file_name)

    print(
        "Downloaded public blob {} from bucket {} to {}.".format(
            source_blob_name, bucket.name, destination_file_name
        )
    )


download_public_file(
    'origin-ci-test',
    'logs/periodic-ci-openshift-release-master-ci-4.7-upgrade-from-stable-4.6-e2e-azure-upgrade/1377477249040650240/build-log.txt',
    'buildlogs.txt',
)
The downloaded file:
Would this solve the issue you guys are running into? Or have I misunderstood the problem at hand?
@Shreyanand, I looked into these examples, and using create_anonymous_client() seems to work here for accessing info about the bucket:
storage_client = storage.Client.create_anonymous_client()
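To make "accessing info about the bucket" concrete, here is a minimal sketch of listing object names under a prefix with the anonymous client. It assumes the bucket allows anonymous listing, which this thread has not confirmed for every prefix. The helper names `make_anonymous_client` and `list_blob_names` are hypothetical and only for illustration; `Client.list_blobs` is the library call doing the work.

```python
def make_anonymous_client():
    """Create a credential-free client (requires google-cloud-storage)."""
    # Imported lazily so the listing helper below can be used with any
    # client-like object, even where the library is not installed.
    from google.cloud import storage

    return storage.Client.create_anonymous_client()


def list_blob_names(client, bucket_name, prefix, max_results=20):
    """Return up to max_results object names that start with prefix."""
    blobs = client.list_blobs(bucket_name, prefix=prefix, max_results=max_results)
    return [blob.name for blob in blobs]


# Example usage (needs network access; assumes anonymous listing is allowed):
# client = make_anonymous_client()
# for name in list_blob_names(client, "origin-ci-test", "logs/"):
#     print(name)
```

Passing the client in as an argument keeps the helper easy to try against other buckets or a stub client.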
@MichaelClifford This looks good! Should we update this in the recent PR? Also @chauhankaranraj, can we get the testgrid pass/fail data too from this process?
Is your feature request related to a problem? Please describe.
As of now, we collect data by scraping https://testgrid.k8s.io/ for testgrid data and https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/{tab name}/ for extracting logs. A better alternative would be to get this data directly from Google Cloud Storage through its API. Since all of this is public data, reading it should be possible without any credentials, but that does not seem to be the case.
@MichaelClifford and I tried two things:
1) Something similar to this with
api_endpoint="https://storage.googleapis.com", project='openshift-gce-devel-ci'
but we got this error -> "ValueError: Anonymous credentials cannot be refreshed."
2) We looked at how testgrid collects the data on its backend (relevant files: 1, 2), but it seems even they have some authorization credentials.
Collecting from Google Cloud Storage seems like the better option, as it removes the dependency on the website being scraped. The credentials part needs to be looked into more.
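As a possible way around the credentials question for individual files, publicly readable objects can also be fetched over plain HTTPS at `https://storage.googleapis.com/<bucket>/<object>`, with no client library involved. The sketch below assumes the object is publicly readable. The helper names `public_object_url` and `fetch_public_object` are hypothetical.

```python
from urllib.parse import quote
from urllib.request import urlopen

GCS_PUBLIC_HOST = "https://storage.googleapis.com"


def public_object_url(bucket_name, blob_name):
    """Build the public HTTPS URL for an object, percent-encoding the name."""
    # Keep "/" unescaped so the object's path-like name stays readable.
    return "{}/{}/{}".format(
        GCS_PUBLIC_HOST, quote(bucket_name, safe=""), quote(blob_name, safe="/")
    )


def fetch_public_object(bucket_name, blob_name, timeout=30):
    """Download a publicly readable object and return its raw bytes."""
    with urlopen(public_object_url(bucket_name, blob_name), timeout=timeout) as response:
        return response.read()


# Example usage (needs network access):
# data = fetch_public_object(
#     "origin-ci-test",
#     "logs/periodic-ci-openshift-release-master-ci-4.7-upgrade-from-stable-4.6-e2e-azure-upgrade/1377477249040650240/build-log.txt",
# )
# print(data[:200].decode("utf-8", errors="replace"))
```

This only covers downloading a known object. Discovering which runs exist under a job would still need a listing call.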