Closed naanrdk closed 2 years ago
@shashigharti can you please take a look?
@naanrdk I am not able to access the "Full S3 bug report" link. Could you please share a screenshot?
@shashigharti, attached Dataset link: https://dev-data.s3.wasabisys.com/apy/processed/apy.csv?AWSAccessKeyId=HB3LLH7JRZCFQI2OCM2Y&Expires=1658910131&Signature=PMZ5MlIxnMlPkOOWZ7H1mg2bVD8%3D
Hi naanrdk,
I can't access the link you are sharing "https://dev-data.s3.wasabisys.com/apy/processed/apy.csv?AWSAccessKeyId=HB3LLH7JRZCFQI2OCM2Y&Expires=1658910131&Signature=PMZ5MlIxnMlPkOOWZ7H1mg2bVD8%3D"
It looks like you are trying to use repository with private data from S3 and getting this error?
@shashigharti, apologies. As it is private data from S3, the pre-signed URL was only valid for a certain amount of time. I've regenerated it for your reference: https://dev-data.s3.wasabisys.com/apy/processed/apy.csv?AWSAccessKeyId=HB3LLH7JRZCFQI2OCM2Y&Expires=1658939165&Signature=4lBqrXSgxmtOVxzTwGYQU%2BxXimc%3D
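For anyone hitting the same "link not accessible" problem: the pre-signed URLs in this thread carry an `Expires` query parameter, which looks like a Unix timestamp. A minimal sketch, assuming that interpretation, for checking whether such a link has already expired before sharing it. The function names and the example URL (with placeholder key and signature) are made up for illustration; only the `Expires` value is taken from the thread.

```python
import time
from urllib.parse import parse_qs, urlparse


def presigned_expiry(url):
    """Return the `Expires` query parameter as an int, or None if absent."""
    values = parse_qs(urlparse(url).query).get("Expires")
    if not values:
        return None
    return int(values[0])


def is_expired(url, now=None):
    """True if the URL has an `Expires` timestamp that is not in the future."""
    expiry = presigned_expiry(url)
    if expiry is None:
        # No expiry parameter found, so nothing to check.
        return False
    current = time.time() if now is None else now
    return current >= expiry


# Hypothetical URL; key and signature are placeholders.
EXAMPLE_URL = (
    "https://example-bucket.s3.example.com/data.csv"
    "?AWSAccessKeyId=EXAMPLE&Expires=1658910131&Signature=EXAMPLE"
)

if __name__ == "__main__":
    print(presigned_expiry(EXAMPLE_URL), is_expired(EXAMPLE_URL))
```

This only reads the timestamp from the URL; it does not verify the signature or whether the storage provider would still accept the request.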
@naanrdk I tried running the action using a file from an S3 bucket (after making it public), and it works fine.
yaml: https://github.com/shashigharti/currency-codes/blob/master/.github/workflows/frictionless.yaml
report: https://repository.frictionlessdata.io/report/?user=shashigharti&repo=currency-codes&flow=frictionless&run=2752915254
path: https://github.com/shashigharti/currency-codes/blob/master/datapackage.json
@roll If the files are private, the S3 bucket has to be made accessible from inside the Docker container; otherwise this error occurs. In that case we might also need to update this file to include 's3': https://github.com/frictionlessdata/repository/blob/main/requirements.txt
@shashigharti, I see that you have forked roll's repo, but I can't find the lib folder in it. How did you run the action? Also, you have used codes-all.csv, which is data from the repository itself, as input. Can the input path be set dynamically as an argument instead of hard-coding it?
All the details of running the validations are here: https://www.youtube.com/watch?v=kXA4hmuF57c
And I haven't forked the repository folder. It is a different repo that I forked, and I used repository as a GitHub Action. The data that I am using is from an S3 bucket (which I uploaded). I have also removed the data folder from the repo. https://github.com/shashigharti/currency-codes
@shashigharti, I was under the impression, from https://github.com/frictionlessdata/repository/pull/14#discussion_r928515825 (frictionless.yaml) and from the YouTube video, that the S3 path has to be specified in frictionless.yaml, but you have used the S3 path in https://github.com/shashigharti/currency-codes/blob/master/datapackage.json, so I got confused.
@naanrdk You can use it in many different ways. An example of using a different configuration instead of 'datapackage.json' is here: https://youtu.be/kXA4hmuF57c?t=584
And this documentation explains it as well: https://repository.frictionlessdata.io/docs/configuration
I have also updated the following repo by adding a frictionless.yaml file. It now uses frictionless.yaml instead of 'datapackage.json', which is otherwise detected automatically: https://github.com/shashigharti/currency-codes
Thanks @shashigharti. So using a private S3 bucket is not functional yet?
@naanrdk Yes, it doesn't support private buckets, for security reasons. One way would be to add github action to
@naanrdk I agree with @shashigharti that you might achieve it by using Frictionless Framework directly (install Python, install frictionless, validate), as is done here - https://github.com/shashigharti/currency-codes/blob/ef6ec595b58ef1b634725f5001d059791beb2739/.github/workflows/general.yaml#L22 - and providing the AWS secrets as GitHub secrets / env vars
Another point is that doing so is strongly discouraged, as it's a security risk. I would suggest you just make your bucket public: if you expose it to a GitHub Action anyway, it's de facto public
@roll I want to use this on a private bucket, but for testing purposes I was simply giving the pre-signed URL. Understood, I shall try the framework way!
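A rough sketch of what the framework route suggested above could look like in a script run by a workflow step. It assumes the credentials arrive as the standard `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` environment variables (for example, mapped from GitHub secrets) and that frictionless is installed with its S3 support. The helper names and the `s3://` path are hypothetical, and a non-AWS endpoint such as Wasabi would likely need extra endpoint configuration that is not shown here.

```python
import os

# Standard AWS credential variable names; assumed to be set by the workflow.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_vars(env=None):
    """Return the names of required credential variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def validate_private_s3(path, env=None):
    """Fail early if credentials are missing, then validate the S3 path."""
    missing = missing_aws_vars(env)
    if missing:
        raise RuntimeError("Missing AWS credentials: " + ", ".join(missing))
    # Imported lazily so the credential check works without frictionless installed.
    from frictionless import validate

    return validate(path)


if __name__ == "__main__":
    # Hypothetical bucket and key, for illustration only.
    report = validate_private_s3("s3://my-private-bucket/data/table.csv")
    print(report.valid)
```

Checking the variables first gives a clearer error than a failed download would, and it keeps the secrets out of the script itself.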
Unable to validate the data from S3 with framework:
Hi @naanrdk, you don't need to read the data manually before validating. Please take a look - https://framework.frictionlessdata.io/docs/tutorials/schemes/s3-tutorial
Overview
In continuation of the conversation with @roll about replacing the path: I want to give custom input for validation.
Full S3 bug report