frictionlessdata / frictionless-ci

Data management service that brings continuous data validation to tabular data in your repository via GitHub Action
https://repository.frictionlessdata.io
MIT License
37 stars 12 forks source link

Custom input #34

Closed naanrdk closed 2 years ago

naanrdk commented 2 years ago

Overview

Continuing the conversation with @roll about replacing the path: I want to provide a custom input for validation.

Full S3 bug report

roll commented 2 years ago

@shashigharti can you please take a look?

shashigharti commented 2 years ago

@naanrdk I am not able to access the "Full S3 bug report" link. Could you please share a screenshot?

naanrdk commented 2 years ago

@shashigharti, image attached. Dataset link: https://dev-data.s3.wasabisys.com/apy/processed/apy.csv?AWSAccessKeyId=HB3LLH7JRZCFQI2OCM2Y&Expires=1658910131&Signature=PMZ5MlIxnMlPkOOWZ7H1mg2bVD8%3D

shashigharti commented 2 years ago

Hi naanrdk,

I can't access the link you are sharing "https://dev-data.s3.wasabisys.com/apy/processed/apy.csv?AWSAccessKeyId=HB3LLH7JRZCFQI2OCM2Y&Expires=1658910131&Signature=PMZ5MlIxnMlPkOOWZ7H1mg2bVD8%3D"

It looks like you are trying to use repository with private data from S3 and getting this error?

naanrdk commented 2 years ago

@shashigharti, apologies. As it is private data from S3, the presigned URL was only valid for a certain amount of time. I've regenerated it for your reference: https://dev-data.s3.wasabisys.com/apy/processed/apy.csv?AWSAccessKeyId=HB3LLH7JRZCFQI2OCM2Y&Expires=1658939165&Signature=4lBqrXSgxmtOVxzTwGYQU%2BxXimc%3D
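The expiry described here is visible in the URL itself: in this style of presigned URL, the `Expires` query parameter is a Unix timestamp in seconds. A minimal Python sketch that reads it, using only the standard library. The helper names and the example.com URL are hypothetical, not part of any Frictionless API.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlparse


def presigned_url_expiry(url):
    """Return the expiry time encoded in the `Expires` query parameter, or None."""
    query = parse_qs(urlparse(url).query)
    values = query.get("Expires")
    if not values:
        # No `Expires` parameter, e.g. a plain public URL.
        return None
    return datetime.fromtimestamp(int(values[0]), tz=timezone.utc)


def is_expired(url, now=None):
    """Return True if the URL carries an `Expires` timestamp that has passed."""
    expiry = presigned_url_expiry(url)
    if expiry is None:
        return False
    now = now or datetime.now(timezone.utc)
    return now >= expiry


# Hypothetical URL reusing the first `Expires` value from this thread.
EXAMPLE_URL = "https://example.com/data.csv?Expires=1658910131&Signature=placeholder"
```

Calling `is_expired(EXAMPLE_URL)` before sharing a link would show whether the recipient can still open it.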

shashigharti commented 2 years ago

@naanrdk I tried running the action using a file from an S3 bucket (by making it public). It works fine.
yaml: https://github.com/shashigharti/currency-codes/blob/master/.github/workflows/frictionless.yaml
report: https://repository.frictionlessdata.io/report/?user=shashigharti&repo=currency-codes&flow=frictionless&run=2752915254
path: https://github.com/shashigharti/currency-codes/blob/master/datapackage.json

@roll If the files are private, the S3 bucket has to be made available in the Docker container. If not, this error occurs: error. In that case, we might also need to update this file to include 's3': https://github.com/frictionlessdata/repository/blob/main/requirements.txt

naanrdk commented 2 years ago

@shashigharti, I see that you have forked roll's repo, but I don't find the lib folder in it. How did you run the action? Also, you have used codes-all.csv, data from the repository itself, as input. Can the input path be set dynamically as an argument instead of hard-coding it?

shashigharti commented 2 years ago

All the details of running the validations are here: https://www.youtube.com/watch?v=kXA4hmuF57c

shashigharti commented 2 years ago

And I haven't forked the repository folder. It is a different repo that I forked, and I used repository as a GitHub Action. The data that I am using is from an S3 bucket (that I uploaded). I have removed the data folder from the repo as well. https://github.com/shashigharti/currency-codes

naanrdk commented 2 years ago

@shashigharti, I was under the impression, from https://github.com/frictionlessdata/repository/pull/14#discussion_r928515825 and the YouTube video, that the S3 path has to be specified in frictionless.yaml, but you have used the S3 path in https://github.com/shashigharti/currency-codes/blob/master/datapackage.json, so I got confused.

shashigharti commented 2 years ago

I was under impression that the s3 has to be specified in the #14 (comment) (frictionless.yaml) and according to the YouTube but you have used the s3 path in the https://github.com/shashigharti/currency-codes/blob/master/datapackage.json I got confused.

@naanrdk you can use it in many different ways. An example of using a different configuration instead of 'datapackage.json' is here: https://youtu.be/kXA4hmuF57c?t=584

And this documentation explains it as well: https://repository.frictionlessdata.io/docs/configuration

I have also changed the following repo by adding a frictionless.yaml file; it now uses frictionless.yaml instead of 'datapackage.json', which is detected automatically: https://github.com/shashigharti/currency-codes

naanrdk commented 2 years ago

Thanks @shashigharti. So using a private S3 bucket is not functional yet?

shashigharti commented 2 years ago

@naanrdk yes, it doesn't support private buckets for security reasons. One way would be to add github action to

roll commented 2 years ago

@naanrdk I agree with @shashigharti that you might achieve it using Frictionless Framework directly (install Python, install frictionless, validate), as is done here - https://github.com/shashigharti/currency-codes/blob/ef6ec595b58ef1b634725f5001d059791beb2739/.github/workflows/general.yaml#L22 - and providing AWS secrets as GitHub secrets / env vars.

That said, doing so is strongly discouraged as it's a security risk. I would suggest you just make your bucket public: if you expose it to a GitHub Action anyway, it's de facto public.
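For reference, the framework route could look roughly like the Python sketch below. It assumes frictionless is installed with its S3 support (the name of the pip extra depends on the release) and that credentials arrive through the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables, which boto3 reads. The helper names and any bucket path passed in are hypothetical, and a non-AWS endpoint such as Wasabi would need extra endpoint configuration that is not shown here.

```python
import os

# Standard variable names read by boto3; in a workflow they would be
# populated from GitHub secrets.
REQUIRED_ENV = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_credentials(env=None):
    """Return the names of required credential variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV if not env.get(name)]


def validate_s3(path):
    """Validate a table given by an s3:// path and return True if it is valid."""
    missing = missing_credentials()
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    # Imported lazily so the credential check works without frictionless installed.
    from frictionless import validate

    report = validate(path)
    return report.valid
```

A workflow step would then call something like `validate_s3("s3://<bucket>/<key>.csv")` and fail the job when it returns `False`.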

naanrdk commented 2 years ago

@roll I want to use this on a private bucket, but for testing purposes I was just passing the presigned URL as a rudimentary test. Understood, I shall try the framework way!

naanrdk commented 2 years ago

Unable to validate the data from S3 with framework: image

roll commented 2 years ago

Hi @naanrdk, you don't need to read the data manually before validating. Please take a look - https://framework.frictionlessdata.io/docs/tutorials/schemes/s3-tutorial
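As an illustration of passing a path instead of downloaded data, the sketch below derives an `s3://bucket/key` path from a virtual-hosted-style HTTPS URL like the ones shared earlier in this thread. It assumes the bucket name is the part of the host before `.s3.`; the function name is hypothetical and not part of frictionless.

```python
from urllib.parse import urlparse


def to_s3_path(url, s3_marker=".s3."):
    """Convert a virtual-hosted-style S3 URL into an s3://bucket/key path."""
    parsed = urlparse(url)
    host = parsed.netloc
    if s3_marker not in host:
        raise ValueError("Not a virtual-hosted-style S3 URL: " + url)
    # Assumption: everything before ".s3." in the host is the bucket name.
    bucket = host.split(s3_marker, 1)[0]
    # The query string (signature, expiry) is dropped; only the object key is kept.
    key = parsed.path.lstrip("/")
    return f"s3://{bucket}/{key}"
```

The resulting path is the kind of value that can be handed to the framework directly, with credentials supplied separately rather than embedded in the URL.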