Extract the dataset from Hugging Face

amosproj / amos2024ss08-cloud-native-llm


Extract the dataset from Hugging Face #16

Closed dominic0df closed 4 months ago

dominic0df commented 4 months ago

User story

As a Machine Learning Engineer
I want / need to extract the data from my dataset in Hugging Face
So that I will be able to perform Machine Learning with the data

Acceptance criteria

We are able to read the data in our dataset from Hugging Face, so that we will be able to process it afterwards in a Machine Learning Pipeline.

Definition of done (DoD)

Added only after week 5
The same for all features
Here goes the project specific part

DoD general criteria

Feature has been fully implemented
Feature has been merged into the mainline
All acceptance criteria were met
Product owner approved features
All tests are passing
Developers agreed to release

dominic0df commented 4 months ago

Can you please give us a status update on this issue?

Please also state the effort you have invested so far, and change the label to reflect the size of the work that still needs to be done. /cc @YashodharPansuriya

YashodharPansuriya commented 4 months ago

I wrote the script for reading the JSON data of the Markdown and PDF files from Hugging Face. However, when I try to read the JSON data of the YAML files, I get an error. I am working on resolving that issue.


YashodharPansuriya commented 4 months ago

I am stuck on this task. I wrote a Python script that reads the JSON data of the Markdown and PDF files. However, Hugging Face cannot read the JSON data of the YAML files because the data types are inconsistent. We have many of these YAML JSON files, which makes the problem hard for me to fix. I have tried to resolve it but have not succeeded so far. Can you suggest how I should proceed? And how should I estimate the actual size of this task?
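
One possible way to narrow the problem down is sketched below. It assumes the error comes from the same key holding different value types in different records (for example an integer in one file and a string in another), which schema-inferring loaders tend to reject. The thread does not confirm this, and the records, field names, and both helper functions are hypothetical. The sketch uses only the Python standard library: it first reports which keys have conflicting types, then rewrites those keys as JSON strings so that every record shares one type per key.

```python
import json
from collections import defaultdict


def find_inconsistent_keys(records):
    """Return {key: sorted type names} for keys whose values have more than one type."""
    seen = defaultdict(set)
    for record in records:
        for key, value in record.items():
            seen[key].add(type(value).__name__)
    return {key: sorted(types) for key, types in seen.items() if len(types) > 1}


def normalize_record(record, keys_to_stringify):
    """Return a copy of the record in which the given keys hold JSON strings."""
    normalized = dict(record)
    for key in keys_to_stringify:
        # Values that are already strings are left untouched.
        if key in normalized and not isinstance(normalized[key], str):
            normalized[key] = json.dumps(normalized[key], sort_keys=True)
    return normalized


# Hypothetical records, as they might look after converting YAML files to JSON.
records = [
    {"file": "a.yaml", "replicas": 3, "labels": {"app": "web"}},
    {"file": "b.yaml", "replicas": "3", "labels": ["web", "prod"]},
]

conflicts = find_inconsistent_keys(records)
cleaned = [normalize_record(record, conflicts) for record in records]

print(conflicts)
print(cleaned)
```

The number of keys reported by `find_inconsistent_keys` could also serve as a rough input for sizing the remaining work. Note that this check only looks at top-level keys; conflicts inside nested objects would need a recursive variant.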