amosproj / amos2024ss08-cloud-native-llm

MIT License

Extract the dataset from Hugging Face #16

Closed dominic0df closed 4 months ago

dominic0df commented 4 months ago

User story

  1. As a Machine Learning Engineer
  2. I want / need to extract the data from my dataset in Hugging Face
  3. So that I will be able to perform Machine Learning with the data

Acceptance criteria

Definition of done (DoD)

DoD general criteria

dominic0df commented 4 months ago

Can you please give us a status update on this issue?

Please also state the size of the work you have invested so far, and change the label to reflect the size of the work that still needs to be done. /cc @YashodharPansuriya

YashodharPansuriya commented 4 months ago

I wrote the script for reading the JSON data of the Markdown and PDF files from Hugging Face. However, when I try to read the JSON data of the YAML files, I get an error, which I am now trying to resolve.

YashodharPansuriya commented 4 months ago

I am facing an issue with this task. I wrote a Python script to read the data from the Markdown and PDF JSON files. However, Hugging Face is unable to read the JSON data of the YAML files because the data types are inconsistent across files. We have many YAML JSON files, which makes it difficult for me to solve this issue. I have tried to resolve it but have not been successful. Can you please suggest what I can do about this issue? How can I define the actual size of this task?
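
One possible workaround, sketched here under assumptions because the thread shows neither the error message nor the file schema: if the same key holds a mapping in one YAML-derived JSON record and a list or a string in another, Arrow-based loaders cannot infer a single column type. Normalizing every value to a string before loading sidesteps that. The field names (`file`, `content`, `tags`) and the helper functions below are hypothetical and only illustrate the idea.

```python
import json
from typing import Any, Dict, List


def collect_fields(records: List[Dict[str, Any]]) -> List[str]:
    """Return every key that appears in any record, in first-seen order."""
    fields: List[str] = []
    for record in records:
        for key in record:
            if key not in fields:
                fields.append(key)
    return fields


def normalize_value(value: Any) -> str:
    """Coerce a JSON value to a string so each column has exactly one type."""
    if value is None:
        return ""
    if isinstance(value, str):
        return value
    # Numbers, booleans, lists and dicts are stored as their JSON text.
    return json.dumps(value, sort_keys=True, ensure_ascii=False)


def normalize_records(
    records: List[Dict[str, Any]], fields: List[str]
) -> List[Dict[str, str]]:
    """Give every record the same keys, with string values only."""
    return [
        {field: normalize_value(record.get(field)) for field in fields}
        for record in records
    ]


# Hypothetical sample: the "content" key has a different type in each record.
raw_records = [
    {"file": "a.yaml", "content": {"replicas": 3}},
    {"file": "b.yaml", "content": ["x", "y"]},
    {"file": "c.yaml", "content": "plain text", "tags": None},
]

fields = collect_fields(raw_records)
clean_records = normalize_records(raw_records, fields)

# One JSON object per line, ready to be written to a .jsonl file.
json_lines = "\n".join(json.dumps(r, ensure_ascii=False) for r in clean_records)
```

After this step every record has the same keys and only string values, so a loader that infers one type per column should no longer hit a type conflict. The original structure can be recovered later with `json.loads` on the fields that need it. Whether this matches the actual error in this task is an assumption that would need checking against the real files.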